Convert Bytes to String [Python]

Overview

Problem Statement: How to convert bytes data to string data in Python?

Example: The following example illustrates how the decode() method converts a byte string to a string. (We will dive into the details of this solution soon!)

val_bytes = b"Please keep smiling \xF0\x9F\x98\x83!"
print("Byte String: ", val_bytes)
print("Type of val_bytes: ", type(val_bytes))
val_str = val_bytes.decode('UTF-8')
print("=========================================")
print("String: ", val_str)
print("Type of val_str: ", type(val_str))

Output:

Byte String:  b'Please keep smiling \xf0\x9f\x98\x83!'
Type of val_bytes:  <class 'bytes'>
=========================================
String:  Please keep smiling 😃!
Type of val_str:  <class 'str'>

Note: Difference between Byte and String Objects in Python

  • Strings are sequences of Unicode characters, while bytes objects are sequences of raw byte values (integers from 0 to 255).
  • Strings represent a human-readable value, whereas bytes are understood by the machine, i.e., they are machine-readable objects.
  • Bytes objects can be stored on disk directly, whereas string objects have to be encoded before they can be stored.

Now that we have an idea about the problem at hand, let’s dive into the different ways to solve it.

Solution 1: Using decode()

The most straightforward approach to convert the byte object to string is to use the decode() method.

Encoding is the process of converting human-readable text into a sequence of bytes in a specified format (such as UTF-8) so that it can be stored or transmitted. Decoding is the opposite of encoding, i.e., it is the process that converts the encoded bytes back to normal text (human-readable form).

In Python:

  • encode() is a built-in string method used for encoding. If no encoding is specified, UTF-8 is used as the default.
  • decode() is a built-in bytes method used for decoding. It also defaults to UTF-8 when no encoding is specified.
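The relationship between the two methods can be sketched as a round trip. This is only a minimal illustration; the variable names are chosen for this example, and the emoji is written as a Unicode escape so the snippet does not depend on the editor's encoding.

```python
# Round trip: str -> bytes -> str, using UTF-8 in both directions.
original = "Please keep smiling \U0001F603!"

encoded = original.encode("utf-8")  # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(type(encoded), len(encoded))  # the emoji takes 4 bytes in UTF-8
print(type(decoded), len(decoded))  # but counts as 1 character in a str
print(decoded == original)
```

The byte string is three items longer than the text because the single emoji character occupies four bytes in UTF-8.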

Example:

val_bytes = b"Please keep smiling \xF0\x9F\x98\x83!"
print("Byte String: ", val_bytes)
print("Type of val_bytes: ", type(val_bytes))
val_str = val_bytes.decode('UTF-8')
print("=========================================")
print("String: ", val_str)
print("Type of val_str: ", type(val_str))

Output:

Byte String:  b'Please keep smiling \xf0\x9f\x98\x83!'
Type of val_bytes:  <class 'bytes'>
=========================================
String:  Please keep smiling 😃!
Type of val_str:  <class 'str'>

Explanation: In the above snippet, the variable val_bytes is a byte string. The bytes \xf0\x9f\x98\x83 are the UTF-8 encoding of the emoji 😃. To convert the value to a human-readable format, i.e., to see the emoji instead of the raw bytes, we called the decode() method on val_bytes with the encoding 'utf-8' and stored the result as a string in the variable val_str.

Solution 2: Using str()

Another way to solve our problem is to use Python’s built-in str() function. When str() is given a bytes object together with an encoding, it decodes the bytes and returns a string.

Example:

text = b'Learn to earn $100/hr as a Freelancer!'
print("text is a ", type(text))
# converting to string
res = str(text, 'UTF-8')
print('\n' + res)
print("res is a ", type(res))

Output:

text is a  <class 'bytes'>

Learn to earn $100/hr as a Freelancer!
res is a  <class 'str'>

Explanation: In the above solution, we converted the bytes object to a string using the str() function with two arguments. The first argument is the byte string stored in the variable text, and the second argument is 'UTF-8', which tells Python that the bytes were encoded using UTF-8 and must be decoded accordingly.
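The second argument matters more than it might seem. As a small sketch (the variable names here are illustrative only), calling str() on a bytes object without an encoding does not decode anything; it returns the printable representation of the bytes object, including the b prefix and the quotes.

```python
text = b"Learn to earn $100/hr as a Freelancer!"

# Without an encoding, str() returns the repr of the bytes object.
without_encoding = str(text)

# With an encoding, str() decodes the bytes into real text.
with_encoding = str(text, "utf-8")

print(without_encoding)  # b'Learn to earn $100/hr as a Freelancer!'
print(with_encoding)     # Learn to earn $100/hr as a Freelancer!
```

So if a stray b'...' shows up inside your output strings, a missing encoding argument is a likely cause.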

Solution 3: Using map+join

Example: Let’s say that you have ASCII values (integers that represent bytes) stored in a list and you want to convert them to their respective characters. Let’s see how we can do this in the following snippet.

var = [68, 51, 90]
s = ''.join(map(chr, var)) 
for n, i in enumerate(var):
    print(i, ":", s[n])

Output:

68 : D
51 : 3
90 : Z

The map() function applies chr() to every integer in the list, which converts each ASCII value to its character, and join() concatenates the resulting characters into a single string.
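As a possible alternative, assuming every value in the list is a valid ASCII code (0 to 127), the list can first be turned into a bytes object and then decoded. The sketch below compares both routes; the names via_map and via_bytes are chosen only for this illustration.

```python
var = [68, 51, 90]

# Route 1: map each integer to its character and join the characters.
via_map = "".join(map(chr, var))

# Route 2: build a bytes object from the integers, then decode it.
via_bytes = bytes(var).decode("ascii")

print(via_map)    # D3Z
print(via_bytes)  # D3Z
```

Note that bytes() only accepts integers from 0 to 255, and the 'ascii' codec rejects values above 127, so the second route fails loudly on unexpected input.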

Solution 4: Using codecs.decode

The decode() function of the codecs module in Python also helps us to convert a byte string to a normal string. Simply import the codecs module and call codecs.decode() with the byte string to perform the conversion. If no encoding is passed, UTF-8 is used.

Example:

import codecs

val_bytes = b"Please keep smiling \xF0\x9F\x98\x83!"
print("Byte String: ", val_bytes)
print("Type of val_bytes: ", type(val_bytes))
val_str = codecs.decode(val_bytes)
print("=========================================")
print("String: ", val_str)
print("Type of val_str: ", type(val_str))

Output:

Byte String:  b'Please keep smiling \xf0\x9f\x98\x83!'
Type of val_bytes:  <class 'bytes'>
=========================================
String:  Please keep smiling 😃!
Type of val_str:  <class 'str'>
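The example above relies on the default encoding. A slightly more explicit sketch passes the encoding and the error handler as positional arguments, which makes the intent visible to the reader; the variable names implicit and explicit are only for illustration.

```python
import codecs

val_bytes = b"Please keep smiling \xF0\x9F\x98\x83!"

# Relies on the default encoding (UTF-8).
implicit = codecs.decode(val_bytes)

# States the encoding and the error handler explicitly.
explicit = codecs.decode(val_bytes, "utf-8", "strict")

print(implicit == explicit)
```

Both calls return the same string, so the explicit form is purely a readability choice.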

Encoding Alert!

Please note that a bytes object carries no information about the encoding that produced it. Numerous encodings exist, so decoding with the wrong one can produce the wrong text or fail outright. Let’s have a look at the following example:

s = b'\xf8\xe7'
print(s.decode('UTF-16'))
print(s.decode('Latin1'))
print(s.decode('UTF-8'))

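Output:

With UTF-16, the two bytes are read as a single code unit that falls in a private-use range, so it usually renders as a placeholder glyph. With Latin-1, the output is øç. With UTF-8, Python raises a UnicodeDecodeError because 0xf8 is not a valid UTF-8 start byte.

Want to deal with the above problem? Please have a look at this tutorial: Python Unicode Encode Error.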
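If the encoding of the input is uncertain, one option is to catch the exception or to pass an error handler to decode(). The following is a minimal sketch rather than a general recipe; the names failed and replaced are hypothetical and exist only for this example.

```python
s = b"\xf8\xe7"

# Latin-1 maps every byte to a character, so it never fails.
print(s.decode("latin-1"))

# UTF-8 is strict by default and raises on invalid input.
failed = False
try:
    s.decode("utf-8")
except UnicodeDecodeError as exc:
    failed = True
    print("UTF-8 decoding failed:", exc.reason)

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
replaced = s.decode("utf-8", errors="replace")
print(replaced)
```

Replacing bad bytes keeps a program running, but it silently loses information, so it is best reserved for cases such as logging or display.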

How to Translate “bytes” Objects into Literal Strings in a Pandas DataFrame (Python 3.x)?

Let’s say that we have a pandas DataFrame in which a column holds strings that are stored as bytes objects. So, how will you convert these byte objects to regular strings?

Solution:

import pandas as pd

d = {'column': [b'\xF0\x9F\x98\x84', b'\xF0\x9F\x98\x8D', b'\xF0\x9F\x98\x9C', b'\xF0\x9F\x99\x8C', b'\xF0\x9F\x98\x83']}
df = pd.DataFrame(data=d)
output = df['column'].str.decode("utf-8")
print(output)

Output:

0    😄
1    😍
2    😜
3    🙌
4    😃
Name: column, dtype: object

Explanation: In the above solution, we used the vectorized str.decode() method of the pandas Series to decode every byte string in the column to a normal string in a single call.

Conclusion

We learned numerous ways of converting a byte object to a string object in Python in this article. You may opt for any approach depending upon the scenario and your requirement. With that, we come to the end of our discussion, and I hope it helped you. Please subscribe and stay tuned for more interesting articles in the future.

Happy coding!

