Python decode()


This tutorial explains the Python decode() method with arguments and examples. Before we dive into the Python decode() method, let's first build some background knowledge about encoding and decoding so you can better understand its purpose. 👇

Encoding and Decoding – What Does It Mean?

Programs must handle various characters in several languages. Application developers often internationalize programs to display messages and error outputs in various languages, be it English, Russian, Japanese, French, or Hebrew.

Python's string type uses the Unicode Standard to represent characters, which lets Python programs work with all possible characters.

Unicode aims to list every character used by human languages and gives each character its own unique code. The Unicode Consortium regularly updates its specifications to add new languages and symbols.

A character is the smallest component of a text. For example, 'a', 'B', 'c', 'È' and 'Í' are different characters. Characters vary depending on language or context. For example, the character for "Roman Numeral One" is 'Ⅰ', separate from the uppercase letter 'I'. Though they look the same, these are two different characters that have different meanings.

The Unicode standard describes how code points represent characters. A code point value is an integer from 0 to 0x10FFFF. [1]
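As a small illustration of code points, the built-in ord() and chr() functions convert between a character and its integer code point. The sample characters below are chosen only for demonstration:

```python
# Look up the code point of a few sample characters.
samples = ["a", "È", "Ⅰ", "I"]
code_points = {ch: ord(ch) for ch in samples}

for ch, value in code_points.items():
    # Code points are conventionally written as U+ followed by hex digits.
    print(f"{ch!r} -> U+{value:04X} ({value})")

# chr() is the inverse of ord().
roman_one = chr(0x2160)

# 0x10FFFF is the largest valid code point; chr(0x110000) raises ValueError.
largest = chr(0x10FFFF)
```

Note that 'Ⅰ' (U+2160) and 'I' (U+0049) get different code points even though they look alike.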

What are Encodings?

A sequence of code points forms a Unicode String represented in memory as a set of code units. These code units are mapped to 8-bit bytes. Character Encoding is the set of rules to translate a Unicode string to a byte sequence.

UTF-8 is the most commonly used encoding, and Python defaults to it. UTF stands for "Unicode Transformation Format", and the '8' refers to the 8-bit values used in the encoding. [2]
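To make the mapping from code points to 8-bit bytes concrete, here is a minimal sketch that encodes a few sample characters as UTF-8. The characters are picked for illustration; each one needs a different number of bytes:

```python
# Encode sample characters and inspect how many bytes UTF-8 uses for each.
samples = ["a", "È", "Ⅰ", "😀"]
encoded = {ch: ch.encode("utf-8") for ch in samples}

for ch, raw in encoded.items():
    print(f"{ch!r}: {len(raw)} byte(s) -> {raw!r}")
```

ASCII characters such as 'a' take a single byte, while characters with higher code points take two, three, or four bytes.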

Python decode()

Encoders and decoders convert text between different representations, and specifically, the Python bytes decode() function converts bytes to string objects.

The decode() method interprets a bytes object according to a given encoding scheme and returns the corresponding string. It is the opposite of the Python encode() method, which turns a string into bytes.

decode() accepts the name of the encoding that was used to produce the bytes, decodes them, and returns the original string.
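The following sketch shows why the encoding passed to decode() should match the one used to encode. The sample word is arbitrary:

```python
original = "café"
raw = original.encode("utf-8")  # b'caf\xc3\xa9'

# Decoding with the same encoding restores the original text.
round_trip = raw.decode("utf-8")

# Decoding with a different encoding may succeed but produce garbled text,
# because Latin-1 maps each of the two UTF-8 bytes to its own character.
garbled = raw.decode("latin-1")

print(round_trip)  # café
print(garbled)     # cafÃ©
```

No error is raised in the second case, since every byte is valid in Latin-1, so a wrong encoding can go unnoticed until the text is displayed.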

Here's the syntax of the method:

bytes.decode(encoding='utf-8', errors='strict')

# Example:
b'abc'.decode(encoding='utf-8', errors='strict')

The decode() arguments:

encoding (optional): Specifies the encoding used to decode the bytes. The default is 'utf-8'. Standard Encodings has a list of all encodings.

errors (optional): Decides how to handle the errors:

'strict' [default] – Decoding errors raise a UnicodeDecodeError, which is a subclass of UnicodeError.

Other possible values are:

'ignore' – Skip the malformed bytes and continue with the next

'replace' – Replace malformed bytes with the replacement character U+FFFD

'backslashreplace' – Inserts a backslash escape sequence (\xNN) in place of each byte that cannot be decoded

'xmlcharrefreplace' – Inserts an XML character reference; this handler works only with encode(), not decode()

'namereplace' – Inserts a \N{...} escape sequence; this handler works only with encode(), not decode()

Any other name registered via codecs.register_error() is also accepted.
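As a rough comparison of the error handlers that apply when decoding, the sketch below feeds the same invalid UTF-8 input to each of them. The input bytes are a made-up sample in which 0xE9 is not a valid UTF-8 sequence:

```python
# 'café!' encoded as Latin-1; the byte 0xE9 is invalid in this position for UTF-8.
raw = b"caf\xe9!"

results = {}
for handler in ("ignore", "replace", "backslashreplace"):
    results[handler] = raw.decode("utf-8", errors=handler)

# 'strict' raises instead of returning a string.
try:
    raw.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    results["strict"] = f"{type(exc).__name__} at position {exc.start}"

for handler, outcome in results.items():
    print(f"{handler:>16}: {outcome!r}")
```

Here 'ignore' drops the bad byte, 'replace' substitutes U+FFFD, and 'backslashreplace' keeps the byte value visible as the text \xe9.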

Example 1

text = "Python Decode converts text string from one encoding scheme to the desired one."
encoded_text = text.encode('utf8', 'strict')
print("Encoded String: ", encoded_text)
print("Decoded String: ", encoded_text.decode('utf8', 'strict'))
  • Encoded String:  b'Python Decode converts text string from one encoding scheme to the desired one.'
  • Decoded String:  Python Decode converts text string from one encoding scheme to the desired one.

Example 2

>>> b'\x81abc'.decode("utf-8", "strict")
Traceback (most recent call last):
  File "<pyshell#55>", line 1, in <module>
    b'\x81abc'.decode("utf-8", "strict")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
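When the encoding of incoming bytes is uncertain, one possible pattern is to catch UnicodeDecodeError and try another encoding. The helper below, decode_with_fallback, is a hypothetical name invented for this sketch, and the choice of 'cp1252' as the second candidate is only an assumption:

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: try each encoding strictly, then fall back to 'replace'."""
    for name in encodings:
        try:
            return raw.decode(name, "strict")
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # No candidate worked, so decode with the first one and mark the bad bytes.
    return raw.decode(encodings[0], "replace")


print(decode_with_fallback(b"abc"))        # valid UTF-8
print(decode_with_fallback(b"\x80abc"))    # invalid UTF-8, valid cp1252
print(decode_with_fallback(b"\x81abc"))    # invalid in both candidates
```

Guessing encodings this way can silently produce wrong text, so it is best kept for cases where the true encoding really cannot be known.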

References