In the previous article, we got acquainted with Unicode and with ways of processing input Unicode strings and converting them into a readable form: string objects in Python.
Now let's look at ways of converting them to other types of output data and applying different encodings to them.
Problem Formulation
Suppose we need to send data as characters represented by integers (type int).
Function ord.
The built-in function ord() takes a string consisting of a single Unicode character as its argument and returns an int: the value of that character's Unicode code point.
>>> A = '\u0048'
>>> print(ord(A))  # 72
If the argument consists of two or more characters, a TypeError is raised:
>>> B = '\u0048\u0065\u006C\u006C\u006F'
>>> print(ord(B))  # TypeError: ord() expected a character, but string of length 5 found
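The failure described above can be reproduced and handled explicitly. The following is only a sketch: the helper name safe_ord is hypothetical (it is not a built-in), and returning None for invalid input is an assumption made for illustration.

```python
def safe_ord(text):
    """Return the code point of a one-character string, or None otherwise.

    Hypothetical helper for illustration; it is not part of Python.
    """
    try:
        return ord(text)
    except TypeError:
        # ord() raises TypeError when the string length is not exactly 1
        return None


single = '\u0048'                            # 'H'
several = '\u0048\u0065\u006C\u006C\u006F'   # 'Hello'

print(safe_ord(single))   # 72
print(safe_ord(several))  # None
```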
To avoid this, let's use a list comprehension that calls ord() on each character, in combination with the map function. The first argument of map is the int function, and the second is an iterable, in our case a list:
>>> print(list(map(int, [ord(i) for i in B])))  # [72, 101, 108, 108, 111]
Checking the data type of the first element:
>>> B_list = list(map(int, [ord(i) for i in B]))
>>> print(type(B_list[0]))  # <class 'int'>
You can also use a for loop and check the data type of each value as it is produced. Because end=' ' replaces the newline, everything is printed on one line:
>>> for i in B:
...     print(ord(i), type(ord(i)), end=' ')
# 72 <class 'int'> 101 <class 'int'> 108 <class 'int'> 108 <class 'int'> 111 <class 'int'>
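Because ord() already returns an int, the extra int() call is not strictly required. As a minimal sketch (the helper name to_code_points is an assumption made for this example), a plain list comprehension gives the same result as the map(int, ...) form:

```python
def to_code_points(text):
    """Return the Unicode code points of text as a list of int.

    Hypothetical helper name, used here only for illustration.
    """
    return [ord(ch) for ch in text]


B = '\u0048\u0065\u006C\u006C\u006F'  # 'Hello'

code_points = to_code_points(B)
print(code_points)  # [72, 101, 108, 108, 111]

# Compare with the map(int, ...) form used in the article
same = code_points == list(map(int, [ord(ch) for ch in B]))
print(same)  # True

# Every element is already an int, without calling int()
all_int = all(type(n) is int for n in code_points)
print(all_int)  # True
```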
Python Convert Unicode to Float
Similar to the task described above, it is sometimes necessary to convert a Unicode string to float numbers.
Function ord.
Wrapping ord() in the float function gives the desired result, provided that the Unicode string is exactly one character long:
>>> A = '\u0048'
>>> print(float(ord(A)))  # 72.0
If the argument consists of two or more characters, a TypeError is raised, but we already know how to avoid it: we use a list comprehension with map, this time passing float as the first argument:
>>> print(list(map(float, [ord(i) for i in B])))  # [72.0, 101.0, 108.0, 108.0, 111.0]
Or we can use a for loop. Each printed value is a float, since we explicitly convert to this type, and end=' ' keeps the output on one line:
>>> for i in B:
...     print(float(ord(i)), end=' ')
# 72.0 101.0 108.0 108.0 111.0
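The same pattern can be wrapped in a small function. This is a sketch under stated assumptions: the name to_float_code_points is hypothetical, and the function simply applies float() to each code point returned by ord().

```python
def to_float_code_points(text):
    """Return the Unicode code points of text as a list of float.

    Hypothetical helper name, used here only for illustration.
    """
    return [float(ord(ch)) for ch in text]


B = '\u0048\u0065\u006C\u006C\u006F'  # 'Hello'

floats = to_float_code_points(B)
print(floats)  # [72.0, 101.0, 108.0, 108.0, 111.0]

# Every element is a float
all_float = all(isinstance(x, float) for x in floats)
print(all_float)  # True

# The result is the code point, not the digit a character may display
digit_value = float(ord('7'))
print(digit_value)  # 55.0
```

Note that this approach converts code points, so the character '7' yields 55.0 rather than 7.0.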