Problem Formulation and Solution Overview
A Series is a one-dimensional (1D) labeled array. Its elements can be of various data types, such as integers, strings, floats, or even arbitrary Python objects!
Before performing any calculations on a Pandas Series, it is good practice to validate its Data Types. Doing so helps prevent type errors in your code.
This article uses the fictitious finxters_sample.csv file, whose contents are shown below.
FID | Rank | Solved | Avg | Yearly | Taxes |
30022145 | Authority | 1915 | 89.08 | $143.76 | 7 |
30022192 | Beginner | 1001 | 76.15 | $143.76 | 8 |
30022331 | Basic Knowledge | 15 | 2.68 | $119.40 | 5 |
30022345 | Authority | 1110 | 10.46 | $131.76 | 3 |
30022157 | Authority | 1875 | 85.98 | $143.76 | 6 |
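To follow along without the file on disk, the rows above can be loaded from an in-memory string. This is only a sketch: it assumes the file is comma-separated with the header row shown, which the article does not confirm, and the CSV_TEXT name is made up for illustration.

```python
import io

import pandas as pd

# Rows copied from the table above. The comma-separated layout is an
# assumption about how finxters_sample.csv is stored.
CSV_TEXT = """FID,Rank,Solved,Avg,Yearly,Taxes
30022145,Authority,1915,89.08,$143.76,7
30022192,Beginner,1001,76.15,$143.76,8
30022331,Basic Knowledge,15,2.68,$119.40,5
30022345,Authority,1110,10.46,$131.76,3
30022157,Authority,1875,85.98,$143.76,6
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
users = pd.read_csv(io.StringIO(CSV_TEXT))
print(users.shape)
```

Swapping io.StringIO(CSV_TEXT) for the real file path should give the same DataFrame if the layout assumption holds.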
Preparation
import pandas as pd
After importing the Pandas library, it is referenced through the alias pd.
Method 1: Use Pandas dtypes
This method uses dtypes. This attribute returns a Series containing the Data Type of each column in a DataFrame.
users = pd.read_csv('finxters_sample.csv')
print(users.dtypes)
The code above reads in the finxters_sample.csv file and saves it to the DataFrame users.
Then, dtypes is appended to users and the result is output to the terminal.
This attribute reports the Data Type of each DataFrame column and returns a Series containing the same.
FID | int64 |
Rank | object |
Solved | int64 |
Avg | float64 |
Yearly | object |
Taxes | int64 |
dtype: object |
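A single column can be checked the same way before a calculation. The sketch below uses a small hand-built DataFrame (not the CSV file) and the is_numeric_dtype helper from pandas.api.types to flag which columns are safe for arithmetic. The numeric_flags name is illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# A hand-built subset of the sample data, used here instead of the CSV file.
users = pd.DataFrame({
    "Solved": [1915, 1001, 15],
    "Avg": [89.08, 76.15, 2.68],
    "Yearly": ["$143.76", "$143.76", "$119.40"],
})

# dtype (singular) reports the type of one column.
solved_dtype = users["Solved"].dtype
print(solved_dtype)

# Flag each column as numeric or not before doing any math on it.
numeric_flags = {col: is_numeric_dtype(users[col]) for col in users.columns}
print(numeric_flags)
```

Here Yearly is flagged as non-numeric because its values are strings with a leading dollar sign.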
What happens when we attempt to add two (2) different data types together (an object and an int64)?
Let’s add the Yearly value in row 0 ($143.76) to the Taxes value of 7 in the same row.
print(users['Yearly'][0] + users['Taxes'][0])
When this code runs, a TypeError similar to the one below occurs. A string (stored here in an object column) cannot be added to an Integer (int64).
File "C:\method-1.py", line 4, in <module> |
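The failure can be reproduced and handled without crashing the script. This sketch assumes a one-row stand-in for the users DataFrame and catches the TypeError. The exact error message depends on the installed NumPy and pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# One-row stand-in for the users DataFrame from the article.
users = pd.DataFrame({"Yearly": ["$143.76"], "Taxes": [7]})

try:
    # A string plus an int64: this is the operation that fails.
    total = users["Yearly"][0] + users["Taxes"][0]
except TypeError as err:
    total = None
    print(f"Could not add the values: {err}")
```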
This error occurs because the Yearly value of $143.76 is an object (string) and needs to be converted to a numeric Data Type before performing the calculation.
The code below removes the dollar sign ($) using slicing and converts this value to a float (143.76).
result = float(users['Yearly'][0][1:]) + users['Taxes'][0]
print(result)
Once converted, the addition operation is performed, saved to result, and output to the terminal.
150.76 |
💡Note: The users['Taxes'][0] value does not need to be converted to a float. This is because the int64 value is automatically promoted to a float64 when the addition is performed.
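The promotion described in the note can be observed by printing the type of each operand and of the result. This is a sketch on a one-row stand-in DataFrame. The variable names are illustrative.

```python
import pandas as pd

# One-row stand-in for the users DataFrame from the article.
users = pd.DataFrame({"Yearly": ["$143.76"], "Taxes": [7]})

yearly = float(users["Yearly"][0][1:])  # built-in float after slicing off '$'
taxes = users["Taxes"][0]               # NumPy integer taken from the column
result = yearly + taxes                 # the integer is promoted to a float

print(type(yearly).__name__, type(taxes).__name__, type(result).__name__)
print(result)
```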
Method 2: Use astype()
This method uses astype(). This method doesn’t determine the Data Type but can convert the current Data Type to a different one.
df = pd.read_csv('finxters_sample.csv', usecols=['Yearly'])
df['Yearly'] = df['Yearly'].str[1:].astype('float64')
print(df['Yearly'])
For this example, the Yearly column from finxters_sample.csv is read in and saved to a DataFrame df. Remember, this field is an object (a string in this case) as it contains a leading dollar sign ($).
Yearly | |
0 | $143.76 |
1 | $143.76 |
2 | $119.40 |
3 | $131.76 |
4 | $143.76 |
To convert this to a float64, the dollar sign ($) is stripped from each value in the Yearly column (str[1:]). Then astype() is called and passed one (1) argument ('float64'). The results are saved back to df['Yearly'].
As you can see, when the output is sent to the terminal, the Data Type for the above column is now float64.
Yearly | |
0 | 143.76 |
1 | 143.76 |
2 | 119.40 |
3 | 131.76 |
4 | 143.76 |
This can be confirmed by outputting the following to the terminal.
print(df['Yearly'].dtype)
float64 |
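Slicing with str[1:] assumes every value starts with exactly one dollar sign. Where that is not guaranteed, a more forgiving variant is to remove the symbol with str.replace() and convert with pd.to_numeric(). This swaps in a different technique from the one above. The sample values, including the malformed 'n/a' entry, are made up for illustration.

```python
import pandas as pd

# Hypothetical values: one lacks the dollar sign and one is not a number.
yearly = pd.Series(["$143.76", "$119.40", "131.76", "n/a"])

# Remove the literal '$' wherever it appears (regex=False treats it as text).
cleaned = yearly.str.replace("$", "", regex=False)

# errors='coerce' turns anything that cannot be parsed into NaN
# instead of raising an exception.
as_float = pd.to_numeric(cleaned, errors="coerce")

print(as_float)
print(as_float.dtype)
```

The trade-off is that bad values become NaN silently, so a follow-up check such as as_float.isna().sum() is worth adding.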
Method 3: Use apply()
On a Series, the apply() method calls a function on every value. On a DataFrame, it applies the function along an axis (default 0).
avg_list = pd.Series(['Avgs', 89.08, 76, 2.68, 10.46, 85.98]).apply(type)
print(avg_list)
For this example, a label and the Averages for five (5) Finxter users are saved as a Series.
The apply() method is called on this Series and passed one (1) argument, type. The result is saved to avg_list and output to the terminal.
This call (apply(type)) determines the Data Type of each element and returns a Series of type objects.
0 | <class 'str'> |
1 | <class 'float'> |
2 | <class 'int'> |
3 | <class 'float'> |
4 | <class 'float'> |
5 | <class 'float'> |
dtype: object |
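The difference between the Series dtype and the per-element types is worth seeing side by side: a Series holding mixed values reports a single object dtype, while apply(type) reveals what each element really is. This is a sketch on a shortened copy of the data. The avgs and element_types names are illustrative.

```python
import pandas as pd

# Shortened copy of the example data: a label followed by three numbers.
avgs = pd.Series(["Avgs", 89.08, 76, 2.68])

# The container reports one dtype for the whole Series...
print(avgs.dtype)

# ...while apply(type) inspects every element individually.
element_types = avgs.apply(type)
print(element_types.tolist())
```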
Method 4: Use apply() and unique()
This method uses apply() and unique() to retrieve the unique Data Types in the Series.
misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
print(misc_lst.apply(type).unique())
For this example, a Series containing mixed data is created and saved to misc_lst.
The apply() method is appended to misc_lst and passed one (1) argument, type. Next, unique() is appended to apply(type).
The output is an array of the unique Data Type objects.
[ <class 'str'> <class 'int'> <class 'float'>] |
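One practical use of this result is a quick test for mixed types: if more than one unique type comes back, the Series probably needs cleaning before any arithmetic. A minimal sketch, with the is_mixed name chosen for illustration:

```python
import numpy as np
import pandas as pd

misc_lst = pd.Series(["50", 2, 3, "57", 23.87])

# unique() returns a NumPy array of the distinct type objects,
# in order of first appearance.
unique_types = misc_lst.apply(type).unique()

# More than one distinct type means the Series holds mixed data.
is_mixed = len(unique_types) > 1

print(unique_types)
print(is_mixed)
```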
Method 5: Use apply() and value_counts()
This method uses apply() and value_counts() to determine the Data Type of each Series/Column element and total the number of times each occurs.
misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
result = misc_lst.apply(type).value_counts()
print(result)
For this example, a Series containing mixed data is created and saved to misc_lst.
The apply() method is appended to misc_lst and passed one (1) argument, type. Next, value_counts() is appended to apply(type). This determines how many times each Data Type occurs in the Series.
The output is the following Series, which displays each Data Type’s total count.
<class 'str'> | 2 |
<class 'int'> | 2 |
<class 'float'> | 1 |
dtype: int64 |
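The counts are indexed by type objects, which are awkward to look up directly because pandas treats a callable key such as str as a function to call. One workaround, sketched here, is to convert the result into a plain dictionary keyed by type name. The by_name variable is illustrative.

```python
import pandas as pd

misc_lst = pd.Series(["50", 2, 3, "57", 23.87])
counts = misc_lst.apply(type).value_counts()

# items() yields (type object, count) pairs; key the dictionary by the
# type's name so the counts are easy to look up later.
by_name = {t.__name__: int(n) for t, n in counts.items()}

print(by_name)
```

With this form, by_name['str'] gives the number of string entries without indexing the Series by a type object.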
Method 6: Use List Comprehension
This method uses List Comprehension to return a List of the Data Types found in a Series/Column.
misc_lst = pd.Series(['50', 2, 3, '57', 23.87])
result = [x for x in misc_lst.apply(type).unique()]
print(result)
For this example, a Series containing mixed data is created and saved to misc_lst.
Then, List Comprehension is used to traverse the unique Data Types returned by apply() and unique() and save them as a List of type objects.
The results are output to the terminal.
[ <class 'str'>, <class 'int'>, <class 'float'>] |
✨A Finxter Favorite!
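List Comprehension can also work on the Series elements directly, without apply(). The sketch below is a variation on the method above, not the article's own code: it collects the type name of every element and then the index labels of the entries that are strings. The type_names and string_positions names are illustrative.

```python
import pandas as pd

misc_lst = pd.Series(["50", 2, 3, "57", 23.87])

# Type name of every element, in order.
type_names = [type(x).__name__ for x in misc_lst]

# Index labels of the elements that are strings, i.e. the ones that
# would need converting before any arithmetic.
string_positions = [i for i, x in misc_lst.items() if isinstance(x, str)]

print(type_names)
print(string_positions)
```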
Summary
