Pandas DataFrame Conversion

The Pandas DataFrame is a data structure that organizes data into a two-dimensional format of rows and columns. If you are familiar with Excel or Databases, the setup is similar. Each DataFrame has a schema that defines a Name and a Data Type for every Column (Field).

Below is the Database Schema for our Hockey Teams example.

This article delves into each method for DataFrame Conversions.

Preparation

Before any data manipulation can occur, one library needs to be installed.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal and execute the code below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to Install Pandas on PyCharm?

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Create a DataFrame

For this article, we have three Hockey Teams. Each Team lists its Wins, Losses, and Ties for the Season.

teams = {'Team-A':   [20, 2,  8], 
         'Team-B':   [18, 6,  6],
         'Team-C':   [14, 3,  13]}

df = pd.DataFrame(teams)
print(df)

Line [1] creates a dictionary of lists and saves it to teams.
Line [2] creates a DataFrame from teams and saves it to df.
Line [3] outputs the DataFrame to the terminal.

Output

	Team-A	Team-B	Team-C
0	20	18	14
1	2	6	3
2	8	6	13

💡 Note: Copy this DataFrame to the top of each script, directly below the import pandas statement.
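The rows of this DataFrame stand for Wins, Losses, and Ties, but the default index only shows 0, 1, and 2. As a minimal sketch, the rows can be given labels when the DataFrame is created. The index labels and the variable names labeled and team_a_wins are assumptions made for illustration; the rest of the article keeps the default index.

```python
import pandas as pd

teams = {'Team-A': [20, 2, 8],
         'Team-B': [18, 6, 6],
         'Team-C': [14, 3, 13]}

# Illustrative row labels; the article's DataFrame uses the default 0, 1, 2 index.
labeled = pd.DataFrame(teams, index=['Wins', 'Losses', 'Ties'])
print(labeled)

# Look up a single value by row label and column name.
team_a_wins = labeled.loc['Wins', 'Team-A']
print(team_a_wins)
```

With labels in place, loc can address a value by its meaning rather than by its position.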

DataFrame astype()

The astype() method offers the ability to modify column Data Types. The change can be applied to every column at once or to selected columns only.

The syntax for this method is as follows:

DataFrame.astype(dtype, copy=True, errors='raise')

Parameter	Description
`dtype`	The Data Type to apply: either a single type for every column or a dictionary that maps column names to types.
`copy`	If `True`, a copy of the DataFrame (including changes) is returned. `True` by default.
`errors`	If set to `raise`, an exception is raised when a conversion fails. If set to `ignore`, the exception is suppressed and the original object is returned. `raise` by default.

The current Data Types of the teams DataFrame are as follows:

Team-A	int64
Team-B	int64
Team-C	int64

Using the same DataFrame as above, this code changes the Data Types.

df = pd.DataFrame(teams)
df = df.astype({'Team-A': 'float64', 'Team-B': 'int32', 'Team-C': 'string'},  errors='raise') 
print(df.dtypes)

Line [1] uses the DataFrame created earlier.
Line [2] converts each column to a different Data Type based on the code.
Line [3] outputs the Data Types to the terminal.

Output

Team-A	float64
Team-B	int32
Team-C	string
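The sketch below illustrates two points from the parameter table: a dictionary passed as dtype can name only some of the columns, and the default errors='raise' surfaces a failed conversion as an exception. The Team-D column and the variable names one_col, bad, and conversion_failed are hypothetical and exist only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Team-A': [20, 2, 8],
                   'Team-B': [18, 6, 6],
                   'Team-C': [14, 3, 13]})

# Convert a single column; the other columns keep their Data Types.
one_col = df.astype({'Team-B': 'float64'})
print(one_col.dtypes)

# Hypothetical column with a value that cannot be parsed as an integer.
bad = pd.DataFrame({'Team-D': ['12', 'n/a', '7']})
try:
    bad.astype('int64')  # errors='raise' is the default
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print(f'Conversion failed: {exc}')
```

Catching the exception makes it possible to clean the offending values before retrying the conversion.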

DataFrame Convert Data Types

The convert_dtypes() method converts the Data Types and returns these changes in a new DataFrame. In this new DataFrame, each column changes to the best possible Data Type that supports missing values (pd.NA), based on the data.

The syntax for this method is as follows:

DataFrame.convert_dtypes(infer_objects=True, convert_string=True, 
                         convert_integer=True, convert_boolean=True, 
                         convert_floating=True)

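Parameter	Description
`infer_objects`	Determines if object `dtypes` (Data Types) should be converted to the best possible types. `True` by default.
`convert_string`	Determines if object `dtypes` (Data Types) should be converted to `StringDtype()`. `True` by default.
`convert_integer`	Determines if columns should be converted to integer extension types (such as `Int64`) where possible. `True` by default.
`convert_boolean`	Determines if object `dtypes` should be converted to `BooleanDtype()`. `True` by default.
`convert_floating`	Determines if columns should be converted to floating extension types (such as `Float64`) where possible. `True` by default.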

When the following code runs, each column changes from int64 to Int64 (note the capital I). Int64 is the nullable integer Data Type of Pandas, which can also hold the missing-value marker pd.NA.

df = pd.DataFrame(teams)
df = df.convert_dtypes()
print(df.dtypes)

Line [1] uses the DataFrame created earlier.
Line [2] converts the Data Types to the best possible Data Types.
Line [3] outputs the converted Data Types to the terminal.

Output

Team-A	Int64
Team-B	Int64
Team-C	Int64
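The effect of convert_dtypes() is easier to see when the data contains missing values. The following is a minimal sketch using made-up data; the column names goals and captain and the variable names raw and converted are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with a missing value in each column.
raw = pd.DataFrame({
    'goals': [3, 1, None],            # stored as float64 because of the missing value
    'captain': [True, False, None],   # stored as object
})
print(raw.dtypes)

# Each column moves to a nullable extension type that supports pd.NA.
converted = raw.convert_dtypes()
print(converted.dtypes)
print(converted)
```

After the conversion, the goals column holds whole numbers again and the missing entries appear as <NA>.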

DataFrame Infer Objects

The infer_objects() method attempts to infer a better Data Type for columns of type object. Columns that already have a specific Data Type are left unchanged.

For this example, the original DataFrame is modified as follows:

teams = {'Team-A':    [20.0, 2,  8], 
         'Team-B':   [18, 6.2,  6],
         'Team-C':   [14, 3,  13]}

df = pd.DataFrame(teams)
df = df.iloc[1:]
print(df.infer_objects().dtypes)

Line [1] creates an updated dictionary of lists and saves it to teams.
Line [2] creates a DataFrame from teams and saves it to df.
Line [3] uses iloc to drop the first row and keep the remaining rows.
Line [4] calls infer_objects() and outputs the resulting Data Types to the terminal.

💡 Note: The first and second columns contain floating-point numbers, and the third column contains integers. Since infer_objects() only converts columns of type object, the Data Types of these numeric columns stay as they are.

Output

Team-A	float64
Team-B	float64
Team-C	int64
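To see infer_objects() actually change a Data Type, the column has to start out as object. The sketch below assumes a hypothetical 'n/a' placeholder in the first row, which forces the column to object; the variable names mixed, trimmed, and inferred are illustrative.

```python
import pandas as pd

# Mixing a string with numbers forces the column to the generic object Data Type.
mixed = pd.DataFrame({'Team-A': ['n/a', 2, 8]})
print(mixed.dtypes)

# Dropping the first row leaves only integers, but the Data Type is still object.
trimmed = mixed.iloc[1:]
print(trimmed.dtypes)

# infer_objects() inspects the remaining values and picks a better Data Type.
inferred = trimmed.infer_objects()
print(inferred.dtypes)
```

This is the situation where slicing with iloc and then calling infer_objects() pays off.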

Change Data Type – Alternative

Let’s say we decided to change all the values to floating-point numbers before creating the DataFrame. An easy way to accomplish this is with a dictionary comprehension that converts every list element to a float.

teams = {'Team-A':   [20.0, 2,  8], 
         'Team-B':   [18, 6.2,  6],
         'Team-C':   [14, 3,  13]}

teams = {k:[float(i) for i in v] for k, v in teams.items()}
print(teams)

Output

{'Team-A': [20.0, 2.0, 8.0], 
 'Team-B': [18.0, 6.2, 6.0], 
 'Team-C': [14.0, 3.0, 13.0]}

In case you had some troubles understanding this code snippet, feel free to check out our full guide on dictionary comprehension:

Dictionary Comprehension Tutorial
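If the DataFrame already exists, a similar result can be reached without touching the dictionary. The following sketch passes a single Data Type to astype(), which applies it to every column; the variable name df_float is an assumption for illustration.

```python
import pandas as pd

teams = {'Team-A': [20.0, 2, 8],
         'Team-B': [18, 6.2, 6],
         'Team-C': [14, 3, 13]}

df = pd.DataFrame(teams)

# A single Data Type is applied to all columns at once.
df_float = df.astype('float64')
print(df_float.dtypes)
print(df_float)
```

The dictionary comprehension changes the source data, while astype() changes only the DataFrame built from it.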

DataFrame copy()

The copy() method makes a copy of a DataFrame.

The syntax for this method is as follows:

DataFrame.copy(deep=True)

Parameter	Description
`deep=True`	When a copy of a DataFrame is created using `deep=True` (a deep copy, the default), this copy contains its own set of data and indices. Any modifications to the new DataFrame do not affect the original DataFrame.
`deep=False`	When a copy of a DataFrame is created using `deep=False` (a shallow copy), this copy contains a reference to the original DataFrame data and indices. Without Copy-on-Write, modifications to the shared data show up in both DataFrames; with Copy-on-Write enabled, the two no longer affect each other.

teams = {'Team-A':   [20.0, 2,  8], 
         'Team-B':   [18, 6.2,  6],
         'Team-C':   [14, 3,  13]}

df = pd.DataFrame(teams)
deep_copy = df.copy(deep=True)
deep_copy['Team-A'] = [4, 5, 6]
print(deep_copy)
print(df)

Line [1] assigns a dictionary of lists to teams.
Line [2] creates a DataFrame from teams and assigns it to df.
Line [3] makes a deep copy of the DataFrame and assigns it to deep_copy.
Line [4] replaces the Team-A column of deep_copy.
Line [5] outputs the modified copy to the terminal.
Line [6] outputs the original DataFrame, which is unchanged, to the terminal.

Output

DEEP COPY	Team-A	Team-B	Team-C
0	4	18.0	14
1	5	6.2	3
2	6	6.0	13

ORIGINAL	Team-A	Team-B	Team-C
0	20.0	18.0	14
1	2.0	6.2	3
2	8.0	6.0	13
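One way to observe the difference between the two kinds of copy is to check whether the underlying NumPy buffers overlap. The sketch below uses np.shares_memory for that check; the variable names are illustrative, and the result for the shallow copy may depend on the Pandas version and its Copy-on-Write settings, so no output is claimed for it here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Team-A': [20.0, 2, 8],
                   'Team-C': [14, 3, 13]})

deep = df.copy(deep=True)
shallow = df.copy(deep=False)

# Compare the buffers behind one column of the original and each copy.
deep_shares = np.shares_memory(df['Team-C'].to_numpy(), deep['Team-C'].to_numpy())
shallow_shares = np.shares_memory(df['Team-C'].to_numpy(), shallow['Team-C'].to_numpy())
print('deep copy shares memory:', deep_shares)
print('shallow copy shares memory:', shallow_shares)

# Writing to the deep copy leaves the original value in place.
deep.loc[0, 'Team-C'] = 99
print(df.loc[0, 'Team-C'])
```

A deep copy costs extra memory but is the safer choice when the copy will be modified.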

DataFrame Bool

The bool() method returns the boolean value of a Series/DataFrame that contains exactly one element (value). This element/value must be True or False. If this is not the case (including the integer values 0 and 1), a ValueError occurs.

The syntax for this method is as follows:

DataFrame.bool()

Here’s the code example:

print(pd.Series([True]).bool())
print(pd.DataFrame({'col': [False]}).bool())

Output

True
False
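Note that bool() was deprecated in Pandas 2.1, so the example above may emit a warning or fail on newer releases. A possible replacement, sketched below, reads the single element directly with item() for a Series and iloc for a DataFrame. The variable names are assumptions, and unlike bool() this sketch does not verify that the element is a boolean.

```python
import pandas as pd

flag_series = pd.Series([True])
flag_frame = pd.DataFrame({'col': [False]})

# item() returns the only element of a one-element Series.
series_value = bool(flag_series.item())

# iloc[0, 0] reads the only cell of a one-row, one-column DataFrame.
frame_value = bool(flag_frame.iloc[0, 0])

print(series_value)
print(frame_value)
```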