The Pandas DataFrame is a data structure that organizes data into a two-dimensional format. If you are familiar with Excel or Databases, the setup is similar. Each DataFrame contains a schema that defines a Column (Field) Name and a Data Type.
Below is the Database Schema for our Hockey Teams example.
This article delves into each method for DataFrame Conversions.
Preparation
Before any data manipulation can occur, a new library will require installation.
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
Create a DataFrame
For this article, we have three Hockey Teams. Each Team lists its Wins, Losses, and Ties for the Season.
teams = {'Team-A': [20, 2, 8], 'Team-B': [18, 6, 6], 'Team-C': [14, 3, 13]}
df = pd.DataFrame(teams)
print(df)
- Line [1] creates a dictionary of lists and saves them to teams.
- Line [2] creates a DataFrame from teams and saves it to df.
- Line [3] outputs the DataFrame to the terminal.
Output
 | Team-A | Team-B | Team-C |
0 | 20 | 18 | 14 |
1 | 2 | 6 | 3 |
2 | 8 | 6 | 13 |
💡 Note: Copy this DataFrame to the top of each script, directly below the import pandas statement.
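The rows of the DataFrame above are numbered 0 to 2, although they stand for Wins, Losses, and Ties. As a small sketch, the optional index argument of pd.DataFrame() can label the rows. The label names used here are an assumption taken from the description of the data, not part of the original example.

```python
import pandas as pd

teams = {'Team-A': [20, 2, 8], 'Team-B': [18, 6, 6], 'Team-C': [14, 3, 13]}

# Assumed row labels: the article lists Wins, Losses, and Ties in that order.
labeled = pd.DataFrame(teams, index=['Wins', 'Losses', 'Ties'])

print(labeled)

# Look up a single value by row label and column name.
print(labeled.loc['Wins', 'Team-A'])
```

The remaining examples in this article keep the default numeric index, so this step is optional.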
DataFrame astype()
The astype() method offers the ability to modify column Data Types. This change can be applied to all columns or as many or as few as needed.
The syntax for this method is as follows:
DataFrame.astype(dtype, copy=True, errors='raise')
Parameter | Description |
---|---|
dtype | The Data Type to be applied. |
copy | If True , a copy of the DataFrame (including changes) is created. True by default. |
errors | If set to raise , an exception is raised when a conversion fails. If set to ignore , exceptions are suppressed and the original object is returned. raise by default. |
The current Data Types of the teams DataFrame are as follows:
Team-A | int64 |
Team-B | int64 |
Team-C | int64 |
Using the same DataFrame as above, this code changes the Data Types.
df = pd.DataFrame(teams)
df = df.astype({'Team-A': 'float64', 'Team-B': 'int32', 'Team-C': 'string'}, errors='raise')
print(df.dtypes)
- Line [1] uses the DataFrame created earlier.
- Line [2] converts each column to a different Data Type based on the code.
- Line [3] outputs the Data Types to the terminal.
Output
Team-A | float64 |
Team-B | int32 |
Team-C | string |
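The dtype parameter accepts either a single Data Type or a dictionary of column names. The sketch below, using the same teams data, shows both forms. The variable names all_float and one_column are illustrative only.

```python
import pandas as pd

teams = {'Team-A': [20, 2, 8], 'Team-B': [18, 6, 6], 'Team-C': [14, 3, 13]}
df = pd.DataFrame(teams)

# A single dtype is applied to every column.
all_float = df.astype('float64')

# A dictionary converts only the listed columns; the rest keep their dtype.
one_column = df.astype({'Team-A': 'float64'})

print(all_float.dtypes)
print(one_column.dtypes)
```

In both cases astype() returns a new DataFrame, so df itself keeps its original int64 columns unless the result is assigned back to it.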
DataFrame Convert Data Types
The convert_dtypes() method converts the Data Types and returns these changes in a new DataFrame. In this new DataFrame, each column changes to the best possible Data Type that supports missing values (pd.NA), based on the data.
The syntax for this method is as follows:
DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)
Parameter | Description |
---|---|
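infer_objects | Determines if object dtypes (Data Types) should be converted to the best possible types. True by default. |
convert_string | Determines if object dtypes (Data Types) should be converted to StringDtype() . True by default. |
convert_integer | Determines if, where possible, columns should be converted to integer extension types. True by default. |
convert_boolean | Determines if object dtypes should be converted to BooleanDtype() . True by default. |
convert_floating | Determines if, where possible, columns should be converted to floating extension types. True by default. |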
When the following code runs, the int64 columns are converted to Int64 (with a capital I), the pandas nullable integer Data Type. The values do not change, but the columns can now hold missing values (pd.NA).
df = pd.DataFrame(teams)
df = df.convert_dtypes()
print(df.dtypes)
- Line [1] uses the DataFrame created earlier.
- Line [2] converts the Data Types to the best possible Data Types.
- Line [3] outputs the converted Data Types to the terminal.
Output
Team-A | Int64 |
Team-B | Int64 |
Team-C | Int64 |
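The effect of convert_dtypes() is easier to see when a column contains a missing value. The sketch below uses hypothetical data (one missing Team-A result, which is not part of the original example) to show the column moving from float64 to the nullable Int64 type.

```python
import pandas as pd

# Hypothetical data: the second Team-A value is missing.
scores = pd.DataFrame({'Team-A': [20, None, 8]})

# The missing value forces the column to float64 when the DataFrame is built.
print(scores.dtypes)

# convert_dtypes() switches to the nullable integer type and keeps the gap.
converted = scores.convert_dtypes()
print(converted.dtypes)
print(converted)
```

The missing entry is displayed as <NA> after the conversion, and the remaining values are shown as whole numbers again.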
DataFrame Infer Objects
The infer_objects() method attempts to determine a better Data Type for columns stored as object, based on the data at hand. Columns that already have a specific Data Type are left unchanged.
For this example, the original DataFrame is modified as follows:
teams = {'Team-A': [20.0, 2, 8], 'Team-B': [18, 6.2, 6], 'Team-C': [14, 3, 13]}
df = pd.DataFrame(teams)
df = df.iloc[1:]
print(df.infer_objects().dtypes)
- Line [1] creates an updated dictionary of lists and saves it to teams.
- Line [2] creates a DataFrame and saves it to df.
- Line [3] uses iloc to drop the first row, keeping the rows from index 1 onward.
- Line [4] infers the Data Types and outputs them to the terminal.
💡 Note: The first and second columns contain floating-point numbers, and the third column contains only integers. None of these columns is stored as object, so infer_objects() leaves the Data Types as they are.
Output
Team-A | float64 |
Team-B | float64 |
Team-C | int64 |
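infer_objects() only has a visible effect on columns stored as object. The sketch below uses hypothetical data (a text placeholder in the first row, not part of the original example) to show a case where the inferred Data Type differs from the stored one.

```python
import pandas as pd

# Hypothetical data: a text placeholder forces the column to the object dtype.
mixed = pd.DataFrame({'Team-A': ['n/a', 2, 8]})
print(mixed.dtypes)

# Dropping the first row removes the text, but the dtype is still object.
trimmed = mixed.iloc[1:]
print(trimmed.dtypes)

# infer_objects() notices that only integers remain.
inferred = trimmed.infer_objects()
print(inferred.dtypes)
```

This is the situation in which slicing with iloc before calling infer_objects() makes a difference: the slice removes the value that kept the column generic.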
Change Data Type - Alternative
Let’s say we decided to change all the Data Types to float64. One way to accomplish this is to convert every value in the dictionary to a float before the DataFrame is created, as the following code does.
teams = {'Team-A': [20.0, 2, 8], 'Team-B': [18, 6.2, 6], 'Team-C': [14, 3, 13]}
teams = {k: [float(i) for i in v] for k, v in teams.items()}
print(teams)
Output
{'Team-A': [20.0, 2.0, 8.0], 'Team-B': [18.0, 6.2, 6.0], 'Team-C': [14.0, 3.0, 13.0]}
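As a further option, the dtype argument of pd.DataFrame() can request float64 at the moment the DataFrame is built. The sketch below compares that with the dictionary comprehension; the variable names converted, from_dict, and from_dtype are illustrative only.

```python
import pandas as pd

teams = {'Team-A': [20.0, 2, 8], 'Team-B': [18, 6.2, 6], 'Team-C': [14, 3, 13]}

# Approach from the article: convert every value with float() first.
converted = {k: [float(i) for i in v] for k, v in teams.items()}
from_dict = pd.DataFrame(converted)

# Alternative: let the constructor cast every column to float64.
from_dtype = pd.DataFrame(teams, dtype='float64')

print(from_dtype.dtypes)

# Both routes should produce the same DataFrame.
print(from_dict.equals(from_dtype))
```

The comprehension changes the dictionary itself, which is useful if the plain Python data is needed elsewhere. The dtype argument leaves the dictionary untouched.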
DataFrame copy()
The copy() method makes a copy of a DataFrame.
The syntax for this method is as follows:
DataFrame.copy(deep=True)
Parameter | Description |
---|---|
deep=True | When a copy of a DataFrame is created using deep=True (a deep copy, the default), this copy contains its own set of data and indices. Any modifications to the new DataFrame do not affect the original DataFrame. |
deep=False | When a copy of a DataFrame is created using deep=False (a shallow copy), this copy contains a reference to the original DataFrame data and indices. Changes to the data of one DataFrame can then show up in the other, depending on the pandas version and whether Copy-on-Write is enabled. |
teams = {'Team-A': [20.0, 2, 8], 'Team-B': [18, 6.2, 6], 'Team-C': [14, 3, 13]}
df = pd.DataFrame(teams)
deep_copy = df.copy(deep=True)
deep_copy['Team-A'] = [4, 5, 6]
print(deep_copy)
print(df)
- Line [1] assigns a dictionary of lists to teams.
- Line [2] creates a DataFrame from teams and assigns it to df.
- Line [3] makes a deep copy of the DataFrame and assigns it to deep_copy.
- Line [4] makes a change to the deep_copy variable.
- Line [5] outputs this change to the terminal.
- Line [6] outputs the original DataFrame to the terminal, which is unchanged.
Output
DEEP COPY | Team-A | Team-B | Team-C |
0 | 4 | 18.0 | 14 |
1 | 5 | 6.2 | 3 |
2 | 6 | 6.0 | 13 |
ORIGINAL | Team-A | Team-B | Team-C |
0 | 20.0 | 18.0 | 14 |
1 | 2.0 | 6.2 | 3 |
2 | 8.0 | 6.0 | 13 |
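One way to see the difference between the two kinds of copy is to check whether they share the underlying NumPy buffer with the original. The sketch below is an illustration only: it assumes NumPy is available (it is installed together with pandas), and the result of the shallow-copy check may depend on the pandas version.

```python
import numpy as np
import pandas as pd

teams = {'Team-A': [20, 2, 8], 'Team-B': [18, 6, 6], 'Team-C': [14, 3, 13]}
df = pd.DataFrame(teams)

deep = df.copy(deep=True)      # owns its data
shallow = df.copy(deep=False)  # references the original data

# Compare the memory behind one column of each copy with the original.
original_values = df['Team-A'].to_numpy()
deep_shares = np.shares_memory(original_values, deep['Team-A'].to_numpy())
shallow_shares = np.shares_memory(original_values, shallow['Team-A'].to_numpy())
print(deep_shares)
print(shallow_shares)

# Changing the deep copy leaves the original untouched.
deep.loc[0, 'Team-A'] = 99
print(df.loc[0, 'Team-A'])
```

Because the deep copy has its own buffer, the final line still prints the original value of 20.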
DataFrame Bool
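The df.bool() method returns the value of a Series/DataFrame that contains exactly one element (value). This element/value must be a boolean, either True or False. If this is not the case, for example if there is more than one element or the element is the integer 0 or 1, a ValueError occurs.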
The syntax for this method is as follows:
DataFrame.bool()
Here’s the code example:
print(pd.Series([True]).bool())
print(pd.DataFrame({'col': [False]}).bool())
Output
True
False
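The bool() method is marked as deprecated in pandas 2.1, so it may be unavailable in newer releases. The sketch below shows one possible replacement, assuming the same single-element inputs: Series.item() for a Series and positional selection with iloc for a DataFrame. The variable names are illustrative only.

```python
import pandas as pd

series_flag = pd.Series([True])
frame_flag = pd.DataFrame({'col': [False]})

# Series.item() returns the single element and raises ValueError
# if the Series does not hold exactly one value.
series_value = bool(series_flag.item())
print(series_value)

# For a one-cell DataFrame, select the cell by position instead.
frame_value = bool(frame_flag.iloc[0, 0])
print(frame_value)
```

Unlike bool(), this approach does not check that the element is a boolean, so wrapping the result in Python's bool() converts whatever value is found.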