This article focuses on the serialization and conversion methods of a Python DataFrame:
to_clipboard()
,to_html()
,to_sql()
,to_csv()
, andto_excel()
Let’s get started!
Preparation
Before any data manipulation can occur, two (2) new libraries will require installation.
- The Pandas library enables access to/from a DataFrame.
- The Openpyxl library enables conversion to/from Excel.
To install these libraries, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install openpyxl
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required libraries.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd import openpyxl
DataFrame.to_clipboard()
The to_clipboard
method copies an object to the operating system’s clipboard. The output can be pasted (WIndows: CTRL+V
) to other applications.
The syntax for this method is as follows:
DataFrame.to_clipboard(excel=True, sep=None, **kwargs)
Parameter | Description |
---|---|
excel | If this parameter is True , the output is saved in a CSV format for pasting to Excel. |
sep | This is the field separator between the fields. The default value is a comma. |
**kwargs | These parameters will pass to a DataFrame’s to_csv() method. |
This example copies the inventory for Rivers Clothing to the system clipboard. A Notepad application is opened and the contents of the clipboard contents pasted.
df_inv = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [20, 10, 20], 'Pants': [61, 33, 67], 'Sweats': [88, 38, 13]}) df_inv.to_clipboard(sep=',', index=False)
- Line [1] creates a DataFrame from a dictionary of lists. The output saves to the
df_inv
. - Line [2] does the following:
- copies the contents to the clipboard
- separates the fields with the comma (
,
) character - omits the leading comma for each row (
index=False
)
To see this code in action, perform the following steps:
- Navigate to and open an instance of Notepad (or another text editor).
- Press
CTRL+V
(Windows) to paste the contents of the system clipboard to the application.
Output
DataFrame.to_html()
The to_html()
method converts a valid DataFrame object to an HTML Table format.
DataFrame.to_html(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)
Parameter | Description |
---|---|
buf | This parameter is the buffer to write to. If empty, a string returns. |
columns | The column subset to convert. If empty, all columns will convert. |
col_space | The minimum width (CSS length) of each column. |
header | If True , the column heads will display. |
index | If True , the row labels will display. |
na_rep | The string depiction of any NaN values. |
formatters | The formatter is the option to apply to each column (name/position). |
float_format | The formatter option to apply to float values. |
sparsify | For MultiIndex DataFrames. Set to False for a hierarchical index. |
index_names | If True , the index names display. |
justify | A string value that depicts the justification type. Options are: left, right, center, justify, justify-all, start, end, inherit, match-parent, initial and, unset. |
max_rows | The maximum number of rows to display. |
max_cols | The maximum number of columns to display |
show_dimensions | Display the dimensions of the DataFrame (rows/columns). |
decimal | This parameter is the character for a comma (, ) in Europe. |
bold_rows | If True , the row labels will be bold. |
classes | Classes (formatting) to apply to the HTML table. |
escape | Convert the characters: <>& to HTML-safe characters. |
notebook | This parameter determines if the output is for an ipython notebook. |
border | If True , a border surrounds the cells of each column/row in the HTML table. |
table_id | If set, a CSS id tag (# ) is included in the opening <table> tag. |
render_links | If True , this parameter converts URLs strings to links. |
encoding | This parameter sets the encoding for the HTML. |
This example reads in the countries.csv file and converts it to an HTML table. To follow along, click here to save this CSV file and move it to the current working directory.
π‘ Note: To run this code, copy the code to the clipboard. Click here to navigate to a Jupyter testing environment. If this code is run in a standard IDE, the formatting will not display.
from IPython.display import HTML df = pd.DataFrame({'Tops': [36, 23, 19], 'Tanks': [20, 10, 20], 'Pants': [61, 33, 67], 'Sweats': [88, 38, 13]}) HTML(df.to_html(classes='table table-hover'))
- Line [1] imports the appropriate library to render the HTML.
- Line [2] creates a DataFrame from a dictionary of lists. The output saves to
df
. - Line [3] exports as HTML with formatting.
Output
DataFrame.to_sql()
The to_sql()
method writes rows (records) from a DataFrame to a SQL database.
The syntax for this method is as follows.
DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Parameter | Description |
---|---|
name | This parameter is the name of the SQL table. |
con | This parameter is the database connections details. |
schema | Specify the database schema. |
if_exists | This parameter determines what to do if the specified table exists. The options are: fail, replace, or append. |
index | This parameter signifies the column to be set as the index. |
index_label | This parameter is the column name of the index. |
chunksize | Specify the number of rows to batch add in each round. |
dtype | Specify the data type for each column. Dictionary or scalar. |
method | This parameter controls the insertion clause. Options are: – None : use the SQL INSERT statement.– Multi : pass multiple values to a single INSERT statement– Callable: requires a sample implementation statement |
This example reads in the countries.csv
file. These records are then inserted into a Database using sqlite3. To follow along, click here to save this CSV file and move it to the current working directory.
conn = sqlite3.connect('countries_database') c = conn.cursor() c.execute('CREATE TABLE IF NOT EXISTS countries(Country text, Capital text, Population integer, Area integer)') conn.commit() df = pd.read_csv('countries.csv') df.to_sql('countries', conn, if_exists='replace', index=False) c.execute('SELECT * FROM countries') for row in c.fetchall(): print (row)
- Line [1] creates a connection to the countries_database.
- Line [2] executes the code on Line [1] and creates the Database.
- Line [3] contains SQL code to create a database table containing the specified fields.
- Line [4] executes the code on Line [3] and creates the table.
- Line [5] reads in the
countries.csv
file to the DataFramedf
. - Line [6] converts the DataFrame to SQL and inserts all records into the Database.
- Line [7] executes the
SQL SELECT
command to retrieve all records from the Database. - Line [8-9] instantiates a for loop and displays one (1) record/line from the Database.
Output
(‘Germany’, ‘Berlin’, 83783942, 357021) |
(‘France’, ‘Paris’, 67081000, 551695) |
(‘Spain’, ‘Madrid’, 47431256, 498511) |
(‘Italy’, ‘Rome’, 60317116, 301338) |
(‘Poland’, ‘Warsaw’, 38383000, 312685) |
(‘Russia’, ‘Moscow’, 146748590, 17098246) |
(‘USA’, ‘Washington’, 328239523, 9833520) |
(‘China’, ‘Beijing’, 1400050000, 9596961) |
(‘India’, ‘Dheli’, 1352642280, 3287263) |
π‘ Note: For additional details on SQL commands, click here.
DataFrame.to_csv()
The to_csv()
method converts an object to a comma-separated values (CSV) file.
The syntax for this method is as follows:
DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)
Parameter | Description |
---|---|
path_or_buf | This parameter is the file path to write the CSV to. |
sep | This parameter is the field separator. The default is a comma (, ). |
na_rep | What string to replace any missing data. |
float_format | This is the format for floating numbers. |
columns | This parameter contains the field(s) to write to the CSV file. |
header | If True , the column names are written. |
index | If True , the index names are written. |
index_label | This parameter is the column name for the index label(s). |
mode | This specified the file mode. By default, the mode is w , |
encoding | This is a string representing the encoding for the CSV file. |
compression | If the infer option is stipulated, the available compressions are: β.gzβ, β.bz2β, β.zipβ, β.xzβ, or β.zstβ extensions. |
quoting | If a float_format is applied, float values convert to strings |
quotechar | This is the character used for quote fields. |
line_terminator | The newline character to use in the CSV file. Example: β\nβ for Linux, β\r\nβ for Windows. |
chunksize | This parameter denotes the number of rows to write at a time. |
date_format | This is the format for any dates |
doublequote | This parameter is the quoting inside a field. |
escapechar | This is the character used to escape the sep and quotechar parameters. |
decimal | This is the character used as a decimal separator. |
errors | This parameter determines how encoding and decoding errors are handled. Click here for details. |
storage_options | This parameter contains extra options (dictionary format), such as host, port, username, etc. |
This example saves a DataFrame of five (5) host city details for the Summer and Winter Olympic Games to a CSV file.
df = pd.DataFrame(({2010: ['Vancouver', 'Canada', 'North America'], 2012: ['London', 'United Kingdon', 'Europe'], 2014: ['Sochi', 'Russia', 'Europe',], 2016: ['Rio de Janeiro', 'Brazil', 'South America'], 2018: ['Pyeongchang', 'South Korea', 'Asia']})) df.to_csv('games.csv', index=False)
- Line [1] creates a DataFrame from a Dictionary of Lists. The output saves to
df
. - Line [2] creates a CSV file and saves it to the current working directory.
Output
π‘ Note: Saving to a CSV file removes all formatting.
DataFrame.to_excel()
The to_excel()
method converts an object to an Excel file (XLSX).
The syntax for this method is as follows:
DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options=None)
Parameter | Description |
---|---|
excel_writer | This parameter is the file path or the Excel writer. |
sheet_name | The name of the Excel sheet name containing the data. |
na_rep | If missing data is encountered, what should this be replaced with. |
float_format | This is the format for floating numbers. |
columns | This parameter contains the field(s) to write to the XLSX file. |
header | If True , the column names are written. |
index | If True , the index names are written. |
index_label | This parameter is the column name for the index label(s). |
startrow | This parameter references the upper-left cell row to place the data. |
startcol | This parameter references the upper-left cell column to place the data. |
engine | This parameter is the engine to use, such as openpyxl or xlsxwriter . |
merge_cells | This parameter writes MultiIndex rows as merged cells. |
encoding | This is a string representing the encoding for the XLSX file. |
inf_rep | This is the depiction of infinity. |
verbose | If True, this will display additional details in the log file. |
freeze_panes | This parameter depicts the bottom-most row and right-most column to freeze. |
storage_options | This parameter contains extra options (Dictionary format), such as host, port, username, etc. |
In this example, a DataFrame is created from a list of five (5) Classical Composers. This file converts and saves to an Excel (XLSX) file.
df = pd.DataFrame({'Mozart': [1756, 1791], 'Bach': [1685, 1750], 'Chopin': [1810, 1849], 'Haydn': [1732, 1809], 'Brahms': [1833, 1897]}, index=['Born', 'Passed']) df.to_excel('composers.xlsx', sheet_name='Classical')
- Line [1] creates a DataFrame from a Dictionary of Lists. This DataFrame saves to
df
. - Line [2] creates an XLSX file and saves it to the current working directory.
Output
π‘Β Note: Click here to view Finxter in-depth articles on Working with Excel.
Further Learning Resources
This is Part 21 of the DataFrame method series.