Serialize DataFrame to Clipboard, HTML, SQL, CSV, Excel

This article focuses on the serialization and conversion methods of a Python DataFrame:

to_clipboard(),
to_html(),
to_sql(),
to_csv(), and
to_excel()

Let’s get started!

Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

The Pandas library enables access to/from a DataFrame.
The Openpyxl library enables conversion to/from Excel.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install openpyxl

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required libraries.

How to install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import openpyxl

DataFrame.to_clipboard()

The to_clipboard method copies an object to the operating system’s clipboard. The output can be pasted (WIndows: CTRL+V) to other applications.

The syntax for this method is as follows:

DataFrame.to_clipboard(excel=True, sep=None, **kwargs)

Parameter	Description
`excel`	If this parameter is `True`, the output is saved in a CSV format for pasting to Excel.
`sep`	This is the field separator between the fields. The default value is a comma.
`**kwargs`	These parameters will pass to a DataFrame’s `to_csv()` method.

This example copies the inventory for Rivers Clothing to the system clipboard. A Notepad application is opened and the contents of the clipboard contents pasted.

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [20, 10, 20],
                       'Pants':    [61, 33, 67],
                       'Sweats':   [88, 38, 13]})

df_inv.to_clipboard(sep=',', index=False)

Line [1] creates a DataFrame from a dictionary of lists. The output saves to the df_inv.
Line [2] does the following:
- copies the contents to the clipboard
- separates the fields with the comma (,) character
- omits the leading comma for each row (index=False)

To see this code in action, perform the following steps:

Navigate to and open an instance of Notepad (or another text editor).
Press CTRL+V (Windows) to paste the contents of the system clipboard to the application.

Output

DataFrame.to_html()

The to_html() method converts a valid DataFrame object to an HTML Table format.

DataFrame.to_html(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)

Parameter	Description
`buf`	This parameter is the buffer to write to. If empty, a string returns.
`columns`	The column subset to convert. If empty, all columns will convert.
`col_space`	The minimum width (CSS length) of each column.
`header`	If `True`, the column heads will display.
`index`	If `True`, the row labels will display.
`na_rep`	The string depiction of any `NaN` values.
`formatters`	The formatter is the option to apply to each column (name/position).
`float_format`	The formatter option to apply to float values.
`sparsify`	For `MultiIndex` DataFrames. Set to `False` for a hierarchical index.
`index_names`	If `True`, the index names display.
`justify`	A string value that depicts the justification type. Options are: left, right, center, justify, justify-all, start, end, inherit, match-parent, initial and, unset.
`max_rows`	The maximum number of rows to display.
`max_cols`	The maximum number of columns to display
`show_dimensions`	Display the dimensions of the DataFrame (rows/columns).
`decimal`	This parameter is the character for a comma (`,`) in Europe.
`bold_rows`	If `True`, the row labels will be bold.
`classes`	Classes (formatting) to apply to the HTML table.
`escape`	Convert the characters: `<>&` to HTML-safe characters.
`notebook`	This parameter determines if the output is for an ipython notebook.
`border`	If `True`, a border surrounds the cells of each column/row in the HTML table.
`table_id`	If set, a CSS id tag (`#`) is included in the opening `<table>` tag.
`render_links`	If `True`, this parameter converts URLs strings to links.
`encoding`	This parameter sets the encoding for the HTML.

This example reads in the countries.csv file and converts it to an HTML table. To follow along, click here to save this CSV file and move it to the current working directory.

💡 Note: To run this code, copy the code to the clipboard. Click here to navigate to a Jupyter testing environment. If this code is run in a standard IDE, the formatting will not display.

from IPython.display import HTML

df = pd.DataFrame({'Tops':     [36, 23, 19],
                   'Tanks':    [20, 10, 20],
                   'Pants':    [61, 33, 67],
                   'Sweats':   [88, 38, 13]})

HTML(df.to_html(classes='table table-hover'))

Line [1] imports the appropriate library to render the HTML.
Line [2] creates a DataFrame from a dictionary of lists. The output saves to df.
Line [3] exports as HTML with formatting.

Output

DataFrame.to_sql()

The to_sql() method writes rows (records) from a DataFrame to a SQL database.

The syntax for this method is as follows.

DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)

Parameter	Description
`name`	This parameter is the name of the SQL table.
`con`	This parameter is the database connections details.
`schema`	Specify the database schema.
`if_exists`	This parameter determines what to do if the specified table exists. The options are: fail, replace, or append.
`index`	This parameter signifies the column to be set as the index.
`index_label`	This parameter is the column name of the index.
`chunksize`	Specify the number of rows to batch add in each round.
`dtype`	Specify the data type for each column. Dictionary or scalar.
`method`	This parameter controls the insertion clause. Options are: – `None`: use the SQL INSERT statement. – `Multi`: pass multiple values to a single INSERT statement – Callable: requires a sample implementation statement

This example reads in the countries.csv file. These records are then inserted into a Database using sqlite3. To follow along, click here to save this CSV file and move it to the current working directory.

conn = sqlite3.connect('countries_database')
c = conn.cursor()

c.execute('CREATE TABLE IF NOT EXISTS countries(Country text, Capital text, Population integer, Area integer)')
conn.commit()

df = pd.read_csv('countries.csv')
df.to_sql('countries', conn, if_exists='replace', index=False)
 
c.execute('SELECT * FROM countries')

for row in c.fetchall():
    print (row)

Line [1] creates a connection to the countries_database.
Line [2] executes the code on Line [1] and creates the Database.
Line [3] contains SQL code to create a database table containing the specified fields.
Line [4] executes the code on Line [3] and creates the table.
Line [5] reads in the countries.csv file to the DataFrame df.
Line [6] converts the DataFrame to SQL and inserts all records into the Database.
Line [7] executes the SQL SELECT command to retrieve all records from the Database.
Line [8-9] instantiates a for loop and displays one (1) record/line from the Database.

Output

(‘Germany’, ‘Berlin’, 83783942, 357021)

(‘France’, ‘Paris’, 67081000, 551695)

(‘Spain’, ‘Madrid’, 47431256, 498511)

(‘Italy’, ‘Rome’, 60317116, 301338)

(‘Poland’, ‘Warsaw’, 38383000, 312685)

(‘Russia’, ‘Moscow’, 146748590, 17098246)

(‘USA’, ‘Washington’, 328239523, 9833520)

(‘China’, ‘Beijing’, 1400050000, 9596961)

(‘India’, ‘Dheli’, 1352642280, 3287263)

💡 Note: For additional details on SQL commands, click here.

DataFrame.to_csv()

The to_csv() method converts an object to a comma-separated values (CSV) file.

The syntax for this method is as follows:

DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)

Parameter	Description
`path_or_buf`	This parameter is the file path to write the CSV to.
`sep`	This parameter is the field separator. The default is a comma (`,`).
`na_rep`	What string to replace any missing data.
`float_format`	This is the format for floating numbers.
`columns`	This parameter contains the field(s) to write to the CSV file.
`header`	If `True`, the column names are written.
`index`	If `True`, the index names are written.
`index_label`	This parameter is the column name for the index label(s).
`mode`	This specified the file mode. By default, the mode is `w`,
`encoding`	This is a string representing the encoding for the CSV file.
`compression`	If the infer option is stipulated, the available compressions are: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, or ‘.zst’ extensions.
`quoting`	If a float_format is applied, float values convert to strings
`quotechar`	This is the character used for quote fields.
`line_terminator`	The newline character to use in the CSV file. Example: ’\n’ for Linux, ‘\r\n’ for Windows.
`chunksize`	This parameter denotes the number of rows to write at a time.
`date_format`	This is the format for any dates
`doublequote`	This parameter is the quoting inside a field.
`escapechar`	This is the character used to escape the sep and quotechar parameters.
`decimal`	This is the character used as a decimal separator.
`errors`	This parameter determines how encoding and decoding errors are handled. Click here for details.
`storage_options`	This parameter contains extra options (dictionary format), such as host, port, username, etc.

This example saves a DataFrame of five (5) host city details for the Summer and Winter Olympic Games to a CSV file.

df = pd.DataFrame(({2010: ['Vancouver', 'Canada', 'North America'],
                    2012: ['London', 'United Kingdon', 'Europe'],
                    2014: ['Sochi', 'Russia', 'Europe',],
                    2016: ['Rio de Janeiro', 'Brazil', 'South America'],
                    2018: ['Pyeongchang', 'South Korea', 'Asia']}))
df.to_csv('games.csv', index=False)

Line [1] creates a DataFrame from a Dictionary of Lists. The output saves to df.
Line [2] creates a CSV file and saves it to the current working directory.

Output

💡 Note: Saving to a CSV file removes all formatting.

DataFrame.to_excel()

The to_excel() method converts an object to an Excel file (XLSX).

The syntax for this method is as follows:

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options=None)

Parameter	Description
`excel_writer`	This parameter is the file path or the Excel writer.
`sheet_name`	The name of the Excel sheet name containing the data.
`na_rep`	If missing data is encountered, what should this be replaced with.
`float_format`	This is the format for floating numbers.
`columns`	This parameter contains the field(s) to write to the XLSX file.
`header`	If `True`, the column names are written.
`index`	If `True`, the index names are written.
`index_label`	This parameter is the column name for the index label(s).
`startrow`	This parameter references the upper-left cell row to place the data.
`startcol`	This parameter references the upper-left cell column to place the data.
`engine`	This parameter is the engine to use, such as `openpyxl` or `xlsxwriter`.
`merge_cells`	This parameter writes `MultiIndex` rows as merged cells.
`encoding`	This is a string representing the encoding for the XLSX file.
`inf_rep`	This is the depiction of infinity.
`verbose`	If True, this will display additional details in the log file.
`freeze_panes`	This parameter depicts the bottom-most row and right-most column to freeze.
`storage_options`	This parameter contains extra options (Dictionary format), such as host, port, username, etc.

In this example, a DataFrame is created from a list of five (5) Classical Composers. This file converts and saves to an Excel (XLSX) file.

df = pd.DataFrame({'Mozart':   [1756, 1791],
                   'Bach':     [1685, 1750],
                   'Chopin':   [1810, 1849],
                   'Haydn':    [1732, 1809],
                   'Brahms':   [1833, 1897]},
                   index=['Born', 'Passed'])

df.to_excel('composers.xlsx', sheet_name='Classical')

Line [1] creates a DataFrame from a Dictionary of Lists. This DataFrame saves to df.
Line [2] creates an XLSX file and saves it to the current working directory.

Output

💡 Note: Click here to view Finxter in-depth articles on Working with Excel.

Further Learning Resources

This is Part 21 of the DataFrame method series.

Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
Part 13 focuses on the DataFrame methods drop_level(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swap_level(), stack(), unstack() and swap_axes()
Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), t() and transpose()
Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickles().
Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
Part 25 focuses on the serialization and conversion methods to_bgq() and to_coo().