Serialize DataFrame to Clipboard, HTML, SQL, CSV, Excel

5/5 - (1 vote)

This article focuses on the serialization and conversion methods of a Python DataFrame:

  • to_clipboard(),
  • to_html(),
  • to_sql(),
  • to_csv(), and
  • to_excel()

Let’s get started!


Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Openpyxl library enables conversion to/from Excel.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install openpyxl

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import openpyxl

DataFrame.to_clipboard()

The to_clipboard method copies an object to the operating system’s clipboard. The output can be pasted (WIndows: CTRL+V) to other applications.

The syntax for this method is as follows:

DataFrame.to_clipboard(excel=True, sep=None, **kwargs)
ParameterDescription
excelIf this parameter is True, the output is saved in a CSV format for pasting to Excel.
sepThis is the field separator between the fields. The default value is a comma.
**kwargsThese parameters will pass to a DataFrame’s to_csv() method.

This example copies the inventory for Rivers Clothing to the system clipboard. A Notepad application is opened and the contents of the clipboard contents pasted.

df_inv = pd.DataFrame({'Tops':     [36, 23, 19],
                       'Tanks':    [20, 10, 20],
                       'Pants':    [61, 33, 67],
                       'Sweats':   [88, 38, 13]})

df_inv.to_clipboard(sep=',', index=False)  
  • Line [1] creates a DataFrame from a dictionary of lists. The output saves to the df_inv.
  • Line [2] does the following:
    • copies the contents to the clipboard
    • separates the fields with the comma (,) character
    • omits the leading comma for each row (index=False)

To see this code in action, perform the following steps:

  • Navigate to and open an instance of Notepad (or another text editor).
  • Press CTRL+V (Windows) to paste the contents of the system clipboard to the application.

Output


DataFrame.to_html()

The to_html() method converts a valid DataFrame object to an HTML Table format.

DataFrame.to_html(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)
ParameterDescription
bufThis parameter is the buffer to write to. If empty, a string returns.
columnsThe column subset to convert. If empty, all columns will convert.
col_spaceThe minimum width (CSS length) of each column.
headerIf True, the column heads will display.
indexIf True, the row labels will display.
na_repThe string depiction of any NaN values.
formattersThe formatter is the option to apply to each column (name/position).
float_formatThe formatter option to apply to float values.
sparsifyFor MultiIndex DataFrames. Set to False for a hierarchical index.
index_namesIf True, the index names display.
justifyA string value that depicts the justification type. Options are:
left, right, center, justify, justify-all, start, end, inherit,
match-parent, initial and, unset.
max_rowsThe maximum number of rows to display.
max_colsThe maximum number of columns to display
show_dimensionsDisplay the dimensions of the DataFrame (rows/columns).
decimalThis parameter is the character for a comma (,) in Europe.
bold_rowsIf True, the row labels will be bold.
classesClasses (formatting) to apply to the HTML table.
escapeConvert the characters: <>& to HTML-safe characters.
notebookThis parameter determines if the output is for an ipython notebook.
borderIf True, a border surrounds the cells of each column/row in the HTML table.
table_idIf set, a CSS id tag (#) is included in the opening <table> tag.
render_linksIf True, this parameter converts URLs strings to links.
encodingThis parameter sets the encoding for the HTML.

This example reads in the countries.csv file and converts it to an HTML table. To follow along, click here to save this CSV file and move it to the current working directory.

πŸ’‘ Note: To run this code, copy the code to the clipboard. Click here to navigate to a Jupyter testing environment. If this code is run in a standard IDE, the formatting will not display.

from IPython.display import HTML

df = pd.DataFrame({'Tops':     [36, 23, 19],
                   'Tanks':    [20, 10, 20],
                   'Pants':    [61, 33, 67],
                   'Sweats':   [88, 38, 13]})

HTML(df.to_html(classes='table table-hover'))
  • Line [1] imports the appropriate library to render the HTML.
  • Line [2] creates a DataFrame from a dictionary of lists. The output saves to df.
  • Line [3] exports as HTML with formatting.

Output


DataFrame.to_sql()

The to_sql() method writes rows (records) from a DataFrame to a SQL database.

The syntax for this method is as follows.

DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
ParameterDescription
nameThis parameter is the name of the SQL table.
conThis parameter is the database connections details.
schemaSpecify the database schema.
if_existsThis parameter determines what to do if the specified table exists.
The options are: fail, replace, or append.
indexThis parameter signifies the column to be set as the index.
index_labelThis parameter is the column name of the index.
chunksizeSpecify the number of rows to batch add in each round.
dtypeSpecify the data type for each column. Dictionary or scalar.
methodThis parameter controls the insertion clause. Options are:
None: use the SQL INSERT statement.
Multi: pass multiple values to a single INSERT statement
Callable: requires a sample implementation statement

This example reads in the countries.csv file. These records are then inserted into a Database using sqlite3. To follow along, click here to save this CSV file and move it to the current working directory.

conn = sqlite3.connect('countries_database')
c = conn.cursor()

c.execute('CREATE TABLE IF NOT EXISTS countries(Country text, Capital text, Population integer, Area integer)')
conn.commit()

df = pd.read_csv('countries.csv')
df.to_sql('countries', conn, if_exists='replace', index=False)
 
c.execute('SELECT * FROM countries')

for row in c.fetchall():
    print (row)
  • Line [1] creates a connection to the countries_database.
  • Line [2] executes the code on Line [1] and creates the Database.
  • Line [3] contains SQL code to create a database table containing the specified fields.
  • Line [4] executes the code on Line [3] and creates the table.
  • Line [5] reads in the countries.csv file to the DataFrame df.
  • Line [6] converts the DataFrame to SQL and inserts all records into the Database.
  • Line [7] executes the SQL SELECT command to retrieve all records from the Database.
  • Line [8-9] instantiates a for loop and displays one (1) record/line from the Database.

Output

(‘Germany’, ‘Berlin’, 83783942, 357021)
(‘France’, ‘Paris’, 67081000, 551695)
(‘Spain’, ‘Madrid’, 47431256, 498511)
(‘Italy’, ‘Rome’, 60317116, 301338)
(‘Poland’, ‘Warsaw’, 38383000, 312685)
(‘Russia’, ‘Moscow’, 146748590, 17098246)
(‘USA’, ‘Washington’, 328239523, 9833520)
(‘China’, ‘Beijing’, 1400050000, 9596961)
(‘India’, ‘Dheli’, 1352642280, 3287263)

πŸ’‘ Note: For additional details on SQL commands, click here.


DataFrame.to_csv()

The to_csv() method converts an object to a comma-separated values (CSV) file.

The syntax for this method is as follows:

DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)
ParameterDescription
path_or_bufThis parameter is the file path to write the CSV to.
sepThis parameter is the field separator. The default is a comma (,).
na_repWhat string to replace any missing data.
float_formatThis is the format for floating numbers.
columnsThis parameter contains the field(s) to write to the CSV file.
headerIf True, the column names are written.
indexIf True, the index names are written.
index_labelThis parameter is the column name for the index label(s).
modeThis specified the file mode. By default, the mode is w,
encodingThis is a string representing the encoding for the CSV file.
compressionIf the infer option is stipulated, the available compressions are:
β€˜.gz’, β€˜.bz2’, β€˜.zip’, β€˜.xz’, or β€˜.zst’ extensions.
quotingIf a float_format is applied, float values convert to strings
quotecharThis is the character used for quote fields.
line_terminatorThe newline character to use in the CSV file.
Example: ’\n’ for Linux, β€˜\r\n’ for Windows.
chunksizeThis parameter denotes the number of rows to write at a time.
date_formatThis is the format for any dates
doublequoteThis parameter is the quoting inside a field.
escapecharThis is the character used to escape the sep and quotechar parameters.
decimalThis is the character used as a decimal separator.
errorsThis parameter determines how encoding and decoding errors are handled. Click here for details.
storage_optionsThis parameter contains extra options (dictionary format), such as host, port, username, etc.

This example saves a DataFrame of five (5) host city details for the Summer and Winter Olympic Games to a CSV file.

df = pd.DataFrame(({2010: ['Vancouver', 'Canada', 'North America'],
                    2012: ['London', 'United Kingdon', 'Europe'],
                    2014: ['Sochi', 'Russia', 'Europe',],
                    2016: ['Rio de Janeiro', 'Brazil', 'South America'],
                    2018: ['Pyeongchang', 'South Korea', 'Asia']}))
df.to_csv('games.csv', index=False)
  • Line [1] creates a DataFrame from a Dictionary of Lists. The output saves to df.
  • Line [2] creates a CSV file and saves it to the current working directory.

Output

πŸ’‘ Note: Saving to a CSV file removes all formatting.


DataFrame.to_excel()

The to_excel() method converts an object to an Excel file (XLSX).

The syntax for this method is as follows:

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options=None)
ParameterDescription
excel_writerThis parameter is the file path or the Excel writer.
sheet_nameThe name of the Excel sheet name containing the data.
na_repIf missing data is encountered, what should this be replaced with.
float_formatThis is the format for floating numbers.
columnsThis parameter contains the field(s) to write to the XLSX file.
headerIf True, the column names are written.
indexIf True, the index names are written.
index_labelThis parameter is the column name for the index label(s).
startrowThis parameter references the upper-left cell row to place the data.
startcolThis parameter references the upper-left cell column to place the data.
engineThis parameter is the engine to use, such as openpyxl or xlsxwriter.
merge_cellsThis parameter writes MultiIndex rows as merged cells.
encodingThis is a string representing the encoding for the XLSX file.
inf_repThis is the depiction of infinity.
verboseIf True, this will display additional details in the log file.
freeze_panesThis parameter depicts the bottom-most row and right-most column to freeze.
storage_optionsThis parameter contains extra options (Dictionary format), such as host, port, username, etc.

In this example, a DataFrame is created from a list of five (5) Classical Composers. This file converts and saves to an Excel (XLSX) file.

df = pd.DataFrame({'Mozart':   [1756, 1791],
                   'Bach':     [1685, 1750],
                   'Chopin':   [1810, 1849],
                   'Haydn':    [1732, 1809],
                   'Brahms':   [1833, 1897]},
                   index=['Born', 'Passed'])

df.to_excel('composers.xlsx', sheet_name='Classical') 
  • Line [1] creates a DataFrame from a Dictionary of Lists. This DataFrame saves to df.
  • Line [2] creates an XLSX file and saves it to the current working directory.

Output

πŸ’‘Β Note: Click here to view Finxter in-depth articles on Working with Excel.

Further Learning Resources

This is Part 21 of the DataFrame method series.

  • Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
  • Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
  • Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
  • Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
  • Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
  • Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
  • Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates() and duplicated().
  • Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail()
  • Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate()
  • Part 11 focuses on the DataFrame methods backfill(), bfill(), fillna(), dropna(), and interpolate()
  • Part 12 focuses on the DataFrame methods isna(), isnull(), notna(), notnull(), pad() and replace()
  • Part 13 focuses on the DataFrame methods drop_level(), pivot(), pivot_table(), reorder_levels(), sort_values() and sort_index()
  • Part 14 focuses on the DataFrame methods nlargest(), nsmallest(), swap_level(), stack(), unstack() and swap_axes()
  • Part 15 focuses on the DataFrame methods melt(), explode(), squeeze(), to_xarray(), t() and transpose()
  • Part 16 focuses on the DataFrame methods append(), assign(), compare(), join(), merge() and update()
  • Part 17 focuses on the DataFrame methods asfreq(), asof(), shift(), slice_shift(), tshift(), first_valid_index(), and last_valid_index()
  • Part 18 focuses on the DataFrame methods resample(), to_period(), to_timestamp(), tz_localize(), and tz_convert()
  • Part 19 focuses on the visualization aspect of DataFrames and Series via plotting, such as plot(), and plot.area().
  • Part 20 focuses on continuing the visualization aspect of DataFrames and Series via plotting such as hexbin, hist, pie, and scatter plots.
  • Part 21 focuses on the serialization and conversion methods from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickles().
  • Part 22 focuses on the serialization and conversion methods to_clipboard(), to_html(), to_sql(), to_csv(), and to_excel().
  • Part 23 focuses on the serialization and conversion methods to_markdown(), to_stata(), to_hdf(), to_latex(), to_xml().
  • Part 24 focuses on the serialization and conversion methods to_parquet(), to_feather(), to_string(), Styler.
  • Part 25 focuses on the serialization and conversion methods to_bgq() and to_coo().