Pandas DataFrame to_hdf() Method

Preparation

Before running the examples, install the following four (4) libraries.

The Pandas library enables access to/from a DataFrame.
The Tabulate library enables formatted output.
The Tables library (PyTables) provides the HDF5 file support that to_hdf() relies on.
The lxml library enables writing to an XML file.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. The dollar sign ($) is the prompt in the terminal used for this example; your terminal prompt may be different.

$ pip install pandas
$ pip install tabulate
$ pip install tables
$ pip install lxml

Hit the <Enter> key on the keyboard after each command to start the installation process.

If the installations were successful, a confirmation message displays in the terminal.
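One optional way to double-check the installs from Python is sketched below. It only tests whether each import name can be found; it does not verify versions. The helper name check_installed is made up for this illustration.

```python
from importlib.util import find_spec

# Import names of the four libraries installed above.
REQUIRED = ["pandas", "tabulate", "tables", "lxml"]


def check_installed(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


status = check_installed(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any library reported as missing can be installed again with the matching pip command.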

Feel free to view the PyCharm installation guide for the required libraries.

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import tabulate
import tables
import lxml

DataFrame.to_hdf()

The to_hdf() method writes data to a Hierarchical Data Format (HDF) file. This format can hold a mixture of objects accessed individually or by a group.
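To illustrate the idea of several objects living in one file, here is a minimal sketch that stores two small DataFrames under different keys and then lists them. The file name demo.h5, the keys hosts and geo/regions, and the temporary directory are assumptions made for this illustration only.

```python
import os
import tempfile

import pandas as pd

# Two unrelated DataFrames built from values used later in this article.
hosts = pd.DataFrame({"year": [2010, 2012], "city": ["Vancouver", "London"]})
regions = pd.DataFrame({"country": ["Canada", "Brazil"],
                        "continent": ["North America", "South America"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # hypothetical file name

    # mode='w' creates the file; mode='a' adds a second object to it.
    hosts.to_hdf(path, key="hosts", mode="w")
    regions.to_hdf(path, key="geo/regions", mode="a")  # a key nested in a group

    # List every key stored in the file.
    with pd.HDFStore(path, mode="r") as store:
        keys = sorted(store.keys())

    # Read back a single object by its key.
    hosts_back = pd.read_hdf(path, key="hosts")

print(keys)
print(hosts_back)
```

Each key behaves like a path inside the file, so related objects can be grouped under a common prefix such as geo/.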

The syntax for this method is as follows:

DataFrame.to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')

Parameter	Description
`path_or_buf`	The file path or `HDFStore` object to write to. This parameter is required.
`key`	The identifier for the group in the `HDFStore`.
`mode`	The mode to use to open a file. The options are: `'a', 'w', 'r+'`. The default mode is `'a'` (append).
`complevel`	This parameter sets the compression level (0-9). Zero disables compression.
`complib`	Specifies the compression method to use: `'zlib', 'lzo', 'bzip2', 'blosc'`. The default compression is `'zlib'`.
`append`	If `True` and format is `'table'`, it appends the input data to the existing table.
`format`	The available format options are: `'fixed'`: a fixed format that is fast to write and read but does not allow appends or searches; `'table'`: writes a PyTables table, which allows appends and searches; `None`: uses `pd.get_option('io.hdf.default_format')` if it is set, otherwise falls back to `'fixed'`.
`errors`	Specifies how encoding and decoding errors are handled. The default value is `'strict'`.
`min_itemsize`	A dictionary mapping column names to minimum string sizes.
`nan_rep`	Specifies how to represent NULL values as a string. This option is not permitted if the `append` parameter is `True`.
`data_columns`	A list of columns to create as indexed data columns for on-disk queries. This option is available only if the format is `'table'`.
`encoding`	Specifies the text encoding. The default value is `'UTF-8'`.
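Several of these parameters work together when the format is `'table'`. The following is a minimal sketch, not a definitive recipe, that writes a table with compression, appends more rows, and then queries on an indexed column. The file name games.h5, the key games, the column names, and the string size of 20 are assumptions chosen for this illustration.

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({"year": [2010, 2012],
                      "city": ["Vancouver", "London"]})
second = pd.DataFrame({"year": [2014, 2016],
                       "city": ["Sochi", "Rio de Janeiro"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "games.h5")  # hypothetical file name

    # Create the table. min_itemsize reserves room for city names longer
    # than those in the first batch, and data_columns makes 'year' queryable.
    first.to_hdf(path, key="games", mode="w", format="table",
                 data_columns=["year"], min_itemsize={"city": 20},
                 complevel=5, complib="zlib")

    # append=True adds rows to the existing table instead of replacing it.
    second.to_hdf(path, key="games", mode="a", format="table", append=True)

    everything = pd.read_hdf(path, key="games")
    # The where clause filters rows on disk using the 'year' data column.
    recent = pd.read_hdf(path, key="games", where="year >= 2014")

print(everything)
print(recent)
```

Without the reserved string size, appending a city name longer than any in the first batch would be rejected, because a table fixes its string column widths when it is created.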

This example creates a DataFrame with the Host City details for the previous five (5) Summer and Winter Olympic Games.

df = pd.DataFrame({2010: ['Vancouver', 'Canada', 'North America'],
                   2012: ['London', 'United Kingdom', 'Europe'],
                   2014: ['Sochi', 'Russia', 'Europe'],
                   2016: ['Rio de Janeiro', 'Brazil', 'South America'],
                   2018: ['Pyeongchang', 'South Korea', 'Asia']})

df.to_hdf('olympics.h5', key='Games', mode='w', format='table')
print(pd.read_hdf('olympics.h5', key='Games'))

Line [1] creates a DataFrame from a dictionary of lists. The output saves to df.
Line [2] does the following:
- creates an HDF5 (.h5) file
- sets the key to Games
- sets the file mode to w (write mode), which replaces any existing file of the same name
- sets the output to a table format
- saves the output to olympics.h5
Line [3] reads in and displays the contents stored under the Games key in the olympics.h5 file.

Output

	2010	2012	2014	2016	2018
0	Vancouver	London	Sochi	Rio de Janeiro	Pyeongchang
1	Canada	United Kingdom	Russia	Brazil	South Korea
2	North America	Europe	Europe	South America	Asia

💡 Note: After the code runs, the olympics.h5 file appears in the current working directory.
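Because the example uses mode='w', running it again replaces the file rather than adding to it. The sketch below contrasts mode='w' with mode='a' and confirms from Python that the file exists. It writes to a temporary directory, and the Backup key is a hypothetical name used only for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"city": ["Vancouver", "London"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "olympics.h5"

    # mode='w' creates the file (or replaces an existing one).
    df.to_hdf(str(path), key="Games", mode="w", format="table")
    file_exists = path.exists()

    # mode='a' keeps what is already stored and adds a second key.
    df.to_hdf(str(path), key="Backup", mode="a", format="table")
    with pd.HDFStore(str(path), mode="r") as store:
        keys_after_append = sorted(store.keys())

    # mode='w' again starts from an empty file, so Backup is gone.
    df.to_hdf(str(path), key="Games", mode="w", format="table")
    with pd.HDFStore(str(path), mode="r") as store:
        keys_after_write = sorted(store.keys())

print(file_exists)
print(keys_after_append)
print(keys_after_write)
```

If a file should accumulate several objects over time, mode='a' is usually the safer choice.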
