Pandas DataFrame to_hdf() Method


Preparation

Before any data manipulation can occur, four (4) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Tabulate library enables formatted output.
  • The Tables (PyTables) library provides the HDF5 support that to_hdf() and read_hdf() rely on.
  • The lxml library enables writing to an XML file.

To install these libraries, navigate to an IDE terminal. At the command prompt, execute the commands below. The terminal used in this example shows a dollar sign ($) as its prompt. Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install tabulate

$ pip install tables

$ pip install lxml

If the installations were successful, a message displays in the terminal indicating the same.
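
As an optional extra check, the short sketch below reports whether each of the four libraries can be found by Python. The helper name check_installed is hypothetical and used only for illustration; it relies on the standard library's importlib, not on anything from the libraries themselves.

```python
from importlib.util import find_spec

# Import names of the four libraries installed above
# (for these packages the import name matches the pip name).
REQUIRED = ["pandas", "tabulate", "tables", "lxml"]


def check_installed(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installed(REQUIRED).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

Any library reported as missing can be installed again with the matching pip command.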


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import tabulate
import tables
import lxml

DataFrame.to_hdf()

The to_hdf() method writes data to a Hierarchical Data Format (HDF) file. This format can hold a mixture of objects accessed individually or by a group.

The syntax for this method is as follows:

DataFrame.to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')
Parameter: Description
  • path_or_buf: The file path or HDFStore object to write to.
  • key: The identifier for the group in the HDFStore.
  • mode: The mode used to open the file. The options are 'a', 'w', and 'r+'. The default mode is 'a' (append).
  • complevel: Sets the compression level (0-9). Zero disables compression.
  • complib: Specifies the compression library to use: 'zlib', 'lzo', 'bzip2', or 'blosc'. The default is 'zlib'.
  • append: If True and format is 'table', the input data is appended to the existing table.
  • format: The available format options are:
    • 'fixed': A fixed format that is fast to read and write but does not allow appends or searches.
    • 'table': Writes a PyTables table. This option allows appends and searches.
    • None: Uses pd.get_option('io.hdf.default_format') if set; otherwise falls back to 'fixed'.
  • errors: Specifies how encoding and decoding errors are handled. The default value is 'strict'.
  • min_itemsize: A dictionary mapping column names to minimum string sizes.
  • nan_rep: Specifies how to represent null values as a string. This option is not permitted if the append parameter is True.
  • data_columns: A list of columns to store as indexed data columns for on-disk queries. This option is available only if the format is 'table'.
  • encoding: Specifies the encoding. The default value is 'UTF-8'.
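
To see how several of these parameters work together, here is a minimal sketch, not a definitive recipe. The sample data, the file name params_demo.h5, and the key games are assumptions made up for this illustration. It writes a compressed table, appends a row to it, and then reads back only the rows that match a condition on a data column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only to exercise the parameters.
first = pd.DataFrame({"city": ["Vancouver", "London"], "year": [2010, 2012]})
second = pd.DataFrame({"city": ["Sochi"], "year": [2014]})

# Write into a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "params_demo.h5")

# mode='w' starts a fresh file; format='table' is needed for appends and queries.
# complevel/complib turn on zlib compression, data_columns makes both columns
# queryable on disk, and min_itemsize reserves 30 characters for city names.
first.to_hdf(path, key="games", mode="w", format="table",
             complevel=5, complib="zlib",
             data_columns=["city", "year"], min_itemsize={"city": 30})

# append=True adds rows to the existing table instead of replacing it.
second.to_hdf(path, key="games", mode="a", format="table", append=True)

# Read everything back, then only the rows matching a condition on 'year'.
everything = pd.read_hdf(path, key="games")
recent = pd.read_hdf(path, key="games", where="year >= 2012")

print(everything)
print(recent)
```

The where filter works here only because year was listed in data_columns and the file uses the 'table' format. With format='fixed', neither the append nor the query would be allowed.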

This example creates a DataFrame with the Host City details for five (5) Summer and Winter Olympic Games held from 2010 to 2018.

df = pd.DataFrame({2010: ['Vancouver', 'Canada', 'North America'],
                   2012: ['London', 'United Kingdom', 'Europe'],
                   2014: ['Sochi', 'Russia', 'Europe'],
                   2016: ['Rio de Janeiro', 'Brazil', 'South America'],
                   2018: ['Pyeongchang', 'South Korea', 'Asia']})

df.to_hdf('olympics.h5', key='Games', mode='w', format='table')
print(pd.read_hdf('olympics.h5', key='Games'))

  • Line [1] creates a DataFrame from a dictionary of lists. The output saves to df.
  • Line [2] does the following:
    • creates an h5 file
    • sets the key to Games
    • sets the file mode to w (write mode)
    • sets the output to a table format
    • saves the output to olympics.h5
  • Line [3] reads in and displays the contents of the olympics.h5 file.

Output

            2010            2012    2014            2016         2018
0      Vancouver          London   Sochi  Rio de Janeiro  Pyeongchang
1         Canada  United Kingdom  Russia          Brazil  South Korea
2  North America          Europe  Europe   South America         Asia

💡 Note: If you navigate to the current working directory, the olympics.h5 file resides in the file list.
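
As noted earlier, an HDF file can hold a mixture of objects accessed individually or by group. The sketch below illustrates that idea under stated assumptions: the file name olympics_groups.h5, the keys games/summer and games/winter, and the two small DataFrames are hypothetical and exist only for this example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file and key names, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "olympics_groups.h5")

summer = pd.DataFrame({"city": ["London", "Rio de Janeiro"],
                       "year": [2012, 2016]})
winter = pd.DataFrame({"city": ["Vancouver", "Sochi", "Pyeongchang"],
                       "year": [2010, 2014, 2018]})

# mode='w' creates the file; the default mode='a' then adds a second key
# to the same file without removing the first one.
summer.to_hdf(path, key="games/summer", mode="w")
winter.to_hdf(path, key="games/winter")

# HDFStore lists the keys held in the file.
with pd.HDFStore(path, mode="r") as store:
    keys = sorted(store.keys())

print(keys)

# Each object can be read back on its own by passing its key.
print(pd.read_hdf(path, key="games/winter"))
```

Using a slash in the key places both DataFrames under a shared games group, so related objects stay together in one file.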
