Preparation
Before any data manipulation can occur, four (4) new libraries will require installation.
- The Pandas library enables access to/from a DataFrame.
- The Tabulate library enables formatted output.
- The Tables library (PyTables) provides the HDF5 support that to_hdf() relies on.
- The lxml library enables writing to an XML file.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install tabulate
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install tables
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install lxml
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
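To double-check the result from Python rather than from the terminal messages, a minimal sketch such as the one below can help. It assumes the import names match the package names used in the pip commands above; the helper name check_libraries is made up for this illustration.

```python
import importlib.util

# Import names of the four libraries installed above.
REQUIRED = ["pandas", "tabulate", "tables", "lxml"]


def check_libraries(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_libraries(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any library reported as missing can be installed again with the matching pip command.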
Feel free to view the PyCharm installation guides for the required libraries.
- How to install Pandas on PyCharm
- How to install Tabulate on PyCharm
- How to install Tables on PyCharm
- How to install lxml on PyCharm
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import tabulate
import tables
import lxml
DataFrame.to_hdf()
The to_hdf() method writes data to a Hierarchical Data Format (HDF) file. This format can hold a mixture of objects accessed individually or by a group.
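As a rough illustration of one file holding several objects, the sketch below writes two DataFrames under different keys and then lists the keys. The file name games_demo.h5, the keys games/summer and games/winter, and the helper store_two_frames are assumptions made for this example. The demo is skipped if the tables library is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def store_two_frames(path):
    """Write two DataFrames to one HDF5 file; return its keys and one frame."""
    summer = pd.DataFrame({"city": ["London", "Rio de Janeiro"],
                           "year": [2012, 2016]})
    winter = pd.DataFrame({"city": ["Vancouver", "Sochi", "Pyeongchang"],
                           "year": [2010, 2014, 2018]})

    # mode='w' creates (or overwrites) the file for the first object.
    summer.to_hdf(path, key="games/summer", mode="w")
    # mode='a' keeps the file and adds a second object under another key.
    winter.to_hdf(path, key="games/winter", mode="a")

    # HDFStore lists every key; read_hdf fetches one object by its key.
    with pd.HDFStore(path, mode="r") as store:
        keys = sorted(store.keys())
    return keys, pd.read_hdf(path, "games/winter")


if importlib.util.find_spec("tables") is None:
    keys, winter_back = None, None
    print("The tables library is missing, so the demo is skipped.")
else:
    with tempfile.TemporaryDirectory() as folder:
        keys, winter_back = store_two_frames(os.path.join(folder, "games_demo.h5"))
    print(keys)
    print(winter_back)
```

Both keys share the games prefix, so they sit in the same group inside the file, while each DataFrame can still be read on its own.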
The syntax for this method is as follows:
DataFrame.to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')
Parameter | Description |
---|---|
path_or_buf | The file path or HDFStore object to write to. |
key | The identifier for the group in the HDFStore. |
mode | The mode used to open the file: 'a' (append; creates the file if it does not exist), 'w' (write; overwrites an existing file), or 'r+' (like 'a', but the file must already exist). The default mode is 'a'. |
complevel | The compression level (0-9). Zero or None disables compression. |
complib | The compression library to use: 'zlib', 'lzo', 'bzip2', 'blosc'. The default is 'zlib'. |
append | If True and format is 'table', appends the input data to the existing table. |
format | The available format options are: 'fixed' (a fixed format that does not allow appends or searches), 'table' (writes to a table; this option allows appends and searches), and None (checks pd.get_option('io.hdf.default_format'), then falls back to 'fixed'). |
errors | Specifies how encoding and decoding errors are handled. The default value is 'strict'. |
min_itemsize | A dictionary that maps column names to minimum string sizes. |
nan_rep | How to represent NULL values as a string. This option is not permitted if the append parameter is True. |
data_columns | A list of columns to store as indexed data columns, which can then be queried on disk. This option is available if the format is 'table'. |
encoding | The text encoding. The default value is 'UTF-8'. |
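To see how several of these parameters interact, here is a hedged sketch that writes one batch of rows, appends a second batch, and then queries the stored table. The file name, the key games, the column names, and the helper write_in_batches are hypothetical. The min_itemsize value of 30 is an arbitrary choice that leaves room for longer city names in later appends. The demo is skipped if the tables library is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def write_in_batches(path):
    """Write two batches to one HDF5 table; return all rows and a filtered subset."""
    first = pd.DataFrame({"city": ["Vancouver", "London"], "year": [2010, 2012]})
    second = pd.DataFrame({"city": ["Sochi", "Rio de Janeiro"], "year": [2014, 2016]})

    # format='table' is needed for later appends and searches.
    # data_columns makes both columns queryable on disk, and min_itemsize
    # reserves 30 characters for 'city' so longer names fit when appended.
    first.to_hdf(path, key="games", mode="w", format="table",
                 data_columns=["city", "year"], min_itemsize={"city": 30},
                 complevel=5, complib="zlib")

    # append=True adds the new rows to the existing table instead of replacing it.
    second.to_hdf(path, key="games", mode="a", format="table", append=True)

    combined = pd.read_hdf(path, "games")
    # The where clause filters on a data column while reading from disk.
    recent = pd.read_hdf(path, "games", where="year > 2012")
    return combined, recent


if importlib.util.find_spec("tables") is None:
    combined, recent = None, None
    print("The tables library is missing, so the demo is skipped.")
else:
    with tempfile.TemporaryDirectory() as folder:
        combined, recent = write_in_batches(os.path.join(folder, "append_demo.h5"))
    print(combined)
    print(recent)
```

Without min_itemsize, the string width is taken from the first batch, so appending a longer value such as 'Rio de Janeiro' would raise an error.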
This example creates a DataFrame with the Host City details for the previous five (5) Summer and Winter Olympic Games.
df = pd.DataFrame({2010: ['Vancouver', 'Canada', 'North America'],
                   2012: ['London', 'United Kingdom', 'Europe'],
                   2014: ['Sochi', 'Russia', 'Europe'],
                   2016: ['Rio de Janeiro', 'Brazil', 'South America'],
                   2018: ['Pyeongchang', 'South Korea', 'Asia']})
df.to_hdf('olympics.h5', key='Games', mode='w', format='table')
print(pd.read_hdf('olympics.h5', 'Games'))
- Line [1] creates a DataFrame from a dictionary of lists. The output saves to df.
- Line [2] does the following:
  - creates an h5 file
  - sets the key to Games
  - sets the file mode to w (write mode)
  - sets the output to a table format
  - saves the output to olympics.h5
- Line [3] reads in and displays the contents of the olympics.h5 file.
Output
 | 2010 | 2012 | 2014 | 2016 | 2018 |
---|---|---|---|---|---|
0 | Vancouver | London | Sochi | Rio de Janeiro | Pyeongchang |
1 | Canada | United Kingdom | Russia | Brazil | South Korea |
2 | North America | Europe | Europe | South America | Asia |
Note: If you navigate to the current working directory, the olympics.h5 file resides in the file list.
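The same check can be made from Python instead of a file browser. The sketch below is only an illustration: the helper describe_output and its optional directory argument are assumptions, and it uses nothing beyond the standard library.

```python
from pathlib import Path


def describe_output(filename, directory=None):
    """Report whether a file exists in a folder (default: current working directory)."""
    folder = Path(directory) if directory is not None else Path.cwd()
    target = folder / filename
    if not target.is_file():
        return f"{filename} was not found in {folder}"
    return f"{filename} found in {folder} ({target.stat().st_size} bytes)"


print(describe_output("olympics.h5"))
```

If the file is reported as not found, confirm that the script ran from the directory you expect.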