Preparation
Before any data manipulation can occur, four (4) new libraries will require installation.
- The Pandas library enables access to/from a DataFrame.
- The Tabulate library enables formatted output.
- The Tables (PyTables) library enables reading from and writing to HDF5 files.
- The lxml library enables writing to an XML file.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install tabulate
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install tables
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install lxml
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
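If the terminal messages have scrolled away, the installations can also be checked from Python. The sketch below is one possible way to do it, using only the standard library; the helper name `missing_modules` is made up for this illustration and is not part of any of the four libraries.

```python
from importlib.util import find_spec

# Import names of the four libraries installed above.
REQUIRED = ["pandas", "tabulate", "tables", "lxml"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If a name is reported as not installed, re-run the matching pip command in the same environment that your IDE uses.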
Feel free to view the PyCharm installation guide for the required libraries.
- How to install Pandas on PyCharm
- How to install Tabulate on PyCharm
- How to install Tables on PyCharm
- How to install lxml on PyCharm
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import tabulate
import tables
import lxml
DataFrame.to_xml()
The to_xml() method converts a DataFrame object into a valid XML format.
The syntax for this method is as follows:
DataFrame.to_xml(path_or_buffer=None, index=True, root_name='data', row_name='row', na_rep=None, attr_cols=None, elem_cols=None, namespaces=None, prefix=None, encoding='utf-8', xml_declaration=True, pretty_print=True, parser='lxml', stylesheet=None, compression='infer', storage_options=None)
Parameter | Description |
---|---|
path_or_buffer | This parameter is the file path or buffer to write to. If None, the XML is returned as a string. |
index | If True, includes the index in the XML document. |
root_name | This parameter is the root name of the XML document. |
row_name | This parameter is the name of row elements in the XML document. |
na_rep | This is a string representation of any missing data. |
attr_cols | This is a column list to write as row element attributes. |
elem_cols | This is a column list to write as child-row elements. |
namespaces | This parameter is the namespaces defined in the root element. |
prefix | This is a prefix for the namespace for each element/attribute. |
encoding | This is the encoding of the XML document. The default is UTF-8. |
xml_declaration | If True, include the XML declaration at the top of the document. |
pretty_print | If True, the XML outputs with indentation and line breaks. |
parser | This is the parser module used to build the XML tree. The supported values are 'lxml' and 'etree'. |
stylesheet | A URL, file, or string containing an XSLT script for formatting the XML output. |
compression | If 'infer' and the path is a file name, the compression is detected from the extension: '.gz', '.bz2', '.zip', '.xz', or '.zst'. |
storage_options | This parameter contains extra options (dictionary format), such as host, port, username, etc. |
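To make a few of these parameters concrete, here is a minimal sketch. The DataFrame holds made-up sample values (not the contents of countries.csv), and `parser='etree'` is chosen only so the snippet does not depend on lxml. Because no `path_or_buffer` is passed, the method returns the XML as a string, which is then parsed with the standard library to inspect the result.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Country": ["Germany", "France"],
    "Capital": ["Berlin", "Paris"],
    "Population": [83783942, 67081000],
})

# No path_or_buffer is given, so to_xml() returns the document as a string.
xml_text = df.to_xml(
    index=False,                          # leave the index out of the document
    root_name="countries",                # name of the root element
    row_name="country",                   # name of each row element
    attr_cols=["Country"],                # written as attributes of each row
    elem_cols=["Capital", "Population"],  # written as child elements
    parser="etree",                       # standard-library parser
)
print(xml_text)

# Parse the string again to look at the structure that was produced.
root = ET.fromstring(xml_text.encode("utf-8"))
first = root[0]
print(root.tag, first.tag, first.get("Country"), first.find("Capital").text)
```

Passing both `attr_cols` and `elem_cols` makes it explicit which columns become attributes and which become child elements.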
This example reads in the countries.csv file and saves the same to an XML file. Click here to save this CSV file and move it to the current working directory.
df = pd.read_csv('countries.csv')
df.to_xml('countries.xml', row_name='country', pretty_print=True)
- Line [1] reads in the comma-delimited CSV file and saves it to df.
- Line [2] creates an XML file with the following options:
  - adds <country></country> tags around each country (row)
  - prints to the XML file with the appropriate indents and line breaks.
Output (partial)
💡 Note: Click here to validate your XML.
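A quick local check is also possible. The sketch below assumes that "validate" here means checking that the XML is well-formed (it parses without errors); it does not check the document against a schema. The function name `is_well_formed` and the two sample strings are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Made-up samples: the second one closes <country> before <Capital>.
good = "<data><country><Capital>Berlin</Capital></country></data>"
bad = "<data><country><Capital>Berlin</country></data>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

To check a saved file such as countries.xml instead of a string, `ET.parse('countries.xml')` can be wrapped in the same try/except.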