Before any data manipulation can occur, four (4) libraries must be installed.
- The Pandas library enables access to/from a DataFrame.
- The Tabulate library enables formatted output.
- The Tables (PyTables) library enables reading and writing HDF5 files.
- The lxml library enables writing to an XML file.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install tabulate
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install tables
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install lxml
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
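If the terminal messages are hard to interpret, the installations can also be checked from Python. The following is a minimal sketch, not part of the original setup steps: the helper name missing_modules is hypothetical, and it assumes each library is importable under the same name used in the pip commands above.

```python
from importlib.util import find_spec

# Import names assumed to match the pip package names used above.
required = ["pandas", "tabulate", "tables", "lxml"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Because find_spec() only locates a module without importing it, this check runs even when one of the libraries is missing.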
Feel free to view the PyCharm installation guides for the required libraries.
- How to install Pandas on PyCharm
- How to install Tabulate on PyCharm
- How to install Tables on PyCharm
- How to install lxml on PyCharm
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import tabulate
import tables
import lxml
The to_stata() method writes a DataFrame object to a Stata dataset (.dta) file.
The syntax for this method is as follows:
DataFrame.to_stata(path, convert_dates=None, write_index=True, byteorder=None, time_stamp=None, data_label=None, variable_labels=None, version=114, convert_strl=None, compression='infer', storage_options=None, *, value_labels=None)
Parameter | Description |
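path | The file path (string or path-like object) or writable binary buffer where the Stata file is written. This parameter is required. |
convert_dates | A dictionary mapping datetime columns to a Stata date format. The options are: 'tc', 'td', 'tm', 'tw', 'th', 'tq', 'ty' . Datetime columns without an entry default to 'tc' . |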
write_index | If True , write the index to the Stata dataset. |
byteorder | This parameter can be: '<', '>', 'little' , or 'big' . The default is sys.byteorder . |
time_stamp | This parameter is the datetime to use as the date created. Default is the current time. |
data_label | This is the label for the dataset. The maximum length is 80 characters. |
variable_labels | This is a dictionary with columns as keys and labels as values. The maximum length is 80 characters. |
version | The Stata file format version for the output (.dta ) file: 114, 117, 118, 119, or None . The default is 114. |
convert_strl | This parameter is a list containing column names to convert to Stata StrL format. |
compression | If 'infer' (the default), the compression type is detected from the path extension: '.gz', '.bz2', '.zip', '.xz', or '.zst' . |
storage_options | This parameter contains extra options (dictionary format), such as host, port, username, etc. |
value_labels | A dictionary with columns as keys and, as values, dictionaries that map column values to labels. |
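To see how a few of these parameters combine, here is a short sketch. The DataFrame contents, column names, labels, and the temporary output path are all invented for illustration and do not come from the article's dataset.

```python
import os
import tempfile

import pandas as pd

# Illustrative data only; these columns are not from the article's CSV file.
df = pd.DataFrame({
    "symbol": ["H", "He", "Li"],
    "atomic_number": [1, 2, 3],
    "recorded": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Hypothetical scratch location for the output file.
out_path = os.path.join(tempfile.mkdtemp(), "demo.dta")

df.to_stata(
    out_path,
    write_index=False,                 # do not store the index as a column
    convert_dates={"recorded": "td"},  # store this column as a Stata daily date
    data_label="Demo elements",        # dataset label (80 characters or fewer)
    variable_labels={
        "symbol": "Chemical symbol",
        "atomic_number": "Atomic number",
    },
)

# Read the data back, then inspect the stored labels through the reader.
restored = pd.read_stata(out_path)
with pd.read_stata(out_path, iterator=True) as reader:
    data_label = reader.data_label
    labels = reader.variable_labels()

print(restored)
print(data_label)
print(labels)
```

Setting write_index=False keeps the default integer index out of the file, so the restored DataFrame has only the three original columns.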
This example reads the first five (5) rows of the Periodic Table CSV file and writes them to a Stata dataset. Click here to save this CSV file and move it to the current working directory.
df = pd.read_csv('PubChemElements_all.csv', usecols=['AtomicNumber', 'Symbol', 'Name', 'YearDiscovered']).head()
print(df)
df.to_stata('elements.dta')
- Line [1] does the following:
- reads in the first five (5) rows (head) of the CSV file
- selects the columns to display
- saves the output to the DataFrame
- Line [2] outputs the DataFrame to the terminal.
- Line [3] outputs the DataFrame to a Stata dataset file.
 | AtomicNumber | Symbol | Name | YearDiscovered |
0 | 1 | H | Hydrogen | 1766 |
1 | 2 | He | Helium | 1868 |
2 | 3 | Li | Lithium | 1817 |
3 | 4 | Be | Beryllium | 1798 |
4 | 5 | B | Boron | 1808 |
💡 Note: If you navigate to the current working directory, the elements.dta file appears in the file list.
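Besides checking the file list, the dataset can be read back with pd.read_stata() to confirm its contents. The sketch below makes two assumptions: it rebuilds the five rows inline from the output shown above, so it does not depend on the CSV file, and it writes to a temporary folder rather than the current working directory.

```python
import os
import tempfile

import pandas as pd

# The five rows from the output above, built inline (an assumption made
# so that this sketch runs without the CSV file).
df = pd.DataFrame({
    "AtomicNumber": [1, 2, 3, 4, 5],
    "Symbol": ["H", "He", "Li", "Be", "B"],
    "Name": ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron"],
    "YearDiscovered": [1766, 1868, 1817, 1798, 1808],
})

# Hypothetical scratch location; the article writes to the working directory.
dta_path = os.path.join(tempfile.mkdtemp(), "elements.dta")
df.to_stata(dta_path)

# write_index=True (the default) stores the index as a column named "index".
with_index_column = pd.read_stata(dta_path)
print(with_index_column.columns.tolist())

# Use that column as the index again to restore the original layout.
restored = pd.read_stata(dta_path, index_col="index")
print(restored)
```

Passing write_index=False to to_stata() avoids the extra "index" column in the first place.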