Pandas DataFrame to_gbq() Method

This article focuses on the serialization and conversion methods of a Python DataFrame:

  • to_gbq(),
  • to_coo().

Let’s get started!


Preparation

Before any data manipulation can occur, four (4) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
  • The pandas_gbq allows access to Google Big Query (GBQ)
  • The google.auth authentication.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install pandas_gbq 

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install google.auth

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import numpy as np 
from google.cloud import bigquery
import google.auth

DataFrame.to_gbq()

The to_gbq() method writes a DataFrame object to a Google BigQuery table.

BigQuery is Google Cloud Data Warehouse that helps you store, analyze and visualize data.

The syntax for this method is as follows:

DataFrame.to_gbq(destination_table, project_id=None, chunksize=None, reauth=False, if_exists='fail', auth_local_webserver=False, table_schema=None, location=None, progress_bar=True, credentials=None)
ParameterDescription
destination_tableThis string parameter denotes the table name presented in the form dataset.tablename.
project_idThis parameter is the GBQ Account Project ID.
This ID is only available from the GBG environment.
chunksizeThis parameter depicts the total rows to insert/chunk. The default value is all rows at once.
reauthIf True, request user re-authentication from GBQ.
if_existsThe path taken is the table exists. Options are:
'fail': if the table exists, raise an error.
'replace': if the table exists, drop and re-create.
'append': if the table exists, append data.
auth_local_webserverIf True, use the local webserver flow instead of the console flow to retrieve user credentials.
table_schemaThis parameter is a dictionary containing table fields data types (Schema). If not provided, BGQ will attempt to create a Schema.
locationThis parameter is the location where the load job runs.
progress_barIf True, use the tqdm library to display the progress bar for the upload.
credentialsThis parameter is the credentials for accessing Google APIs.

The to_gbq() determines the BigQuery Table Schema based on the data types of the uploaded DataFrame. If this Schema does not meet your requirements, you can create a new Table Schema and pass it via the table_schema parameter.

πŸ’‘ Note: To view a list of available GBQ Data Types, click here.

Before running any code, you will need to do the following:

View the video below for instructions.

πŸ’‘ Note: Remember to take note of the project details and authentication.

For this example, countries.csv reads into a DataFrame. This DataFrame adds to a Google Big Query Table. Click here to save this CSV file and move it to the current working directory.

prj_id   = 'central-hangar-341214'
dat_set  = 'my_dataset'
table_id = 'my_dataset.countries'

df = pd.read_csv('countries.csv')
df.to_gbq(table_id, if_exists='append', project_id=prj_id)
  • Line [1] contains the Project ID assigned earlier.
  • Line [2] is a name we assigned to the dataset.
  • Line [3] combines the dataset name added to the table name. For this example, 'countries' is entered.
  • Line [4] reads in the countries.csv to a DataFrame.
  • Line [5] does the following:
    • Assigns the table_id ('countries.csv')
    • Checks if this table already exists. If so, the data appends to it.
    • Assigns the project_id ('central-hangar-341214')
    • Attempts to connect to and upload the data

πŸ’‘ Note: Remember to generate and save the authentication.


More Pandas DataFrame Methods

Feel free to learn more about the previous and next pandas DataFrame methods (alphabetically) here:

Also, check out the full cheat sheet overview of all Pandas DataFrame methods.