      0    1         2
0  Name  Age      City
1  John   28  New York
2  Anna   32    Berlin

The desired output would look like:

   Name  Age      City
0  John   28  New York
1  Anna   32    Berlin
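To make the starting point concrete, here is a minimal sketch of one way this situation can arise: building a DataFrame from a list of rows without naming the columns. The variable names rows and df are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical raw rows; the first one holds the intended header text
rows = [
    ["Name", "Age", "City"],
    ["John", 28, "New York"],
    ["Anna", 32, "Berlin"],
]

# No column names are given, so pandas falls back to integer labels
df = pd.DataFrame(rows)

print(list(df.columns))      # default integer column labels
print(df.iloc[0].tolist())   # header text stored as an ordinary data row
```

Checking df.columns and df.iloc[0] like this is a quick way to confirm that the header really ended up inside the data before applying any of the methods that follow.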
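Method 1: The pd.read_csv() Function with header Argument

When loading data into a pandas DataFrame using pd.read_csv(), the header argument specifies which row should be used as the header. By default (header='infer'), pandas uses the first row (row 0) as the header when no names are passed. Setting header=None tells pandas that the file has no header row, so the column names must be supplied through the names parameter. If the file already contains a header row that should be replaced, pass header=0 together with names; otherwise the original header line is read as a data row. Here’s an example: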

import pandas as pd
from io import StringIO

data = 'Name,Age,City\nJohn,28,New York\nAnna,32,Berlin'

# header=0 marks the first line as a header that names should replace
df = pd.read_csv(StringIO(data), header=0, names=['Name', 'Age', 'City'])
print(df)

The output would be:

   Name  Age      City
0  John   28  New York
1  Anna   32    Berlin

In this example, we simulated reading a CSV file from a string. Because the string already contains a header line, we pass header=0 so that pandas discards that line and uses the column names supplied through the names parameter. With header=None, the line Name,Age,City would instead be kept as the first data row.

Method 2: Using DataFrame.columns Property

When the raw rows are still available, for example as a list of lists, the first row can be assigned as the header while the DataFrame is built: it is passed to the columns parameter of the constructor, which populates the DataFrame.columns property, and only the remaining rows are passed as data. Here’s an example:

import pandas as pd

data = [['Name', 'Age', 'City'],
        ['John', 28, 'New York'],
        ['Anna', 32, 'Berlin']]

# The first row becomes the header; the remaining rows are the data
df = pd.DataFrame(data[1:], columns=data[0])
print(df)

The output would be:

   Name  Age      City
0  John   28  New York
1  Anna   32    Berlin

This snippet slices the list so that only the data rows (data[1:]) are passed to the DataFrame constructor, and assigns the first item in the list (data[0]) as the header using the columns parameter.

Method 3: The DataFrame.rename() Method

Another method uses the DataFrame.rename() method. By passing a mapping dictionary to the columns parameter, where each key is an old column name and each value is the new one, we can rename the DataFrame’s default integer column labels to the header values, which are held in a separate list in this example. Here’s an example:

import pandas as pd

data = [['John', 28, 'New York'], ['Anna', 32, 'Berlin']]
headers = ['Name', 'Age', 'City']

df = pd.DataFrame(data)
# Map the default labels 0, 1, 2 to the header names
df.rename(columns=dict(zip(df.columns, headers)), inplace=True)
print(df)

The output would be:

   Name  Age      City
0  John   28  New York
1  Anna   32    Berlin

Here, the rename() method is handed a dictionary that pairs the current columns of the DataFrame (0, 1, 2) with the new headers. The inplace=True flag applies the renaming directly to the existing DataFrame.

Method 4: The DataFrame.iloc[] Indexer

The DataFrame.iloc[] indexer can be used to select the first row and set it as the header. After that, the row can be dropped from the DataFrame to clean up the data. Here’s an example:
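
import pandas as pd

data = [['Name', 'Age', 'City'],
        ['John', 28, 'New York'],
        ['Anna', 32, 'Berlin']]
df = pd.DataFrame(data)

df.columns = df.iloc[0]  # use the first row as the header
df = df[1:]              # keep only the rows after it
print(df)

The output would be:

0  Name  Age      City
1  John   28  New York
2  Anna   32    Berlin

In this code, df.iloc[0] retrieves the first row, which is assigned to the columns attribute. The DataFrame is then reassigned to itself excluding the first row, yielding a DataFrame with the proper header. The index now starts at 1 because the original row labels are kept, and the 0 in the header corner is the label of the row that became the header. Calling df.reset_index(drop=True) renumbers the rows from 0 if needed.

Bonus One-Liner Method 5: The header=0 Argument in read_csv()
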
When importing a CSV file whose first row is intended as the header, simply call pd.read_csv() and rely on its default behavior, which is equivalent to header=0 when no names are passed. It instructs pandas to automatically take the first row as the header. Here’s an example:

import pandas as pd
from io import StringIO

data = 'Name,Age,City\nJohn,28,New York\nAnna,32,Berlin'

# No header argument needed: the first line is used as the header
df = pd.read_csv(StringIO(data))
print(df)

The output would be:

   Name  Age      City
0  John   28  New York
1  Anna   32    Berlin

By default, pd.read_csv() takes the first row as the header, which simplifies the process if the data source has the header row in the correct position.

Summary/Discussion

- Method 1: pd.read_csv() with the header and names arguments. Strengths: Sets the header directly while reading the CSV. Weaknesses: Only applicable when reading from CSV, not for existing DataFrames.
- Method 2: DataFrame.columns Property. Strengths: Simple and explicit. Weaknesses: Slightly manual process, involves data slicing.
- Method 3: DataFrame.rename() Method. Strengths: Offers flexibility for selective renaming. Weaknesses: Verbose for a simple header replacement.
- Method 4: DataFrame.iloc[] Indexer. Strengths: Fast and efficient on existing DataFrames. Weaknesses: Includes a row deletion step.
- Method 5: header=0 in pd.read_csv(). Strengths: Extremely concise for CSV reads. Weaknesses: Not applicable for non-CSV data or pre-existing DataFrames.
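
For existing DataFrames, the manual steps noted above (assigning the header, deleting the row) can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the function name promote_first_row and the variables raw and clean are hypothetical, and the choice to renumber the index and re-infer dtypes is an assumption about what is usually wanted.

```python
import pandas as pd


def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose first row has become the column header.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    out = df.iloc[1:].copy()
    # A plain list avoids carrying the old row label over as the columns name
    out.columns = df.iloc[0].tolist()
    # Renumber the rows from 0 after removing the header row
    out = out.reset_index(drop=True)
    # Columns that held header text plus numbers are object dtype;
    # infer_objects() lets pandas pick a better dtype where it can
    return out.infer_objects()


raw = pd.DataFrame([
    ["Name", "Age", "City"],
    ["John", 28, "New York"],
    ["Anna", 32, "Berlin"],
])
clean = promote_first_row(raw)
print(clean)
print(clean.dtypes)
```

Returning a new DataFrame instead of modifying the input keeps the original data available for inspection, at the cost of one extra copy.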