[Fixed] Return DataFrame From Python Function? Do This!


Problem Formulation

Say you have a function create_df() that creates a DataFrame from some data. The data may come from a CSV file, the web, or another variable in the code (e.g., a nested list); the source doesn’t matter.

You want to return the DataFrame to the caller of the function at runtime.

⚡ However, the code doesn’t work, and the DataFrame doesn’t seem to get returned by the function. 🤯

See the following code example:

import pandas as pd


def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}
    df = pd.DataFrame(data)
    return df


create_df()
print(df)

Running this code raises NameError: name 'df' is not defined:

Traceback (most recent call last):
  File "C:\Users\xcent\Desktop\code.py", line 13, in <module>
    print(df)
NameError: name 'df' is not defined

How to fix it?

Solution 1: Create New Global Variable Outside Function Scope

The NameError: name 'df' is not defined occurs because the variable df is a local variable that is only visible within the function. Therefore, you cannot access it directly from outside the function scope. In the example above, the function does return the DataFrame, but the caller discards the returned value.

To fix it, assign the result of the function call to a variable, for example df = create_df(). This creates a global variable that is visible outside the function scope.

👉 Recommended Tutorial: Python Namespaces Made Simple

So, here’s the corrected example; the fixed line is df = create_df():

import pandas as pd


def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}
    df = pd.DataFrame(data)
    return df


df = create_df()
print(df)

Now, the code works perfectly fine and produces the following output:

    name  age  income
0  Alice   18  100000
1    Bob   22   99000
2   Carl   34   24000
3   Dave   67   44000
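The name you use at the call site is independent of the local name inside the function. As a minimal sketch (the names build_people and people are hypothetical and chosen only for illustration), the returned DataFrame can be bound to any variable:

```python
import pandas as pd


def build_people():
    # 'df' is a local name; it exists only while the function body runs
    df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [18, 22]})
    return df


# The caller may bind the returned object to any name it likes
people = build_people()
print(type(people).__name__)  # DataFrame
print(people.shape)           # (2, 2)
```

What matters is that the return value is captured, not that the outer name matches the inner one.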

Solution 2: Create Global Variable Inside Function Scope

The previous solution has created a new global variable outside the function scope and assigned the DataFrame created inside the function to that new global variable.

An alternative solution is to make the variable df, created inside the function scope, a global variable by using the statement global df inside the function. You no longer need to return the DataFrame because the variable df is now visible from outside the function as well.

Here’s the example code:

import pandas as pd


def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}

    global df
    df = pd.DataFrame(data)


create_df()
print(df)

However, I don’t recommend this approach, and I don’t find it very Pythonic. Your code is far easier to understand if a function returns a DataFrame rather than modifying global variables.

💡 As a rule of thumb: Functions should avoid side effects wherever possible!
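To illustrate why the global approach is fragile, here is a small sketch run as a plain script. The function names create_df_global and create_df_pure and the sample data are hypothetical. It assumes the caller already holds a DataFrame named df, which the global version silently replaces:

```python
import pandas as pd


def create_df_global():
    # Side effect: rebinds the module-level name 'df'
    global df
    df = pd.DataFrame({'name': ['Alice', 'Bob']})


def create_df_pure():
    # No side effect: the caller decides where the result goes
    return pd.DataFrame({'name': ['Alice', 'Bob']})


# Data the caller already holds under the name 'df'
df = pd.DataFrame({'city': ['Paris', 'Rome']})

other = create_df_pure()
columns_after_pure = list(df.columns)    # caller's df is unchanged

create_df_global()
columns_after_global = list(df.columns)  # caller's df has been replaced

print(columns_after_pure)    # ['city']
print(columns_after_global)  # ['name']
```

The returning version leaves existing variables alone, so the call site shows exactly what changes.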


Conclusion and Further Reading

Note that there are many ways to create a DataFrame inside the function body. For more on this, I’d refer you to the in-depth tutorials on the Finxter blog.
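As a rough sketch of two such ways, the following functions build the same small table from a nested list and from CSV text. The function names and sample data are hypothetical, and io.StringIO merely stands in for a real file path or URL that you would pass to pd.read_csv():

```python
import io

import pandas as pd


def df_from_nested_list():
    # Each inner list is one row; column names are given explicitly
    rows = [['Alice', 18], ['Bob', 22]]
    return pd.DataFrame(rows, columns=['name', 'age'])


def df_from_csv_text():
    # In-memory CSV text used in place of a file on disk
    csv_text = "name,age\nAlice,18\nBob,22\n"
    return pd.read_csv(io.StringIO(csv_text))


a = df_from_nested_list()
b = df_from_csv_text()
print(a)
print(b)
```

In both cases the pattern is the same: build the DataFrame inside the function, return it, and assign the result at the call site.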

You can also create a copy of an existing DataFrame with the df.copy() method. This way, a function can build a new DataFrame from an existing one and return it without modifying the original.

new_df = df.copy()
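A minimal sketch of this pattern, assuming a hypothetical helper with_bonus() that takes a DataFrame with an income column, copies it, and returns the modified copy:

```python
import pandas as pd


def with_bonus(df):
    # Work on a copy so the caller's DataFrame stays untouched
    new_df = df.copy()
    new_df['income'] = new_df['income'] + 1000
    return new_df


original = pd.DataFrame({'name': ['Alice', 'Bob'],
                         'income': [100000, 99000]})
updated = with_bonus(original)

print(original['income'].tolist())  # [100000, 99000]
print(updated['income'].tolist())   # [101000, 100000]
```

Because the function changes only the copy, the caller keeps both the original and the updated DataFrame.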
