Problem Formulation
Say you have a function create_df() that creates a DataFrame from some data. The data may be read from a CSV file, loaded from the web, or taken from another variable in the code (e.g., a nested list); it doesn't matter which.
You want to return the DataFrame to the caller of the function at runtime.
However, the code doesn't work, and the DataFrame doesn't seem to get returned by the function.
See the following code example:
import pandas as pd

def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}
    df = pd.DataFrame(data)
    return df

create_df()
print(df)
Running this code raises NameError: name 'df' is not defined:
Traceback (most recent call last):
  File "C:\Users\xcent\Desktop\code.py", line 13, in <module>
    print(df)
NameError: name 'df' is not defined
How to fix it?
Solution 1: Create New Global Variable Outside Function Scope
The NameError: name 'df' is not defined occurs because df is a local variable that exists only inside the function. You therefore cannot access it directly from outside the function scope, and calling create_df() on its own simply discards the returned DataFrame.
To fix it, assign the function's return value to a variable in the caller's scope, such as df = create_df(). This creates a new global variable, bound to the returned DataFrame, that is visible outside the function.
Recommended Tutorial: Python Namespaces Made Simple
So, here's the corrected example; the only changed line is df = create_df():
import pandas as pd

def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}
    df = pd.DataFrame(data)
    return df

df = create_df()  # fixed line: store the returned DataFrame in a global variable
print(df)
Now, the code works perfectly fine and produces the following output:
    name  age  income
0  Alice   18  100000
1    Bob   22   99000
2   Carl   34   24000
3   Dave   67   44000
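To see where the name df actually lives, the following sketch checks the module's global namespace with the built-in globals() before and after the assignment. It reuses the article's create_df(); the first check prints False only when the script runs in a fresh interpreter where no global df exists yet (in a notebook, an earlier cell may already have defined one).

```python
import pandas as pd


def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}
    df = pd.DataFrame(data)  # df is a local name inside create_df()
    return df


create_df()  # the returned DataFrame is discarded
print('df' in globals())  # False in a fresh script: the local df is gone

df = create_df()  # bind the returned DataFrame to a new global name
print('df' in globals())  # True
print(df.shape)  # (4, 3)
```

The global df and the local df are two different names; after the call, only the global one remains, and it refers to the DataFrame the function returned.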
Solution 2: Create Global Variable Inside Function Scope
The previous solution created a new global variable outside the function scope and assigned the DataFrame created inside the function to that new global variable.
An alternative solution is to turn the local variable df, created inside the function scope, into a global variable by adding the statement global df inside the function. Then you don't need to return the DataFrame from the function, because the variable df is already visible outside the function as well.
Here’s the example code:
import pandas as pd

def create_df():
    data = {'name': ['Alice', 'Bob', 'Carl', 'Dave'],
            'age': [18, 22, 34, 67],
            'income': [100000, 99000, 24000, 44000]}
    global df
    df = pd.DataFrame(data)

create_df()
print(df)
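The global statement is what makes this work: without it, an assignment inside a function always creates a local name. The sketch below contrasts two hypothetical functions, make_local() and make_global(), that differ only in the global df line; df starts out as None at module level purely for illustration.

```python
import pandas as pd

df = None  # module-level (global) name, initially empty


def make_local():
    # No global statement: this assignment creates a new local df
    # that shadows the global one and disappears when the function returns.
    df = pd.DataFrame({'age': [18, 22]})


def make_global():
    global df  # assignments to df now rebind the module-level name
    df = pd.DataFrame({'age': [18, 22]})


make_local()
print(df)  # None: the global df was not touched

make_global()
print(type(df).__name__)  # DataFrame
```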
However, I don’t recommend this approach, and I don’t find it very Pythonic. Your code is far easier to understand if you create a function that returns a DataFrame rather than creating a function that messes with the global variables.
As a rule of thumb: Functions should never have side effects!
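To make the side-effect point concrete, here is a sketch with two hypothetical helpers: add_tax_in_place() modifies the DataFrame the caller passes in, while add_tax() leaves its input alone and returns a new DataFrame built with DataFrame.assign(). The tax column and the 30% rate are invented for illustration.

```python
import pandas as pd


def add_tax_in_place(df):
    # Side effect: mutates the caller's DataFrame and returns nothing.
    df['tax'] = df['income'] * 0.3


def add_tax(df):
    # No side effect: DataFrame.assign() returns a new DataFrame
    # and leaves the input unchanged.
    return df.assign(tax=df['income'] * 0.3)


original = pd.DataFrame({'name': ['Alice', 'Bob'],
                         'income': [100000, 99000]})

result = add_tax(original)
print(list(original.columns))  # ['name', 'income']
print(list(result.columns))    # ['name', 'income', 'tax']

add_tax_in_place(original)
print(list(original.columns))  # ['name', 'income', 'tax']
```

The returning version is easier to reason about: the caller decides what to do with the result, and nothing outside the function changes behind its back.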
In case you struggle with writing clean code, I’d recommend you check out my book on the topic:
The Art of Clean Code
Most software developers waste thousands of hours working with overly complex code. The eight core principles in The Art of Clean Code will teach you how to write clear, maintainable code without compromising functionality. The book's guiding principle is simplicity: reduce and simplify, then reinvest energy in the important parts to save you countless hours and ease the often onerous task of code maintenance.
- Concentrate on the important stuff with the 80/20 principle — focus on the 20% of your code that matters most
- Avoid coding in isolation: create a minimum viable product to get early feedback
- Write code cleanly and simply to eliminate clutter
- Avoid premature optimization that risks over-complicating code
- Balance your goals, capacity, and feedback to achieve the productive state of Flow
- Apply the Do One Thing Well philosophy to vastly improve functionality
- Design efficient user interfaces with the Less is More principle
- Tie your new skills together into one unifying principle: Focus
Although its examples are written in Python, The Art of Clean Code is suitable for programmers at any level, with ideas presented in a language-agnostic manner.
Conclusion and Further Reading
Note that there are many ways to create a DataFrame inside the function body. For this, I’d refer you to our in-depth tutorials on the Finxter blog:
- How to Create a DataFrame in Python?
- How to Read a DataFrame from a CSV file?
- How to Read a DataFrame from an HTML table?
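Whatever the source, the pattern stays the same: build the DataFrame inside the function and return it. The sketch below shows two hypothetical variants, create_df_from_csv() and create_df_from_list(). To keep it self-contained, the CSV "file" is an in-memory string wrapped in io.StringIO, which pd.read_csv() accepts just like a file path; in practice you would pass a real path instead.

```python
import io

import pandas as pd

CSV_TEXT = """name,age,income
Alice,18,100000
Bob,22,99000
"""


def create_df_from_csv(source):
    # pd.read_csv() accepts a file path or any file-like object
    return pd.read_csv(source)


def create_df_from_list(rows):
    # Each inner list is one row; column names are given explicitly
    return pd.DataFrame(rows, columns=['name', 'age', 'income'])


df_csv = create_df_from_csv(io.StringIO(CSV_TEXT))
df_list = create_df_from_list([['Alice', 18, 100000], ['Bob', 22, 99000]])

print(df_csv.shape, df_list.shape)  # (2, 3) (2, 3)
```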
You can also create a copy of an existing DataFrame with the df.copy() method. This way, you can create a new DataFrame inside the function by copying an existing one and then return it:
new_df = df.copy()
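As a sketch of that idea, the hypothetical function copy_df() below returns a copy of a DataFrame passed in by the caller. Because DataFrame.copy() makes a deep copy by default (deep=True), changing the returned DataFrame leaves the original untouched.

```python
import pandas as pd


def copy_df(df):
    # DataFrame.copy() defaults to deep=True, so the data is copied
    return df.copy()


df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [18, 22]})
new_df = copy_df(df)

new_df.loc[0, 'age'] = 99  # modify only the copy
print(df.loc[0, 'age'])      # 18
print(new_df.loc[0, 'age'])  # 99
```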
Thanks for reading through the whole article and feel free to join my email academy to keep learning Python, data science, crypto programming and Blockchain engineering.