Quick Fix: Python raises the
ImportError: No module named 'pyspark' when it cannot find the library
pyspark. The most frequent source of this error is that you haven’t installed
pyspark explicitly with
pip install pyspark. Alternatively, you may have different Python versions on your computer, and
pyspark is not installed for the particular version you’re using.
You’ve just learned about the awesome capabilities of the
pyspark library and you want to try it out, so you start your code with the following statement:
import pyspark
This is supposed to import the pyspark library into your (virtual) environment. However, it only throws the following
ImportError: No module named pyspark:
>>> import pyspark
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    import pyspark
ModuleNotFoundError: No module named 'pyspark'
Solution Idea 1: Install Library pyspark
The most likely reason is that Python doesn’t provide
pyspark in its standard library. You need to install it first!
Before being able to import the pyspark module, you need to install it using Python’s package manager
pip. Make sure pip is installed on your machine.
To fix this error, you can run the following command in your Windows shell:
$ pip install pyspark
This simple command installs
pyspark in your virtual environment on Windows, Linux, and macOS. It assumes that your
pip version is up to date. If it isn’t, use the following two commands in your terminal, command line, or shell (there’s no harm in doing it anyway):
$ python -m pip install --upgrade pip
$ pip install pyspark
💡 Note: Don’t copy and paste the
$ symbol. This is just to illustrate that you run it in your shell/terminal/command line.
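To confirm that the installation reached the interpreter you are actually running, you can ask Python itself whether it can locate the package. The following is a minimal sketch: the helper name is_installed is hypothetical and chosen only for illustration. It uses importlib.util.find_spec from the standard library, which looks up a top-level module without importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the top-level module.

    Hypothetical helper for illustration; it does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True only if pyspark is visible to this exact interpreter.
    print("pyspark importable:", is_installed("pyspark"))
```

If this prints False right after a successful pip install, pip most likely installed the package for a different Python interpreter than the one running the script.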
Solution Idea 2: Fix the Path
The error might persist even after you have installed the
pyspark library. This likely happens because
pip is installed on your system but does not reside in a folder listed in your PATH. The shell then cannot locate
pip and therefore cannot install the library in the correct location.
To fix the problem with the path in Windows follow the steps given next.
Step 1: Open the folder where you installed Python.
Step 2: Once you have opened the
Python folder, browse and open the
Scripts folder and copy its location. Also verify that the folder contains the pip executable.
Step 3: Now open the
Scripts directory in the command prompt using the
cd command and the location that you copied previously.
Step 4: Now install the library using the
pip install pyspark command.
After having followed the above steps, execute your script once again, and you should get the desired output.
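If you are unsure where your Python installation and its scripts folder live, the interpreter can report both. The sketch below is an illustration under stated assumptions: the function name describe_interpreter is hypothetical, and the PATH check is a plain string comparison that does not resolve symbolic links.

```python
import os
import sys
import sysconfig


def describe_interpreter():
    """Report where this interpreter lives and whether its scripts folder is on PATH.

    Hypothetical helper for illustration only.
    """
    # "Scripts" on Windows, "bin" on Linux and macOS.
    scripts_dir = sysconfig.get_path("scripts")
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    normalized = {
        os.path.normcase(os.path.normpath(entry)) for entry in path_entries if entry
    }
    return {
        "executable": sys.executable,
        "scripts_dir": scripts_dir,
        "scripts_on_path": os.path.normcase(os.path.normpath(scripts_dir)) in normalized,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If scripts_on_path is False, calling pip by its bare name may fail or may pick up a pip that belongs to another installation.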
Other Solution Ideas
- ModuleNotFoundError may appear due to relative imports. You can learn everything about relative imports and how to create your own module in this article.
- You may have mixed up Python and pip versions on your machine. In this case, to install
pyspark for Python 3, you may want to try
python3 -m pip install pyspark or even
pip3 install pyspark instead of
pip install pyspark
- If you face this issue server-side, you may want to try the command
pip install --user pyspark
- If you’re using Ubuntu and your configured package repositories provide a pyspark package, you may want to try this command (otherwise, stay with pip):
sudo apt install pyspark
- You can check out our in-depth guide on installing
- You can also check out this article to learn more about possible problems that may lead to an error when importing a library.
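One way to rule out a mismatch between Python and pip versions is to call pip through the interpreter that runs your script. The sketch below only builds the command, it does not execute it. The function name pip_install_command is hypothetical; the `-m pip install` invocation and the `--user` flag are the same ones used in the commands listed above.

```python
import sys


def pip_install_command(package, user=False):
    """Build a pip command bound to the currently running interpreter.

    Hypothetical helper for illustration; it returns the argument list
    and leaves execution to the caller.
    """
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        # Install into the per-user site-packages directory.
        command.append("--user")
    command.append(package)
    return command


if __name__ == "__main__":
    print(" ".join(pip_install_command("pyspark")))
```

To run the command from Python, you could pass the returned list to subprocess.run(command, check=True).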
Understanding the “import” Statement
In Python, the
import statement serves two main purposes:
- Search the module by its name, load it, and initialize it.
- Define a name in the local namespace within the scope of the
import statement. This local name is then used to reference the accessed module throughout the code.
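The two purposes listed above can be observed separately. The following sketch uses the standard-library json module as a stand-in, because it is always available. importlib.import_module performs the search-load-initialize step, and the import statement then additionally binds a local name to the same module object.

```python
import importlib
import sys

# Step 1: search for the module, load it, and initialize it.
# The loaded module is cached in sys.modules under its name.
loaded = importlib.import_module("json")
cached = sys.modules["json"]

# Step 2: the import statement binds a local name to that same module object.
import json as js

same_object = (js is loaded) and (cached is loaded)
print("local name refers to the cached module:", same_object)
```

Because modules are cached in sys.modules, importing the same module a second time does not load it again.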
What’s the Difference Between ImportError and ModuleNotFoundError?
What’s the difference between ImportError and ModuleNotFoundError?
Python defines an error hierarchy, so some error classes inherit from other error classes. In our case, the
ModuleNotFoundError is a subclass of the
ImportError class.
You can check this relationship using the
issubclass() built-in function:
>>> issubclass(ModuleNotFoundError, ImportError)
True
Specifically, Python raises the
ModuleNotFoundError if the module (e.g.,
pyspark) cannot be found. If it can be found, there may be a problem loading the module or some specific files within the module. In those cases, Python would raise an
ImportError.
If an import statement cannot import a module, it raises an
ImportError. This may occur because of a faulty installation or an invalid path. In Python 3.6 or newer, this will usually raise a
ModuleNotFoundError.
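The distinction can be reproduced with a small experiment. In this sketch, the function name import_outcome and the module name no_such_module_xyz_123 are made up for illustration. A missing module yields ModuleNotFoundError, while a missing name inside an existing module yields a plain ImportError.

```python
def import_outcome(statement):
    """Execute an import statement and name the exception class it raised.

    Hypothetical helper for illustration. Returns None if the import succeeds.
    """
    try:
        exec(statement, {})
    except ModuleNotFoundError:
        # Must be caught first because it is a subclass of ImportError.
        return "ModuleNotFoundError"
    except ImportError:
        return "ImportError"
    return None


if __name__ == "__main__":
    print(import_outcome("import no_such_module_xyz_123"))
    print(import_outcome("from math import no_such_name"))
    print(import_outcome("import math"))
```

Catching ImportError alone would handle both cases, since an except clause also matches subclasses.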
How to Fix “ModuleNotFoundError: No module named ‘pyspark'” in PyCharm
If you create a new Python project in PyCharm and try to import the
pyspark library, it’ll raise the following error message:
Traceback (most recent call last):
  File "C:/Users/.../main.py", line 1, in <module>
    import pyspark
ModuleNotFoundError: No module named 'pyspark'

Process finished with exit code 1
The reason is that each PyCharm project, by default, creates a virtual environment in which you can install custom Python modules. But the virtual environment is initially empty, even if you’ve already installed
pyspark on your computer!
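To see whether your script runs inside such a virtual environment, you can compare two attributes of the sys module. This is a minimal sketch with a hypothetical function name, and it assumes an environment created with the standard venv mechanism, where sys.prefix and sys.base_prefix differ.

```python
import sys


def in_virtual_environment():
    """Return True if the interpreter runs inside a venv-style environment.

    Hypothetical helper for illustration.
    """
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtual_environment())
    print("packages are looked up under:", sys.prefix)
```

If the first line prints True, packages installed for the system-wide Python are not visible here and need to be installed into the environment itself.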
The fix is simple: Use the PyCharm installation tooltips to install pyspark in your virtual environment. Two clicks and you’re good to go!
First, right-click on the
pyspark text in your editor.
Second, click “Show Context Actions” in your context menu. In the new menu that arises, click the option to install pyspark and wait for PyCharm to finish the installation.
The code will run after your installation completes successfully.
As an alternative, you can also open the
Terminal tool at the bottom and type:
$ pip install pyspark
If this doesn’t work, you may want to set the Python interpreter to another version using the following tutorial: https://www.jetbrains.com/help/pycharm/2016.1/configuring-python-interpreter-for-a-project.html
You can also manually install a new library such as
pyspark in PyCharm using the following procedure:
- Open File > Settings > Project from the PyCharm menu.
- Select your current project.
- Click the
Python Interpreter tab within your project tab.
- Click the small
+ symbol to add a new library to the project.
- Now type in the library to be installed, in this example pyspark, and click the install button.
- Wait for the installation to terminate and close all popup windows.
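After the installation finishes, you can verify from within the project that the package is registered in the interpreter's environment. The sketch below uses importlib.metadata from the standard library (Python 3.8 or newer). The function name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyspark")
    if version is None:
        print("pyspark is not installed for this interpreter")
    else:
        print("pyspark version:", version)
```

Run this with the same interpreter that PyCharm uses for the project, for example from the built-in Terminal tool.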
Here’s a full guide on how to install a library on PyCharm.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.