Quick Fix: Python raises the
ImportError: No module named 'spark-sklearn' when it cannot find the library
spark-sklearn. The most frequent source of this error is that you haven’t installed
spark-sklearn explicitly with
pip install spark-sklearn. Alternatively, you may have different Python versions on your computer, and
spark-sklearn is not installed for the particular version you’re using.
In particular, you can try any of the following commands, depending on your concrete environment and installation needs:
💡 If you have only one version of Python installed:
pip install spark-sklearn
💡 If you have Python 3 (and, possibly, other versions) installed:
pip3 install spark-sklearn
💡 If you don't have PIP or it doesn't work:
python -m pip install spark-sklearn
python3 -m pip install spark-sklearn
💡 If you have Linux and you need to fix permissions:
sudo pip3 install spark-sklearn
💡 If you have Linux with apt:
sudo apt install spark-sklearn
💡 If you have Windows and you have set up the py launcher:
py -m pip install spark-sklearn
💡 If you have Anaconda:
conda install -c anaconda spark-sklearn
💡 If you have Jupyter Notebook:
!pip install spark-sklearn
You’ve just learned about the awesome capabilities of the
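spark-sklearn library and you want to try it out, so you start your code with the following statement:
import spark_sklearn
This is supposed to import the spark-sklearn library into your (virtual) environment. Note that the import name uses an underscore: a hyphen is not allowed in a Python identifier, so import spark-sklearn would fail with a SyntaxError before Python even searches for the module. However, if the library isn't installed, the import only throws the following
ImportError: No module named 'spark_sklearn':
>>> import spark_sklearn
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    import spark_sklearn
ModuleNotFoundError: No module named 'spark_sklearn'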
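If you want to check whether Python can locate the module before you actually import it, the standard library's importlib.util.find_spec() can do that. The following is a minimal sketch; the helper name is_importable is hypothetical and not part of any library. It also shows why the hyphenated distribution name cannot appear in an import statement.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` without importing it (hypothetical helper)."""
    try:
        # find_spec() returns None when no finder on sys.path knows the module.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


# The pip name contains a hyphen, which is not a valid Python identifier.
print("spark-sklearn".isidentifier())  # False

# The import name uses an underscore; the result depends on your environment.
print(is_importable("spark_sklearn"))
```

If the second line prints False, Python cannot see the package in the environment that is running your code, and installing it (as described next) is the fix.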
Solution Idea 1: Install Library spark-sklearn
The most likely reason is that Python doesn’t provide
spark-sklearn in its standard library. You need to install it first!
Before being able to import the
spark-sklearn module, you need to install it using Python’s package manager
pip. Make sure pip is installed on your machine.
To fix this error, run the following command in your shell or terminal:
$ pip install spark-sklearn
This command installs
spark-sklearn into your current (virtual) environment on Windows, Linux, and macOS. It assumes that your
pip version is up to date. If it isn't, run the following two commands in your terminal, command line, or shell (there's no harm in doing so anyway):
$ python -m pip install --upgrade pip
$ pip install spark-sklearn
💡 Note: Don’t copy and paste the
$ symbol. This is just to illustrate that you run it in your shell/terminal/command line.
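After installing, you can confirm from Python itself that the package is present. This is a minimal sketch assuming Python 3.8 or newer, where importlib.metadata is part of the standard library; installed_version is a hypothetical helper name. Note that it takes the pip distribution name (spark-sklearn), not the import name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the environment running this code.
        return None


# Prints a version string if spark-sklearn is installed here, otherwise None.
print(installed_version("spark-sklearn"))
```

A None result means pip installed the package somewhere other than the environment your script runs in, which is what the next section addresses.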
Solution Idea 2: Fix the Path
The error might persist even after you have installed the
spark-sklearn library. This often happens because
pip is installed but its folder is not on your system's PATH, so your shell cannot locate it, or the
pip it finds belongs to a different Python installation than the one running your script. In either case, the library does not end up where your script looks for it.
To fix the path problem on Windows, follow these steps.
Step 1: Find the folder where you installed Python by opening the command prompt and typing where python.
Step 2: Once you have opened the
Python folder, browse and open the
Scripts folder and copy its location. Also verify that the folder contains the pip executable (pip.exe).
Step 3: Now open the
Scripts directory in the command prompt using the
cd command and the location that you copied previously.
Step 4: Now install the library using the
pip install spark-sklearn command.
After following these steps, run your script again; the import should now succeed.
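To see which interpreter runs your script and which pip your shell would pick up, you can print both from Python. This is a minimal diagnostic sketch using only sys and shutil; describe_python_setup is a hypothetical helper name. A value of None means that command is not on your PATH.

```python
import shutil
import sys


def describe_python_setup() -> dict:
    """Collect the interpreter path and the pip executables visible on PATH (hypothetical helper)."""
    return {
        # The interpreter that is executing this script.
        "python_executable": sys.executable,
        # shutil.which() returns the full path of a command on PATH, or None.
        "pip_on_path": shutil.which("pip"),
        "pip3_on_path": shutil.which("pip3"),
    }


for key, value in describe_python_setup().items():
    print(f"{key}: {value}")
```

If the pip path does not live next to (or in the Scripts folder of) the interpreter shown in python_executable, pip is installing packages for a different Python.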
Other Solution Ideas
- The ModuleNotFoundError may appear due to relative imports. You can learn everything about relative imports and how to create your own module in this article.
- You may have mixed up Python and pip versions on your machine. In this case, to install spark-sklearn for Python 3, you may want to try python3 -m pip install spark-sklearn or even pip3 install spark-sklearn instead of pip install spark-sklearn.
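- If you face this issue on a server where you lack permission to install packages system-wide, you may want to try the command pip install --user spark-sklearn, which installs the package into your user directory instead.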
- If you’re using Ubuntu, you may want to try this command:
sudo apt install spark-sklearn
- You can also check out this article to learn more about possible problems that may lead to an error when importing a library.
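One way to avoid mixing up Python and pip versions is to always invoke pip through the exact interpreter that runs your code (the python -m pip pattern from above). The sketch below builds that command from sys.executable; pip_install_command is a hypothetical helper, and the code only prints the command instead of running it.

```python
import sys
from typing import List


def pip_install_command(package: str) -> List[str]:
    """Build a pip install command bound to the current interpreter (hypothetical helper)."""
    # sys.executable is the interpreter running this script, so "-m pip"
    # installs into exactly the environment that will later import the package.
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("spark-sklearn")
print(" ".join(cmd))
# To actually install, you could pass cmd to subprocess.check_call(cmd).
```

Binding the command to sys.executable is the programmatic equivalent of typing python3 -m pip instead of a bare pip, whose location depends on your PATH.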
Understanding the “import” Statement
In Python, the
import statement serves two main purposes:
- Search the module by its name, load it, and initialize it.
- Define a name in the local namespace within the scope of the import statement. This local name is then used to reference the accessed module throughout the code.
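Both purposes can be observed with a standard-library module. In this small sketch (using json purely as a stand-in for any installed module, and a hypothetical function name load_json_module), the import inside a function binds the name only in that function's local namespace, while the loaded module object itself is cached and shared.

```python
import importlib


def load_json_module():
    # Purpose 1: find, load, and initialize the "json" module.
    # Purpose 2: bind the name "json" in this function's local namespace.
    import json
    return "json" in locals(), json


bound_locally, json_module = load_json_module()
print(bound_locally)  # True

# Loaded modules are cached in sys.modules, so importing again returns the same object.
print(importlib.import_module("json") is json_module)  # True
```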
What’s the Difference Between ImportError and ModuleNotFoundError?
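What's the difference between ImportError and ModuleNotFoundError? Python organizes its exceptions in a class hierarchy, and ModuleNotFoundError (added in Python 3.6) is a subclass of ImportError. Every ModuleNotFoundError is therefore also an ImportError.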
You can also check this relationship using the
issubclass() built-in function:
>>> issubclass(ModuleNotFoundError, ImportError)
True
Specifically, Python raises the
ModuleNotFoundError if the module (e.g.,
spark_sklearn) cannot be found. If it can be found, there may be a problem loading the module or some specific files within the module. In those cases, Python raises a plain ImportError.
If an import statement cannot import a module, it raises an
ImportError. This may occur because of a faulty installation or an invalid path. In Python 3.6 or newer, a module that cannot be found at all raises the more specific ModuleNotFoundError.
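The following sketch shows how the two exceptions differ in practice. The helper name try_import is hypothetical; the exception classes, their name attribute, and importlib.import_module() are standard Python 3.6+ features. Because ModuleNotFoundError is a subclass of ImportError, its except clause must come first.

```python
import importlib


def try_import(module_name: str) -> str:
    """Import a module by name and report which error, if any, occurred (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Must come first: the ImportError clause below would otherwise catch it too.
        return f"ModuleNotFoundError: no module named {exc.name!r}"
    except ImportError as exc:
        # The module was found, but loading it (or a name inside it) failed.
        return f"ImportError: {exc}"
    return "imported successfully"


print(try_import("json"))           # imported successfully
print(try_import("spark_sklearn"))  # depends on whether spark-sklearn is installed

# A module that exists but lacks the requested name raises a plain ImportError.
try:
    from json import does_not_exist  # noqa: F401
except ImportError as exc:
    print(type(exc).__name__)  # ImportError
```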
How to Fix “ModuleNotFoundError: No module named ‘spark-sklearn’” in PyCharm
If you create a new Python project in PyCharm and try to import the
spark-sklearn library, it’ll raise the following error message:
Traceback (most recent call last):
  File "C:/Users/.../main.py", line 1, in <module>
    import spark_sklearn
ModuleNotFoundError: No module named 'spark_sklearn'

Process finished with exit code 1
The reason is that PyCharm, by default, creates a separate virtual environment for each new project, in which you can install custom Python modules. But that virtual environment is initially empty, even if you've already installed
spark-sklearn on your computer!
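To confirm that your script is running inside such a virtual environment, you can compare sys.prefix with sys.base_prefix. This is a minimal sketch assuming an environment created by venv or a recent virtualenv (conda environments are separate installations and are not detected this way); in_virtual_environment is a hypothetical helper name.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when running inside a venv-style virtual environment (hypothetical helper)."""
    # Inside a venv, sys.prefix points to the environment, while sys.base_prefix
    # still points to the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

If this prints True in PyCharm, any package you installed globally is invisible to the project until you install it into that environment.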
The fix is simple: use the PyCharm installation tooltips to install the missing library in your virtual environment; two clicks and you're good to go! The following steps use the pandas library as an example; they look similar for spark-sklearn.
First, right-click on the
pandas text in your editor:
Second, click “
Show Context Actions” in your context menu. In the new menu that arises, click “Install Pandas” and wait for PyCharm to finish the installation.
The code will run after your installation completes successfully.
As an alternative, you can also open the
Terminal tool at the bottom and type:
$ pip install spark-sklearn
If this doesn’t work, you may want to set the Python interpreter to another version using the following tutorial: https://www.jetbrains.com/help/pycharm/2016.1/configuring-python-interpreter-for-a-project.html
You can also manually install a new library such as
spark-sklearn in PyCharm using the following procedure:
- Open File > Settings > Project from the PyCharm menu.
- Select your current project.
- Click the Python Interpreter tab within your project tab.
- Click the small + symbol to add a new library to the project.
- Now type in the library to be installed, in this example spark-sklearn, and click Install Package.
- Wait for the installation to terminate and close all popup windows.
Here’s a full guide on how to install a library on PyCharm.
Jean is a tech enthusiast with a love for AI and machine learning innovations, particularly LLMs. Beyond contributing insightful articles to our blog, Jean has worked as a Python, Rust, and Go coder for one of the leading tech firms in the world.