How to Install PySpark in PyCharm?

PySpark is a Python library that provides an API for Apache Spark. Spark is a distributed engine for computations on large-scale data, supporting distributed data analytics and machine learning.

Problem Formulation: Given a PyCharm project, how do you install the PySpark library in your project, either within a virtual environment or globally?

Here’s a solution that works in most PyCharm setups:

  • Open File > Settings > Project from the PyCharm menu.
  • Select your current project.
  • Click the Python Interpreter tab within your project tab.
  • Click the small + symbol to add a new library to the project.
  • Type the name of the library to install, in this example "pyspark" without quotes, and click Install Package.
  • Wait for the installation to finish and close all popup windows.
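
Once the steps above finish, one way to confirm the result is to ask the project's interpreter whether it can find the package. The following is a minimal sketch, assuming you run it from inside PyCharm with the same interpreter you selected in the settings; the helper name `is_installed` is illustrative and not part of PyCharm or PySpark.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can locate the package."""
    # find_spec returns None when no top-level package with this name is found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # sys.executable shows which interpreter (virtual environment or global) is running.
    print("Interpreter:", sys.executable)
    print("pyspark installed:", is_installed("pyspark"))
```

If the second line prints False, the package was probably installed into a different interpreter than the one the project uses.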

The short animated video of the installation process shows a different library, but it works analogously for PySpark: just type “pyspark” in the search field instead.

Make sure to select only “pyspark”, because many other packages that you don't need also contain the term “pyspark” in their names (false positives):

[Image: pyspark on PyCharm installation]

Alternatively, you can run the pip install pyspark command in your PyCharm “Terminal” view:

$ pip install pyspark

[Image: pip install pyspark on PyCharm]
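
If the terminal is not attached to the project's interpreter, a plain pip call may install the package somewhere else. One way around this is to invoke pip as a module of the interpreter that is currently running. The following is a minimal sketch under that assumption; the helper names `pip_install_command` and `install` are illustrative and not part of pip or PyCharm.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package_name: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    # Equivalent to typing: <path-to-python> -m pip install <package_name>
    return [sys.executable, "-m", "pip", "install", package_name]


def install(package_name: str) -> None:
    """Run the pip command; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package_name))


if __name__ == "__main__":
    # Only print the command here; call install("pyspark") to actually run it.
    print(" ".join(pip_install_command("pyspark")))
```

Running the script with the project's interpreter and then calling `install("pyspark")` places the package in that interpreter's environment, whether it is a virtual environment or a global installation.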
