PySpark is a Python library that provides an API for Apache Spark. Spark is a distributed engine for computations on large-scale data, used for distributed data analytics and machine learning.
Here’s a solution that works in a standard PyCharm setup:
- Open File > Settings > Project from the PyCharm menu.
- Select your current project.
- Click the Python Interpreter tab within your project tab.
- Click the small + symbol to add a new library to the project.
- Now type in the library to be installed, in this case "pyspark" without quotes, and click Install Package.
- Wait for the installation to finish and close all popup windows.
Here’s the installation process as a short animated video—it works analogously for PySpark, just type in “pyspark” in the search field instead:
Make sure to select only “pyspark”, because many other packages that you don’t need also contain the term “pyspark” (false positives).
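Once the installation has finished, you may want to confirm that the project interpreter can actually find the package. Here is a minimal sketch of such a check, run from a Python file or the Python console in PyCharm. The helper name `is_installed` is made up for this example; `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pyspark"):
    # Import only after the check so the script does not crash when it is missing.
    import pyspark
    print("PySpark version:", pyspark.__version__)
else:
    print("PySpark is not installed for this interpreter.")
```

If the second message appears even though the installation succeeded, the package was probably installed into a different interpreter than the one selected for the project.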
Alternatively, you can run the pip install pyspark command in your PyCharm “Terminal” view:
$ pip install pyspark
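If the terminal’s pip belongs to a different Python installation than your project, the package can end up in the wrong place. One way around this is to call pip through the interpreter itself. The sketch below only builds and prints that command. The helper `pip_install_command` is a hypothetical name for illustration, and the sketch assumes that pip is available for the interpreter running it.

```python
import sys
from typing import List


def pip_install_command(package: str) -> List[str]:
    """Build a pip command bound to the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("pyspark")
print("Interpreter:", sys.executable)
print("Command:", " ".join(command))
# To actually install, pass the list to subprocess.run(command, check=True).
```

Run it with the project interpreter and the printed command will target exactly that interpreter.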
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.