When working with Python, a programmer often encounters situations where she needs to install packages not contained in the Standard Library. In such situations, she must install modules from online repositories using package installers.
The goal of this article is to help beginners develop a working knowledge of pip (a recursive acronym for “Pip Installs Packages”) as quickly as possible while defining all the prerequisite jargon along the way. In particular, this article aims to make the content of the pip documentation as accessible as possible for beginners by using simpler words and emphasizing practical examples.
What is pip?
pip (“Pip Installs Packages”) is the standard package installer for Python; it installs packages from PyPI (the “Python Package Index”). As of November 2021, PyPI contains over 300,000 packages, far more than similar package repositories for Python. pip allows users to install and uninstall packages, manage dependencies, and keep archives of wheel files, among many other things.
The purpose of this article is to develop a “working knowledge” of pip that may prove useful while working on Python projects at a basic to intermediate level. In particular, we will talk about the most useful parts of the pip documentation and provide explanations so as to make the system more accessible to the beginner. The article assumes that the user is working on macOS, but the commands for Windows can be obtained through minor modifications.
Note on Pip vs. Conda
A popular alternative to pip is Conda, a package manager aimed at data analysis. We will highlight three key differences to give you a sense of which you may prefer to use. For a more extensive discussion, see the official Anaconda blog page or StackOverflow.
1) Virtual Environments. Virtual environments are isolated Python environments used for Python projects. Because many Python projects depend on having specific versions of packages installed in the environment, projects may break when globally installed packages are updated. To prevent this, virtual environments are created so that a project runs in the same environment every time it is executed.
pip works with several virtual environment builders such as virtualenv and venv. (See Chris’s article for a more detailed discussion.) In contrast, Conda has a built-in virtual environment manager. (This can be managed through a GUI if you install Anaconda Navigator.) In this respect, Conda may be easier to use for beginning coders.
2) Availability of Packages. As noted before, PyPI boasts over 300,000 packages, in contrast to around 7,000 packages in the Anaconda repository. Although PyPI packages can be installed in a Conda environment, doing so often leads to complications, and mixing the two should generally be avoided. (For more details, see the official Anaconda blog page.) Many popular Python packages (numpy, matplotlib, and pandas, to name a few) are available through Conda, but when working on Python projects, it is not uncommon for developers to come across packages that are only available through PyPI.
3) Languages. While pip only deals with Python packages, Conda can install packages written in other languages such as R or C. This is because Conda is aimed at data science tasks.
Part I: How to install packages using pip?
In this section, we will look at how to install packages and manage dependencies using pip.
pip install
To install a package from PyPI using pip, open up your terminal and use the command:
pip install matplotlib
Note: pip is replaced with python -m pip in the pip documentation. The -m flag searches sys.path for the pip module and runs it as a script. Some systems require that you use python -m pip. For this article, we will just use pip.
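If you ever drive pip from a Python script instead of typing it in the terminal, the python -m pip form is the natural choice because it ties pip to one specific interpreter. Here is a minimal sketch of that idea. The helper name pip_command is made up for this article and is not part of pip; it only builds the command and leaves running it to you.

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script.

    `pip_command` is a hypothetical helper name, not something pip provides.
    """
    return [sys.executable, "-m", "pip", *args]


# Same effect as typing `python -m pip install matplotlib` in the terminal.
command = pip_command("install", "matplotlib")
print(command)

# To actually run it, hand the list to subprocess.run, for example:
#   import subprocess
#   subprocess.run(command, check=True)
```

Using sys.executable means the package ends up in the environment of the Python that is running the script, which is usually what you want inside a virtual environment.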
The install command also installs all of the package’s dependencies, which is to say it installs all the packages that the desired package needs in order to work properly. For instance, matplotlib requires numpy, packaging, pyparsing, and cycler, among many others, whereas NumPy has none. Dependency resolution is a major topic in using pip.
There are various other sources from which you can install packages.
Requirement Files. Requirement files are .txt files that allow users to install packages in bulk, possibly with specifications such as package versions. (See the “Example” in the pip documentation to get a sense of what the contents of the file should look like.) Many of the pip commands have options that make their output suitable for requirement files.
You can use the pip install command to install from requirement files. To do this, navigate to the appropriate directory in the terminal (using the terminal command cd). Then use the following pip command:
pip install -r requirements.txt
Instead of navigating to the directory in the terminal, you can use the absolute path of the file:
pip install -r /Users/username/Desktop/requirements.txt
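To get a feel for what a requirements file contains, here is a small Python sketch that reads one. The file contents and the helper name parse_requirements are made up for illustration, and the reader is deliberately simplified: pip itself accepts more syntax (options, URLs, environment markers) than this handles.

```python
import re

# Hypothetical contents of a requirements.txt file.
EXAMPLE_REQUIREMENTS = """\
# packages needed for plotting
matplotlib==3.5.0
numpy>=1.20
tea
"""

# A package name, optionally followed by a version specifier.
_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requirements(text):
    """Return (name, specifier) pairs, skipping blank lines and comments.

    A simplified reader for illustration only, not pip's own parser.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        match = _LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


for name, spec in parse_requirements(EXAMPLE_REQUIREMENTS):
    print(name, spec or "(any version)")
```

Each non-comment line names one package. A line without a specifier, such as tea, lets pip pick the version.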
VCS Projects. Many Python packages are also available through version control system (VCS) repositories such as GitHub. For example, to install Django from its GitHub repository, use:
pip install git+https://github.com/django/django.git#egg=django
Wheel and Tarball Files. The pip install command can also be used to install from local wheel (.whl) and tarball (.tar.gz) files. (Read this Medium article and StackOverflow post on their differences.)
The syntax is similar to before. Navigate to the directory where the files are located using the change directory (cd) command in the terminal. For example, to install the tea package from a .whl file, use:
pip install tea-0.1.6-py3-none-any.whl
To install the tea package from a tarball, use:
pip install tea-0.1.6.tar.gz
pip uninstall
The uninstall command is fairly self-explanatory: it allows users to uninstall packages. For instance, to uninstall the tea package using pip, use:
pip uninstall -y tea
You can (optionally) add -y as above to prevent the program from asking for confirmation.
To uninstall multiple packages at once, you can list the packages in a requirements.txt file (much like we did for pip install), and use the following command:
pip uninstall -r requirements.txt
pip check
The check command allows users to check for broken dependencies, i.e. whether any installed packages depend on other packages that are missing from the environment or installed in an incompatible version. The syntax is as follows:
pip check
pip show
The show command lists all the relevant information for a particular package. For instance, if you want to know where Django is installed on your device or if you want to know its package dependencies, you can use:
pip show django
This produces output such as:
Name: Django
Version: 3.0
Summary: A high-level Python Web framework that encourages rapid development and clean, pragmatic design.
Home-page: https://www.djangoproject.com/
Author: Django Software Foundation
Author-email: foundation@djangoproject.com
License: BSD
Location: /Users/user_name/Library/Python/3.8/lib/python/site-packages
Requires: pytz, sqlparse
Required-by:
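The same kind of information can be read from inside Python with the standard library’s importlib.metadata module (Python 3.8 and later). The sketch below is only an approximation of pip show: the helper name show_package is made up, and its Requires field contains full requirement strings (with version specifiers) rather than the bare names that pip prints.

```python
from importlib import metadata


def show_package(name):
    """Return a few `pip show`-style fields, or None if not installed.

    `show_package` is a hypothetical helper, not part of pip.
    """
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "Name": dist.metadata.get("Name"),
        "Version": dist.version,
        "Summary": dist.metadata.get("Summary"),
        # Requirement strings such as "sqlparse>=0.2.2", or [] if none.
        "Requires": dist.requires or [],
    }


info = show_package("django")
print(info if info is not None else "django is not installed")
```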
pip list
To list all the packages installed in your environment, use the pip list command:
pip list
This may produce output such as:
Package        Version
-------------- -------
pip            19.2.3
setuptools     41.2.0
setuptools-scm 6.3.2
six            1.15.0
sqlparse       0.4.2
tea            0.1.6
tomli          1.2.2
tzlocal        3.0
wheel          0.33.1
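For comparison, a similar listing can be produced from Python itself using importlib.metadata. This is a rough sketch, not how pip list is implemented, and the function name installed_packages is invented for this example.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for the current environment."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pairs.add((name, version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in installed_packages():
    print(f"{name}=={version}")
```

The printed lines use the name==version form, the same shape that requirement files use.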
What if you wanted to uninstall all the packages except the bare essentials? You can obtain a list of packages that are not dependencies of other installed packages using:
pip list --format freeze --not-required
The option “--format freeze” puts the list in a format compatible with a requirements.txt file:
pip==19.2.3
setuptools-scm==6.3.2
six==1.15.0
sqlparse==0.4.2
tea==0.1.6
wheel==0.33.1
Now you can copy the above into a requirements.txt file, delete the names of the packages that you want to keep, and use
pip uninstall -r requirements.txt
to uninstall all the rest.
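If the list is long, the “delete the ones you want to keep” step can be scripted. The following sketch assumes the freeze-format lines shown above; the function name packages_to_uninstall and the choice of which packages to keep are only examples. It compares names case-insensitively but does not apply pip’s full name normalization.

```python
def packages_to_uninstall(freeze_lines, keep):
    """Drop the packages named in `keep` from freeze-format lines."""
    keep_lower = {name.lower() for name in keep}
    remaining = []
    for line in freeze_lines:
        line = line.strip()
        if not line:
            continue
        name = line.split("==", 1)[0]
        if name.lower() not in keep_lower:
            remaining.append(line)
    return remaining


freeze_output = [
    "pip==19.2.3",
    "setuptools-scm==6.3.2",
    "six==1.15.0",
    "sqlparse==0.4.2",
    "tea==0.1.6",
    "wheel==0.33.1",
]

# Keep pip and wheel; everything printed here would go into requirements.txt.
for line in packages_to_uninstall(freeze_output, keep=["pip", "wheel"]):
    print(line)
```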
pip freeze
The freeze command outputs a list of packages installed in the environment in a format suitable for requirement files. The syntax is as follows:
pip freeze
The freeze command is useful for copying all the packages from environment A to environment B. First run freeze in environment A, and store the contents in a requirements.txt file:
pip freeze > requirements.txt
The file is stored in the current directory (which you can check using the pwd command in the terminal). Then go to environment B. (If A and B are virtual environments, deactivate A and activate B in the terminal using commands from whichever virtual environment manager is being used.) Then install the packages in the requirements file using install:
pip install -r requirements.txt
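After copying, you may want to confirm that environment B really matches environment A. One way is to run pip freeze in both and compare the results. The sketch below does the comparison in Python; the two freeze outputs are invented, and the helper names are hypothetical.

```python
def parse_freeze(text):
    """Map package name (lowercased) to its pinned version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


def missing_or_different(source, target):
    """Return pins in `source` that `target` lacks or pins differently."""
    return {
        name: version
        for name, version in source.items()
        if target.get(name) != version
    }


# Hypothetical freeze outputs for environments A and B.
env_a = parse_freeze("six==1.15.0\nsqlparse==0.4.2\ntea==0.1.6\n")
env_b = parse_freeze("six==1.15.0\nsqlparse==0.4.1\n")

print(missing_or_different(env_a, env_b))
```

An empty result means every package pinned in A is present in B at the same version.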
Part II: Distribution Files
In this section, we will discuss how to download and manage distribution files for Python packages.
Distribution files are compressed files containing the various files needed to install a Python library. See this Medium article for an extensive discussion of the different types of distribution files. We just need to know the following in order to understand the rest of this section:
Wheels (.whl). Wheel files are essentially zip files containing everything necessary to install packages in your local environment. They are generally faster to download and install compared to tarballs. For more details, see this article from RealPython.org and this article from PythonWheels.com.
A wheel is a “built” distribution, meaning it is in a format that is ready to install, which makes the whole installation process faster.
Tarballs (.tar.gz). Tarballs are a type of source distribution that contains both the Python code and the source code for any extension modules in the package.
Wheel files are the preferred format for installations using pip. See this StackOverflow post for a discussion of wheels versus tarballs.
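The name of a wheel file also carries information. Wheel file names follow the pattern name-version-pythontag-abitag-platformtag.whl, with an optional build tag after the version. The sketch below splits a file name into those parts; the function name parse_wheel_filename is made up for illustration, and the code does no validation beyond counting the parts.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("tea-0.1.6-py3-none-any.whl"))
```

For tea-0.1.6-py3-none-any.whl, the tags py3, none, and any say that the wheel works on any Python 3, does not depend on a particular ABI, and is not tied to a platform. That is typical of pure-Python packages.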
pip download
Like the pip install command, the pip download command downloads the necessary distribution files from repositories, but it does not install the packages from the downloaded files (which is useful, e.g., for offline installation). As such, the command supports many of the options that the install command does.
For instance, to download the distribution files for NumPy, use the following syntax:
pip download numpy
pip wheel
The wheel command allows users to build wheel files. Since the command outputs wheel files, its behavior is very similar to that of the download command. The main difference between the two is that the wheel command is intended for building wheel files whereas the download command is for downloading them from the web. See this StackOverflow discussion on their differences.
To build a wheel file for the standalone module, use:
pip wheel standalone
Much like the install and download commands, wheel also supports requirement files:
pip wheel -r requirements.txt
pip cache
pip has a built-in cache system for keeping distribution files downloaded from repositories. Whenever pip is used to install a package, the wheel files in the cache are preferred over downloading new distribution files from the repository. This makes the whole installation process faster and reduces traffic to repositories.
The pip cache command allows users to interact with pip’s wheel cache. There are several things you can do with it:
Show file path to the directory of all cache files:
pip cache dir
Show various information regarding the cache, such as the number of files and size of the cache:
pip cache info
List the file names in a pip cache:
pip cache list
To see a list of file paths for wheel files of specific packages, use:
pip cache list numpy --format=abspath
To remove specific packages from the cache, use:
pip cache remove numpy
Finally, to clear the whole cache:
pip cache purge
pip hash
A hash value is a value computed from the contents of a file; it changes if the file is altered. Since anyone can upload packages to PyPI, there may be tampered packages in the repository, at least in principle. Hash values allow users to check whether files have been tampered with or not.
To generate a hash value for a wheel file, use:
python -m pip hash tea-0.1.7-py3-none-any.whl
There are different algorithms for computing hash values. With pip, you can choose from sha256, sha384, and sha512:
python -m pip hash -a 'sha256' tea-0.1.7-py3-none-any.whl
Running this, the output is:
--hash=sha256:f0a49f55419338730cdc100424b43e902e29a724ce198f6fd1026e6a96e33799
We can compare this to the hash code available on PyPI to confirm that it is indeed the correct file.
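To see what is going on under the hood, here is a sketch that computes the same kind of value with Python’s standard hashlib module and prints it in the --hash=algorithm:digest form. The function name pip_style_hash is made up, and the temporary file merely stands in for a real wheel.

```python
import hashlib
import os
import tempfile


def pip_style_hash(path, algorithm="sha256"):
    """Hash a file's bytes and format the digest like `pip hash` output."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return f"--hash={algorithm}:{digest.hexdigest()}"


# A small temporary file stands in for a real wheel here.
with tempfile.NamedTemporaryFile(delete=False, suffix=".whl") as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_hash = pip_style_hash(demo_path)
os.remove(demo_path)
print(demo_hash)
```

Changing even a single byte of the file produces a completely different digest, which is what makes the comparison with the published hash meaningful.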
Miscellaneous
Here are some other commands listed in the pip documentation.
pip config
The config command allows users to interact with the configuration file (pip.conf) from the terminal. The configuration files are located in standardized locations depending on the platform (see “Location” in the documentation), and most of what can be done by the config command can also be done by opening the configuration file in a text editor and editing its contents. An easy way to open the configuration file is to use the following terminal commands:
locate pip.conf
This will print out the locations of the pip.conf files on your system. If you wanted to open the global configuration file, then you can use:
open /Library/Application\ Support/pip/pip.conf
(Notice that the space character has been escaped. Otherwise, the terminal will return an error.)
Alternatively, you can use the edit subcommand:
pip config --user edit
(For this to work, the $EDITOR environment variable needs to be set to the executable file of your favorite plain text editor. See this StackOverflow post for how to do this.)
Configuration File. The configuration files determine the default behavior of pip commands. There are three levels of configuration files. The global file determines pip’s behavior throughout the system, the user file determines the behavior for the current user, and finally, the site file determines the behavior within a particular virtual environment.
Let’s look at what the contents of a configuration file should look like. If you wanted the output of the list command to be in freeze format, then you can put the following in the user configuration file:
[list]
format = freeze
There are several ways of viewing the content of config files using pip. If you want to see the contents of the user config file, use the following command:
pip config --user list
In the case of the configuration file we defined above, we will see the following output:
list.format = freeze
When using the config command, settings are addressed by names of the form “command.option”. (This is what is meant by “name” in the pip documentation.)
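The link between the file layout and the “command.option” names can be illustrated with Python’s standard configparser module, since the configuration file uses the INI format. This is only a sketch of the naming idea, not pip’s own implementation, and the function name flatten_config is invented for this example.

```python
import configparser

# The example user configuration from this article.
CONFIG_TEXT = """\
[list]
format = freeze
"""


def flatten_config(text):
    """Turn INI sections and options into "section.option" names."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    flat = {}
    for section in parser.sections():
        for option, value in parser.items(section):
            flat[f"{section}.{option}"] = value
    return flat


for name, value in flatten_config(CONFIG_TEXT).items():
    print(f"{name} = {value}")
```

The section header [list] becomes the part before the dot, and each option inside it becomes the part after the dot.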
If you wanted to see the contents of all of the configuration files at once (along with other information concerning the configuration files), you can use the debug subcommand:
pip config debug
You can display, set, and delete individual variables from the terminal as well. To display the contents of the variable, use the get subcommand:
pip config --user get list.format
To delete the value of a variable (e.g. to reset list.format to its default value), use the unset subcommand:
pip config --user unset list.format
If you want to assign a value to the variable (e.g. you want to set the format back to freeze), use the set subcommand:
pip config --user set list.format freeze
pip debug
The debug command outputs information about the system that may be useful for debugging, such as the versions of pip and python, where the executable is located, etc.:
pip debug
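Some of the same details can be gathered from inside Python with the standard library, which is handy when you want to include them in a bug report generated by a script. The sketch below covers only a small subset of what pip debug prints, and the function name interpreter_summary is made up.

```python
import platform
import sys


def interpreter_summary():
    """Collect a few facts about the running Python interpreter."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": sys.platform,
        "implementation": sys.implementation.name,
    }


for key, value in interpreter_summary().items():
    print(f"{key}: {value}")
```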
pip search
The pip search command allowed users to search for PyPI packages using a query. However, the command has been permanently disabled as of March 2021.
Conclusion
Finally, note that much of the content in the documentation and this blog article is available through the pip help command. For instance, if you forget the syntax for config, use:
pip help config
This command provides the syntax for the config command as well as all the possible options associated with the command.