Python shutil: High-Level File Operations Demystified

Are you looking to copy, move, delete, or archive data with your Python programs? If so, you’re in the right place because this article is all about the module that’s been specially designed for the job. It’s called shutil (short for shell utilities) and we’ll be demystifying its key features by way of a few simple examples. We’ll also see how to use shutil in combination with some other standard library modules, and cover a few limitations that could cause you a bit of a headache depending on your priorities, the operating system you use, and your version of Python.

A Word About File Paths

Before we start, it’s worth mentioning that paths are constructed differently depending on your operating system. On Mac and Linux they’re separated by forward slashes (known as Posix style) and on Windows by backslashes.

For the purposes of this article I will be using Windows-style paths to illustrate shutil’s features, but this could just as easily have been done with Posix paths.

The fact that Windows paths use backslashes also leads to another complication because they have a special meaning in Python. They are used as part of special characters and for escaping purposes, which you can read all about in this Finxter backslash article.  

You will therefore notice the letter ‘r’ prior to strings in the code snippets – this prefix signifies a raw string in which backslashes are treated as literal rather than special characters. The other way to handle this issue is by using a second backslash to escape the first, which is the format Python uses to display the Windows path of a new file that’s been created.
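As a quick check, the two spellings describe exactly the same string. The path below is only an example in the style used throughout this article; nothing is read from or written to disk:

```python
# The same Windows path written as a raw string and with escaped backslashes.
raw_path = r'C:\dst_folder\blueprint.jpg'
escaped_path = 'C:\\dst_folder\\blueprint.jpg'

print(raw_path == escaped_path)  # True
print(repr(raw_path))            # 'C:\\dst_folder\\blueprint.jpg'
```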

As an aside, when using paths in your real-world programs I would highly recommend defining them with pathlib.Path(). If done correctly, this has the effect of normalizing paths so they work regardless of the operating system the program is running on.
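Here is a minimal sketch of the idea, using pathlib’s “pure” path classes so that it behaves the same on any machine. The folder names, including /home/user, are made up for illustration:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Join the same path components under each operating system's rules.
# The "pure" classes only manipulate text, so no files need to exist.
parts = ("src_folder", "blueprint.jpg")

windows_path = PureWindowsPath("C:/", *parts)
posix_path = PurePosixPath("/home/user", *parts)  # hypothetical location

print(windows_path)  # C:\src_folder\blueprint.jpg
print(posix_path)    # /home/user/src_folder/blueprint.jpg
```

In a real program you would normally use pathlib.Path, which picks the style that matches the operating system it is running on. The example program later in this article passes Path objects straight to shutil.move().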

shutil Directory and File Operations

shutil copy

So, let’s kick things off with a simple example of how to copy a single file from one folder to another. 

There’s no need to pip install anything because shutil is in Python’s standard library; just import the module and you’re ready to go:

 >>> import shutil
 >>> source = r'C:\src_folder\blueprint.jpg'
 >>> destination = r'C:\dst_folder'
 >>> shutil.copy(source, destination)
 
 'C:\\dst_folder\\blueprint.jpg'

shutil.copy() places a duplicate of the specified source file in the destination folder you have defined, and Python confirms the path to the file. The file’s permissions are copied along with the data.

Another option is to specify a destination file instead of a destination folder:

 ...
 >>> source = r'C:\src_folder\blueprint.jpg'
 >>> destination = r'C:\dst_folder\plan.jpg'
 >>> shutil.copy(source, destination)
 
 'C:\\dst_folder\\plan.jpg'

In this instance, a copy of the source file will still be placed in the destination folder but its name will be changed to the one that’s been provided.

WARNING: Regardless of whether you copy a file directly to a folder preserving its existing name or provide a destination file name, if a file with that name already exists in the destination folder, copy() will permanently overwrite it without warning you first.

This could be useful if you’re intentionally looking to update or replace a file, but might cause major problems if you forget there’s another file in the location with that name that you want to keep!
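If you would rather be stopped than overwrite something, one option is to check for the target yourself before copying. The helper below is a hypothetical sketch (copy_without_overwrite is not part of shutil), demonstrated in a temporary folder so it can be run safely:

```python
import shutil
import tempfile
from pathlib import Path


def copy_without_overwrite(source, destination):
    """Copy source to destination, but refuse to replace an existing file."""
    source, destination = Path(source), Path(destination)
    # Like shutil.copy(), treat an existing folder as "keep the source file name".
    target = destination / source.name if destination.is_dir() else destination
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    return shutil.copy(source, target)


# Demonstration in a throwaway temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    src_folder = Path(tmp, "src_folder")
    dst_folder = Path(tmp, "dst_folder")
    src_folder.mkdir()
    dst_folder.mkdir()
    blueprint = src_folder / "blueprint.jpg"
    blueprint.write_text("new version")
    (dst_folder / "blueprint.jpg").write_text("version to keep")

    try:
        copy_without_overwrite(blueprint, dst_folder)
    except FileExistsError:
        print("Copy refused")

    kept_text = (dst_folder / "blueprint.jpg").read_text()
    print(kept_text)  # version to keep
```

Note that there is a small window between the check and the copy in which another program could create the file, so treat this as a guard against mistakes rather than a guarantee.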

shutil copy2

copy2() works in the same way as copy() except that in addition to file permissions it also attempts to preserve metadata such as the last time the file was modified. 

There are a few limitations to this, which you can read about in the Missing File Metadata section later in this article.
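The difference between the two functions is easiest to see by comparing last modified times. This sketch works in a temporary folder and uses os.utime() to give the source file an artificially old timestamp; the file names and the chosen timestamp are arbitrary:

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "blueprint.jpg")
    source.write_text("example data")

    # Pretend the file was last modified at a fixed moment in the past
    # (1,000,000,000 seconds after the epoch, which falls in September 2001).
    old_time = 1_000_000_000
    os.utime(source, (old_time, old_time))

    plain_copy = shutil.copy(source, Path(tmp, "copy.jpg"))
    metadata_copy = shutil.copy2(source, Path(tmp, "copy2.jpg"))

    copy_mtime = os.stat(plain_copy).st_mtime
    copy2_mtime = os.stat(metadata_copy).st_mtime

# copy() stamps the new file with the time the copy was made.
print(copy_mtime > old_time + 1)        # True
# copy2() carries the old timestamp across (allowing for file system rounding).
print(abs(copy2_mtime - old_time) < 2)  # True
```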

shutil copytree

If copying files one-by-one isn’t going to cut it, copytree()  is the way to go.

 ...
 >>> source = r'C:\src_folder\directory'
 >>> destination = r'C:\dst_folder\directory_copy'
 >>> shutil.copytree(source, destination)
 
 'C:\\dst_folder\\directory_copy'

copytree() creates a duplicate of the entire source directory and gives it the name you specify in the destination path. It uses copy2() to copy files by default so will attempt to preserve metadata, but this can be overridden by setting the copy_function parameter.

Unlike when copying individual files, if a directory with the same name already exists in that destination (in this case directory_copy), an error will be raised and the directory tree will not be copied. So, when attempting to complete the same copytree operation for a second time this is an abridged version of what we see:

 ...
 FileExistsError: [WinError 183] Cannot create a file when that file already  
 exists: 'C:\\dst_folder\\directory_copy'

Accidentally overwriting an entire directory could be pretty catastrophic, and this safeguard has no doubt prevented many such incidents over the years. It’s also caused a fair amount of frustration though, because until fairly recently there was no straightforward way to override it.

If copying over an existing directory IS what you want to do, a new option was introduced in Python 3.8 that makes this possible:

 ...
 >>> shutil.copytree(source, destination, dirs_exist_ok=True)
 
 'C:\\dst_folder\\directory_copy'

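The dirs_exist_ok parameter is set to False by default, but changing it to True overrides the usual behavior and allows us to complete our copytree() operation for a second time even though directory_copy already exists in the specified location. Bear in mind that the existing directory is merged into rather than wiped first: files with matching names are overwritten, while anything else already in directory_copy is left in place.

Another handy feature is the ignore parameter: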

 >>> from shutil import copytree, ignore_patterns
 >>> src = r'C:\src_folder\another_directory'
 >>> dst = r'C:\dst_folder\another_directory_copy'
 >>> copytree(src, dst, ignore=ignore_patterns('*.txt', 'discard*'))
 
 'C:\\dst_folder\\another_directory_copy'

ignore allows you to specify files and folders to leave out when a directory is copied.

The simplest way to achieve this is by importing shutil’s ignore_patterns helper function, which can then be passed to copytree’s ignore parameter.

ignore_patterns takes one or more patterns in string format, and any files or folders matching them will be passed over when copytree() creates the new version of the directory. 
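To see what the helper does on its own, the sketch below calls the function returned by ignore_patterns directly on a list of names, with no copying involved. The names are borrowed from this article’s example directory, and the folder name passed as the first argument is only a placeholder:

```python
from shutil import ignore_patterns

# ignore_patterns() returns a function. copytree() calls it once per folder,
# passing the folder path and the names inside it, and skips whatever comes back.
ignore = ignore_patterns('*.txt', 'discard*')

names = [
    'discard_this_file.docx',
    'include_this_file.docx',
    'this_file_will_be_discarded.txt',
    'this_file_will_not_be_discarded.pdf',
]

# The first argument is the folder being visited (a placeholder here).
skipped = ignore('include_this_folder', names)
print(sorted(skipped))
# ['discard_this_file.docx', 'this_file_will_be_discarded.txt']
```

Because the ignore parameter accepts any function with this shape, you could also write your own if wildcard patterns are not flexible enough.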

For example, in the above code snippet we have passed two arguments to ignore_patterns: '*.txt' and 'discard*'. The asterisk (* symbol) acts as a wildcard that matches zero or more characters, so these patterns will ensure that copytree() duplicates everything except files that end with .txt and files or folders that start with discard.

This can be seen by viewing the file structure of another_directory:

 C:\src_folder>tree /F
 ...
 C:.
 └───another_directory
     ├───discard_this_folder
     ├───include_this_folder
     │       discard_this_file.docx
     │       include_this_file.docx
     │       include_this_file_too.docx
     │       this_file_will_be_discarded.txt
     │       this_file_will_not_be_discarded.pdf
     │
     └───include_this_folder_too

And then looking at the file structure of another_directory_copy once it’s been created by shutil:

C:\dst_folder>tree /F
 ...
 C:.
 └───another_directory_copy
     ├───include_this_folder
     │       include_this_file.docx
     │       include_this_file_too.docx
     │       this_file_will_not_be_discarded.pdf
     │
     └───include_this_folder_too

shutil move

move() works in a similar way to copy2()  but lets you transfer a file to another location instead of copying it. 

You can also move an entire directory by specifying a folder for it to be placed in:

 >>> import shutil
 >>> source = r'C:\src_folder\diagrams'
 >>> destination = r'C:\dst_folder'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\diagrams'

Alternatively, you can provide a new name for the directory as part of the process:

 ...
 >>> source = r'C:\src_folder\diagrams'
 >>> destination = r'C:\dst_folder\layouts'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\layouts'

Unlike copy() and copy2(), move() will raise an exception if a file with the same name already exists in the given folder (unless it’s not on the current file system). This behavior can also be observed when moving directories. Having moved our diagrams directory and renamed it layouts, if we now try to move another directory called layouts into the same location we will see the following:

...
 >>> source = r'C:\src_folder\layouts'
 >>> destination = r'C:\dst_folder'
 >>> shutil.move(source, destination) 
 ...
 shutil.Error: Destination path 'C:\dst_folder\layouts' already exists
 

WARNING: However, as with the copy functions, when moving individual files, if you include a destination file name and a file with that name already exists in the destination folder, move() will permanently overwrite it without warning you first:

...
 >>> source = r'C:\src_folder\sketch.jpg'
 >>> destination = r'C:\dst_folder\design.jpg'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\design.jpg'
 
 >>> source = r'C:\src_folder\different_sketch.jpg'
 >>> destination = r'C:\dst_folder\design.jpg'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\design.jpg'


There is another subtle gotcha to look out for when using move() that has the potential to cause problems too:

...
 >>> source = r'C:\src_folder\blueprint.jpg'
 >>> destination = r'C:\dst_folder\plan'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\plan'

On this occasion we have tried to transfer a file into a folder that doesn’t exist. Instead of raising an exception, move() has completed the operation and given the file the name of the non-existent directory (plan) without a file extension. The file is still in JPEG format, but it won’t be called what we expect, and without its extension the operating system will no longer recognize it as an image!

The same kind of problem could occur if we accidentally missed off the file extension from a destination file name as well.

This issue might also crop up when using the copy functions if you’re not careful. In that case you would at least have the original file for reference, but it could still lead to significant confusion.
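One way to guard against this is to confirm that the destination folder really exists before moving anything. The helper below is a hypothetical sketch (move_into_folder is not part of shutil), shown running in a temporary folder:

```python
import shutil
import tempfile
from pathlib import Path


def move_into_folder(source, folder):
    """Move source into folder, insisting that folder already exists."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not an existing folder")
    # Passing an existing folder means the file keeps its own name and extension.
    return shutil.move(str(source), str(folder))


with tempfile.TemporaryDirectory() as tmp:
    blueprint = Path(tmp, "blueprint.jpg")
    blueprint.write_text("example data")

    try:
        # The "plan" folder has not been created, so the move is refused.
        move_into_folder(blueprint, Path(tmp, "plan"))
    except NotADirectoryError:
        print("Move refused")

    still_there = blueprint.exists()
    print(still_there)  # True
```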

shutil rmtree

If you want to delete an entire directory instead of moving or copying it, you can do this with rmtree():

 
 >>> import shutil
 >>> shutil.rmtree(r'C:\dst_folder\directory_copy')

By default, rmtree() will raise an exception and halt the process if an error is encountered when attempting to remove files. You can see an example of one of these error messages below:

 ...
 PermissionError: [WinError 32] The process cannot access the file because 
 it is being used by another process: 
 'C:\\dst_folder\\directory_copy\\blueprint.pdf'


However, this behavior can be overridden:

 ...
 >>> shutil.rmtree(r'C:\dst_folder\directory_copy', ignore_errors=True)


If you set the ignore_errors  parameter to True, rmtree() will continue to delete the directory instead of raising an exception.

WARNING: Directory trees removed by rmtree() are permanently deleted, so you need to be very careful about how you use it. If you’re concerned by the potential risks (and I wouldn’t blame you if you were!), you might want to consider using a safer alternative such as Send2Trash.
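The effect of ignore_errors can be tried out safely in a temporary folder. In this sketch the error comes from asking rmtree() to delete a directory that has already gone, which is a simpler failure to reproduce than a locked file; the folder and file names are arbitrary:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    directory_copy = Path(tmp, "directory_copy")
    (directory_copy / "nested").mkdir(parents=True)
    (directory_copy / "nested" / "blueprint.pdf").write_text("example data")

    # Removes the folder together with everything inside it.
    shutil.rmtree(directory_copy)
    removed = not directory_copy.exists()
    print(removed)  # True

    # The folder is gone now, so a second attempt has nothing to delete.
    raised = False
    try:
        shutil.rmtree(directory_copy)
    except FileNotFoundError:
        raised = True
        print("Second attempt raised FileNotFoundError")

    # With ignore_errors=True the same call fails silently instead.
    shutil.rmtree(directory_copy, ignore_errors=True)
```

Keep in mind that silencing errors also hides genuine problems, such as files that could not be deleted, so it is worth checking the result afterwards.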

shutil archive

You can use shutil to create directory archives as well:

 ...
 >>> shutil.make_archive(
         r'C:\dst_folder\zipped_designs', 
         'zip', 
         r'C:\src_folder\designs',
         )
 
 'C:\\dst_folder\\zipped_designs.zip'


As shown above, a simple way to do this is by passing three arguments to the make_archive()  function:

  1. The path where the new archive should be created, including its name but without the file extension.
  2. The archive format to use when creating it.
  3. The path of the directory to be archived.

The directory will remain unaltered in its original place, and the archive will be created in the specified location.
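The sketch below repeats the same call in a temporary folder and then unpacks the result with shutil.unpack_archive(), a companion function not otherwise covered in this article, to confirm what ended up in the archive. The folder and file names are made up:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    designs = Path(tmp, "src_folder", "designs")
    designs.mkdir(parents=True)
    (designs / "plan.txt").write_text("example data")
    dst_folder = Path(tmp, "dst_folder")
    dst_folder.mkdir()

    # Archive name without the extension, then the format, then the folder to archive.
    archive_path = shutil.make_archive(
        str(dst_folder / "zipped_designs"),
        "zip",
        str(designs),
    )

    # Unpack into a fresh folder to inspect the contents.
    unpacked = Path(tmp, "unpacked")
    unpacked.mkdir()
    shutil.unpack_archive(archive_path, str(unpacked))

    archive_name = Path(archive_path).name
    unpacked_names = sorted(item.name for item in unpacked.iterdir())
    original_still_there = designs.exists()

print(archive_name)          # zipped_designs.zip
print(unpacked_names)        # ['plan.txt']
print(original_still_there)  # True
```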

make_archive() can also create archives in the tar, gztar, bztar or xztar formats, which produce .tar, .tar.gz, .tar.bz2 and .tar.xz files respectively.

For operations more sophisticated than archiving an entire directory, like zipping selected files from a directory based on filters, you can use the zipfile module instead.
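As a rough illustration of that approach, this sketch zips only the .jpg files from a directory tree. It runs in a temporary folder, and the file names and the choice of filter are assumptions made for the example:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    designs = Path(tmp, "designs")
    (designs / "drafts").mkdir(parents=True)
    (designs / "blueprint.jpg").write_text("example data")
    (designs / "drafts" / "sketch.jpg").write_text("example data")
    (designs / "notes.txt").write_text("example data")

    archive_path = Path(tmp, "images_only.zip")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # rglob() walks the whole tree; only files matching the filter are added.
        for file_path in sorted(designs.rglob("*.jpg")):
            # arcname stores the path relative to designs rather than the full path.
            archive.write(file_path, arcname=file_path.relative_to(designs).as_posix())

    with zipfile.ZipFile(archive_path) as archive:
        archived_names = archive.namelist()

print(archived_names)  # ['blueprint.jpg', 'drafts/sketch.jpg']
```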

shutil Limitations

You can achieve a great deal with the shutil module, but, as mentioned at the start of this article, it does have a few limitations that you should know about.

Missing File Metadata

copy2() preserves as much metadata as possible and is used by copytree() and move(), so by default these functions will do the same. It’s not able to capture everything though.

On Windows: file owners, access control lists (ACLs) and alternate data streams are not copied.

File owners and ACLs are also lost on Linux and Mac, along with groups.

On Mac OS the resource fork and other metadata are not used either, resulting in the loss of resource data and incorrect creator and file type codes.

Speed

A complaint often levelled at shutil in the past was that it could be very slow to use when working with large amounts of data, particularly on Windows.

Fortunately, this has been addressed in Python 3.8 with the introduction of the snappily titled platform-dependent efficient copy operations.

This “fast-copy” enhancement means that shutil’s copy and move operations are now optimized to occur within the relevant operating system kernel instead of Python’s userspace buffers whenever possible.

Therefore, if you’re running into speed issues on an earlier version of Python and using 3.8 instead is an option, it’s likely to improve matters greatly.

You could also look into third-party packages such as pyfastcopy.

 

Combining Shutil With Other Standard Library Modules

In the copytree()  section of this article we saw how to exert greater control over shutil’s behavior by using the ignore parameter to exclude files with a particular name or type.

But what if you want to carry out more complex tasks such as accessing other file-related data so you can check it to determine which operations should be completed? 

Using shutil in combination with some of Python’s other standard library modules is the answer. 

This section is intended to provide an example of one use case for this kind of approach.

We will create a simple program that can spring clean a file directory by storing away old subdirectories if they haven’t been modified for a long time.

To do this we’ll use shutil.move() along with several other handy modules including: pathlib (which I mentioned at the start), os and time.

The Modules

As well as making it much simpler to define cross-platform compatible paths, pathlib’s Path class contains methods that really help with handling file paths efficiently.

We’ll also be using the os module’s walk function, which had no equivalent in pathlib until Path.walk() was added in Python 3.12. This will enable us to traverse our subdirectories to identify all the files they contain and extract their paths.

We will take advantage of the time module too, so we can calculate how long it’s been since the files in each subdirectory were last modified.

Preparing for the Move

Having imported our modules:

 import os
 import pathlib
 import shutil
 import time


The first thing we need to do is assign the number of seconds in a normal (non-leap) year to a constant:

SECONDS = 365 * 24 * 60 * 60


This will help us to determine how long it’s been since the files in our subfolders were last modified (more on that later).
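To make the arithmetic concrete, here is a small sketch of how the constant turns a number of years into a point in time. The cutoff name and the value of 1.5 years are illustrative choices rather than part of the program:

```python
import time

SECONDS = 365 * 24 * 60 * 60
print(SECONDS)  # 31536000

# Multiplying by a number of years gives the length of the period in seconds.
years = 1.5
print(SECONDS * years)  # 47304000.0

# Subtracting that from the current time gives a cutoff: any file whose
# last modified time is below it is older than the chosen period.
cutoff = time.time() - SECONDS * years
```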

Next, we define our first function which will prepare the file operations that are necessary to complete the move:

 ...
 def prepare_move(number, path, storage_folder):
     pass


Our function takes three arguments:

  1. number – the number of years since any file in a subfolder was last modified (this could also be a float such as 1.5).
  2. path – the file path of the main directory that contains the subdirectories we want to tidy up.
  3. storage_folder – the name of the folder where we want the old directories to be placed. Once the operation is complete, this storage folder will be put in the main directory alongside the subdirectories that haven’t been moved.

We now need to assign some objects to variables that will play important roles in the preparation process:

 ...
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())

  1. length – the result of multiplying the SECONDS constant we previously defined by the number of years passed into the function.
  2. now – the current time in seconds provided by the time module. This is calculated based on what’s known as the epoch.
  3. my_directory – stores the main directory path we passed to the function as a pathlib.Path object.
  4. my_subdirectories – a generator that yields the paths of our subdirectories, produced by iterating through my_directory.

Our next step is to create a for loop to iterate through the subdirectories yielded by our generator and append the details of any that have not been modified during the period we specified to a list of file operations:

 ...
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())
     file_operations = []
     for subdirectory in my_subdirectories:
         time_stats = _get_stats(subdirectory)


The first task carried out by the loop is to create a list of all the file modified times in a subdirectory. 

This is handled by a separate function which uses the os.walk() function mentioned earlier and the last modified value in seconds (st_mtime) available via the Path.stat() method:

 ...
 def _get_stats(subdirectory):
     time_stats = []
     for folder, _, files in os.walk(subdirectory):
         for file in files:
             file_path = pathlib.Path(folder) / file
             time_stat = file_path.stat().st_mtime
             time_stats.append(time_stat)
     return time_stats

The loop then checks these file modified stats to see whether they all precede the specified point in time (with the calculation being done in seconds).

If so, the necessary source and destination paths are constructed and appended to the file_operations list.

Once the loop has iterated through all our subdirectories, the function returns the list of file operations that need to be completed:

 ...
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())
     file_operations = []
     for subdirectory in my_subdirectories:
         time_stats = _get_stats(subdirectory)
         if all(time_stat < (now - length) for time_stat in time_stats):
             *_, subdirectory_name = subdirectory.parts
             source = subdirectory
             destination = my_directory / storage_folder / subdirectory_name
             file_operations.append((source, destination))
     return file_operations


Moving the Subdirectories

Now we need to define the function that will actually move the subdirectories:

 ...
 def move_files(file_operations):
     for operation in file_operations:
         source, destination = operation
         shutil.move(source, destination)


Because all the preparation work has already been done, this function simply accepts the file operations and passes them to shutil.move()  via a for loop so each old subdirectory can be placed in the specified storage_folder.
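Keeping the preparation separate from the move also makes it easy to preview what would happen before anything is touched. The helper below is a hypothetical addition (describe_operations is not part of the program in this article), and the planned list is made-up data in the same (source, destination) format that prepare_move() returns:

```python
from pathlib import Path


def describe_operations(file_operations):
    """Return one readable line per planned move, without moving anything."""
    return [f"{source} -> {destination}" for source, destination in file_operations]


# Hypothetical plan in the format produced by prepare_move().
planned = [
    (Path("my_directory/old_files_1"), Path("my_directory/old_stuff/old_files_1")),
    (Path("my_directory/old_files_2"), Path("my_directory/old_stuff/old_files_2")),
]

summary = describe_operations(planned)
for line in summary:
    print(line)
```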

Executing the Program

Lastly, we define a main() function to execute the program and call it with our arguments:

 ...
 def main(number, path, storage_folder):
     file_operations = prepare_move(number, path, storage_folder)
     move_files(file_operations)
 
 main(1, r"F:\my_directory", "old_stuff")


Here’s the whole program:

 
 import os
 import pathlib
 import shutil
 import time
 
 
 SECONDS = 365 * 24 * 60 * 60
 
 
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())
     file_operations = []
     for subdirectory in my_subdirectories:
         time_stats = _get_stats(subdirectory)
         if all(time_stat < (now - length) for time_stat in time_stats):
             *_, subdirectory_name = subdirectory.parts
             source = subdirectory
             destination = my_directory / storage_folder / subdirectory_name
             file_operations.append((source, destination))
     return file_operations
 
 
 def _get_stats(subdirectory):
     time_stats = []
     for folder, _, files in os.walk(subdirectory):
         for file in files:
             file_path = pathlib.Path(folder) / file
             time_stat = file_path.stat().st_mtime
             time_stats.append(time_stat)
     return time_stats
 
 
 def move_files(file_operations):
     for operation in file_operations:
         source, destination = operation
         shutil.move(source, destination)
 
 
 def main(number, path, storage_folder):
     file_operations = prepare_move(number, path, storage_folder)
     move_files(file_operations)
 
 main(1, r"F:\my_directory", "old_stuff")

You can see how the directory structure looked before running the program below:

 F:\my_directory>tree /F
 ...
 F:.
 ├───new_files_1
 │   │   new_file.jpg
 │   │
 │   ├───second_level_folder_1
 │   │       really_new_file.txt
 │   │
 │   └───second_level_folder_2
 │           very_new_file.txt
 │
 ├───new_files_2
 │       fairly_new_file.txt
 │
 ├───old_files_1
 │   │   old_file.txt
 │   │
 │   └───second_level_folder_1
 │       │   old_file_as_well.txt
 │       │
 │       └───third_level_folder
 │               really_old_file.jpg
 │
 └───old_files_2
     │   another_old_file.txt
     │
     └───old_second_level_folder
             oldest_file.jpg
             old_file_2.txt

And this is what it looks like afterwards:

 
 F:\my_directory>tree /F
 ...
 F:.
  ├───new_files_1
  │   │   new_file.jpg
  │   │
  │   ├───second_level_folder_1
  │   │       really_new_file.txt
  │   │
  │   └───second_level_folder_2
  │           very_new_file.txt
  │
  ├───new_files_2
  │       fairly_new_file.txt
  │
  └───old_stuff
      ├───old_files_1
      │   │   old_file.txt
      │   │
      │   └───second_level_folder_1
      │       │   old_file_as_well.txt
      │       │
      │       └───third_level_folder
      │               really_old_file.jpg
      │
      └───old_files_2
          │   another_old_file.txt
          │
          └───old_second_level_folder
                  oldest_file.jpg
                  old_file_2.txt 


Obviously, if you had a directory this small or one where all the subdirectories were labelled as either old or new already, you would be unlikely to need such a program! But hopefully this basic example helps to illustrate how the process would work with a larger, less intuitive directory.

The program shown in this section has been greatly simplified for demonstration purposes. If you would like to see a more complete version, structured as a command line application that summarizes changes before you decide whether to apply them, and enables you to tidy files based on creation and last accessed times as well, you can view it here.
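Two of those simplifications are worth knowing about before pointing the program at real data. First, all() returns True for an empty sequence, so a subdirectory that contains no files at all passes the age check and would be moved. Second, on a later run the storage folder is itself one of the subdirectories being examined, so if everything in it is old enough it would be scheduled to move into itself. The sketch below shows one possible way to handle both cases; the function names newest_mtime and stale_subdirectories are hypothetical, and the demonstration uses a temporary folder with made-up contents:

```python
import os
import pathlib
import tempfile
import time

SECONDS = 365 * 24 * 60 * 60


def newest_mtime(subdirectory):
    """Return the latest st_mtime of any file below subdirectory, or None if there are no files."""
    mtimes = [
        (pathlib.Path(folder) / name).stat().st_mtime
        for folder, _, files in os.walk(subdirectory)
        for name in files
    ]
    return max(mtimes) if mtimes else None


def stale_subdirectories(path, storage_folder, cutoff):
    """List subdirectories whose newest file predates cutoff.

    The storage folder itself and subdirectories without any files are skipped.
    """
    stale = []
    for item in pathlib.Path(path).iterdir():
        if not item.is_dir() or item.name == storage_folder:
            continue
        newest = newest_mtime(item)
        if newest is not None and newest < cutoff:
            stale.append(item)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    two_years_ago = now - 2 * SECONDS
    for folder_name in ("old_files", "new_files", "empty_folder", "old_stuff"):
        pathlib.Path(tmp, folder_name).mkdir()
    # Give the files in two of the folders an artificially old modified time.
    for folder_name in ("old_files", "old_stuff"):
        old_file = pathlib.Path(tmp, folder_name, "old_file.txt")
        old_file.write_text("example data")
        os.utime(old_file, (two_years_ago, two_years_ago))
    pathlib.Path(tmp, "new_files", "new_file.txt").write_text("example data")

    found = sorted(item.name for item in stale_subdirectories(tmp, "old_stuff", now - SECONDS))

print(all(stat < 0 for stat in []))  # True: all() of an empty sequence is always True
print(found)  # ['old_files']
```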

Final Thoughts

As we’ve seen, the shutil module provides some excellent utilities for working with files and directories, and you can greatly enhance their power and precision by combining them with other tools from the standard library and beyond.

Care should be taken to avoid permanently overwriting or deleting existing files and directories by accident though, so please check out the warnings included in the relevant sections of this article if you haven’t already.

The example program described above is just one of many uses to which shutil’s tools could be put. Here’s hoping you find some ingenious ways to apply them in your own projects soon.