<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jonathan Boland, Author at Be on the Right Side of Change</title>
	<atom:link href="https://blog.finxter.com/author/jonathan/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.finxter.com/author/jonathan/</link>
	<description></description>
	<lastBuildDate>Wed, 30 Sep 2020 17:52:02 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.finxter.com/wp-content/uploads/2020/08/cropped-cropped-finxter_nobackground-32x32.png</url>
	<title>Jonathan Boland, Author at Be on the Right Side of Change</title>
	<link>https://blog.finxter.com/author/jonathan/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Python shutil: High-Level File Operations Demystified</title>
		<link>https://blog.finxter.com/python-shutil-high-level-file-operations-demystified/</link>
		
		<dc:creator><![CDATA[Jonathan Boland]]></dc:creator>
		<pubDate>Wed, 30 Sep 2020 17:49:47 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://blog.finxter.com/?p=14010</guid>

					<description><![CDATA[<p>Are you looking to copy, move, delete, or archive data with your Python programs? If so, you’re in the right place because this article is all about the module that’s been specially designed for the job. It’s called shutil (short for shell utilities) and we’ll be demystifying its key features by way of a few simple ... <a title="Python shutil: High-Level File Operations Demystified" class="read-more" href="https://blog.finxter.com/python-shutil-high-level-file-operations-demystified/" aria-label="Read more about Python shutil: High-Level File Operations Demystified">Read more</a></p>
<p>The post <a href="https://blog.finxter.com/python-shutil-high-level-file-operations-demystified/">Python shutil: High-Level File Operations Demystified</a> appeared first on <a href="https://blog.finxter.com">Be on the Right Side of Change</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p><strong><img fetchpriority="high" decoding="async" alt="Graphical user interface

Description automatically generated" src="https://lh6.googleusercontent.com/ruqYV-vYczythmZHS_rXH-Sa9cSXvK3xFdJd3YuhBsruGBOPcPMH7Ue3mbtRD7ZaWCimo2gIpIOZRZEz2diMXjWLvDz19sWE5xboV9u5iAlyHTIWIlkZ-h6ubPx4MMUz0iuZMk5f" width="680" height="237"></strong></p>



<p>Are you looking to copy, move, delete, or archive data with your Python programs? If so, you’re in the right place because this article is all about the module that’s been specially designed for the job. It’s called shutil (short for shell utilities) and we’ll be demystifying its key features by way of a few simple examples. We’ll also see how to use shutil in combination with some other standard library modules, and cover a few limitations that could cause you a bit of a headache depending on your priorities, the operating system you use and your version of Python.</p>



<h2 class="wp-block-heading">A Word About File Paths</h2>



<p>Before we start, it’s worth mentioning that paths are constructed differently depending on your operating system. On Mac and Linux the parts of a path are separated by forward slashes (known as POSIX style) and on Windows by backslashes.</p>



<p>For the purposes of this article I will be using Windows-style paths to illustrate shutil’s features, but this could just as easily have been done with POSIX paths.</p>



<p>The fact that Windows paths use backslashes also leads to another complication because they have a special meaning in Python. They are used as part of special characters and for escaping purposes, which you can read all about in this <a href="https://blog.finxter.com/how-to-do-a-backslash-in-python/">Finxter backslash article</a>.&nbsp;&nbsp;</p>



<p>You will therefore notice the letter ‘r’ prior to strings in the code snippets – this prefix signifies a raw string in which backslashes are treated as literal rather than special characters. The other way to handle this issue is by using a second backslash to escape the first, which is the format Python uses to display the Windows path of a new file that’s been created.</p>



<p>As an aside, when using paths in your real-world programs I would highly recommend defining them with <a href="https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/">pathlib.Path()</a>. If done correctly, this has the effect of normalizing paths so they work regardless of the operating system the program is running on.</p>
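
<p>As a rough illustration of the two points above, the sketch below checks that the raw-string and double-backslash spellings describe the same path, and then builds that path with pathlib. It uses <code>PureWindowsPath</code> (my choice here, purely so the result is the same on any operating system) and the folder and file names from this article; in a real program you would normally use <code>pathlib.Path</code>.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from pathlib import PureWindowsPath

# A raw string and an escaped string spell the same Windows path
raw = r'C:\src_folder\blueprint.jpg'
escaped = 'C:\\src_folder\\blueprint.jpg'
same_string = raw == escaped

# PureWindowsPath applies Windows rules on any operating system, so the
# forward slash is normalized to a backslash when converted to a string
built = PureWindowsPath('C:/src_folder') / 'blueprint.jpg'

print(same_string)
print(built)</pre>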



<h2 class="wp-block-heading">shutil Directory and File Operations</h2>



<h3 class="wp-block-heading"><em>shutil copy</em></h3>



<p>So, let’s kick things off with a simple example of how to copy a single file from one folder to another.&nbsp;</p>



<p>There’s no need to pip install anything because shutil is in Python’s standard library; just import the module and you’re ready to go:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> import shutil
 >>> source = r'C:\src_folder\blueprint.jpg'
 >>> destination = r'C:\dst_folder'
 >>> shutil.copy(source, destination)
 
 'C:\\dst_folder\\blueprint.jpg'</pre>



<p><code>shutil.copy()</code> places a duplicate of the specified source file in the destination folder you have defined, and Python confirms the path to the file. The file’s permissions are copied along with the data.</p>

<p>Another option is to specify a destination <em>file</em> instead of a destination <em>folder</em>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> source = r'C:\src_folder\blueprint.jpg'
 >>> destination = r'C:\dst_folder\plan.jpg'
 >>> shutil.copy(source, destination)
 
 'C:\\dst_folder\\plan.jpg'</pre>



<p>In this instance, a copy of the source file will still be placed in the destination folder but its name will be changed to the one that’s been provided.</p>



<p><strong>WARNING:</strong> Regardless of whether you copy a file directly to a folder preserving its existing name or provide a destination file name, if a file already exists in the destination folder with that name <code>copy()</code> will <strong>permanently overwrite it</strong> <strong>without warning you first</strong>.</p>



<p>This could be useful if you’re intentionally looking to update or replace a file, but might cause major problems if you forget there’s another file in the location with that name that you want to keep!</p>
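
<p>If silent overwriting worries you, one possible safeguard is to check for an existing file before copying. The sketch below is only an illustration: <code>safe_copy</code> is a hypothetical helper of my own naming, not part of shutil, and there is still a small window between the check and the copy in which another program could create the file.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import shutil
from pathlib import Path


def safe_copy(source, destination):
    """Copy source to destination, but refuse to overwrite an existing file.

    safe_copy is a hypothetical helper written for illustration,
    not part of shutil.
    """
    source = Path(source)
    destination = Path(destination)
    # Work out the final file path the same way shutil.copy() would
    if destination.is_dir():
        target = destination / source.name
    else:
        target = destination
    if target.exists():
        raise FileExistsError(f'Refusing to overwrite {target}')
    return shutil.copy(source, target)</pre>

<p>Calling <code>safe_copy()</code> twice with the same arguments would raise <code>FileExistsError</code> on the second call instead of replacing the first copy.</p>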



<h3 class="wp-block-heading"><em>shutil copy2</em></h3>



<p><code>copy2()</code> works in the same way as <code>copy()</code> except that in addition to file permissions it also attempts to preserve metadata such as the last time the file was modified.&nbsp;</p>



<p>There are a few limitations to this, which you can read about in the <em>Missing File Metadata</em> section later in this article.</p>
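
<p>To see the difference for yourself, a small experiment along the following lines should work on most systems. It is a sketch rather than a definitive test: the file names and the timestamp are arbitrary, everything happens in a temporary folder, and the exact result can depend on your file system.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import shutil
import tempfile
from pathlib import Path


def compare_copy_functions():
    """Return the modified times of a source file and two copies of it."""
    with tempfile.TemporaryDirectory() as folder:
        source = Path(folder) / 'blueprint.jpg'
        source.write_bytes(b'example data')
        # Pretend the file was last modified in September 2001
        old_time = 1_000_000_000
        os.utime(source, (old_time, old_time))

        plain = shutil.copy(source, Path(folder) / 'copy.jpg')
        with_metadata = shutil.copy2(source, Path(folder) / 'copy2.jpg')

        return (
            source.stat().st_mtime,
            Path(plain).stat().st_mtime,
            Path(with_metadata).stat().st_mtime,
        )


source_time, copy_time, copy2_time = compare_copy_functions()
# copy() leaves the new file with a fresh modified time
print(copy_time == source_time)
# copy2() tries to carry the original modified time over
print(copy2_time == source_time)</pre>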



<h3 class="wp-block-heading"><em>shutil copytree</em></h3>



<p>If copying files one-by-one isn’t going to cut it, <code>copytree()</code>&nbsp; is the way to go.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> source = r'C:\src_folder\directory'
 >>> destination = r'C:\dst_folder\directory_copy'
 >>> shutil.copytree(source, destination)
 
 'C:\\dst_folder\\directory_copy'
</pre>



<p><code>copytree()</code> creates a duplicate of the entire source directory and gives it the name you specify in the destination path. It uses <code>copy2()</code> to copy files by default, so it will attempt to preserve metadata, but this can be overridden by setting the <code>copy_function</code> parameter.</p>

<p>Unlike when copying individual files, if a directory with the same name already exists in that destination (in this case <code>directory_copy</code>), an error will be raised and the directory tree will not be copied. So, when attempting to complete the same copytree operation for a second time this is an abridged version of what we see:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 FileExistsError: [WinError 183] Cannot create a file when that file already  
 exists: 'C:\\dst_folder\\directory_copy'</pre>



<p>Accidentally overwriting an entire directory could be pretty catastrophic, and this safeguard has no doubt prevented many such incidents over the years. It’s also caused a fair amount of frustration though, because until very recently there was no straightforward way to override it.</p>



<p>If copying into an existing directory IS what you want to do, a new option was introduced in Python 3.8 that makes this possible:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> shutil.copytree(source, destination, dirs_exist_ok=True)
 
 'C:\\dst_folder\\directory_copy'</pre>



<p>The <code>dirs_exist_ok</code> parameter is set to False by default, but changing it to True overrides the usual behavior and allows us to complete our <code>copytree()</code> operation for a second time even though <code>directory_copy</code> already exists in the specified location. Files in the destination that share a name with a copied file are overwritten, while any other files already there are left in place.</p>

<p>Another handy feature is the <code>ignore</code> parameter:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> from shutil import copytree, ignore_patterns
 
 >>> src = r'C:\src_folder\another_directory'
 >>> dst = r'C:\dst_folder\another_directory_copy'
 >>> copytree(src, dst, ignore=ignore_patterns('*.txt', 'discard*'))
 
 'C:\\dst_folder\\another_directory_copy'
</pre>



<p><code>ignore</code> allows you to specify files and folders to leave out when a directory is copied.</p>



<p>The simplest way to achieve this is by importing shutil’s <code>ignore_patterns</code> helper function, which can then be passed to copytree’s ignore parameter.</p>



<p><code>ignore_patterns</code> takes one or more patterns in string format, and any files or folders matching them will be passed over when <code>copytree()</code> creates the new version of the directory.&nbsp;</p>



<p>For example, in the above code snippet we have passed two arguments to <code>ignore_patterns</code>: <code>'*.txt'</code> and <code>'discard*'</code>. The <a href="https://blog.finxter.com/what-is-asterisk-in-python/" title="What is the Asterisk / Star Operator (*) in Python?" target="_blank" rel="noreferrer noopener">asterisk</a> (* symbol) acts as a wildcard that matches zero or more characters, so these patterns will ensure that <code>copytree()</code> duplicates everything except files that end with .txt and files or folders that start with discard.</p>

<p>This can be seen by viewing the file structure of <code>another_directory</code>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> C:\src_folder>tree /F
 ...
 C:.
 └───another_directory
     ├───discard_this_folder
     ├───include_this_folder
     │       discard_this_file.docx
     │       include_this_file.docx
     │       include_this_file_too.docx
     │       this_file_will_be_discarded.txt
     │       this_file_will_not_be_discarded.pdf
     │
     └───include_this_folder_too
</pre>



<p>And then looking at the file structure of another_directory_copy once it’s been created by shutil:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">C:\dst_folder>tree /F
 ...
 C:.
 └───another_directory_copy
     ├───include_this_folder
     │       include_this_file.docx
     │       include_this_file_too.docx
     │       this_file_will_not_be_discarded.pdf
     │
     └───include_this_folder_too
</pre>
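
<p>When name patterns aren’t enough, the <code>ignore</code> parameter also accepts a function of your own. <code>copytree()</code> calls it with each directory it visits and the list of names inside it, and leaves out whichever names the function returns. The sketch below is one hypothetical example (<code>ignore_empty_files</code> and <code>copy_without_empty_files</code> are names I made up) that skips zero-byte files:</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import shutil


def ignore_empty_files(directory, names):
    """Return the names in directory that copytree() should skip.

    ignore_empty_files is a hypothetical example: it skips zero-byte files.
    """
    ignored = []
    for name in names:
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and os.path.getsize(full_path) == 0:
            ignored.append(name)
    return ignored


def copy_without_empty_files(src, dst):
    """Copy the src directory tree to dst, leaving out empty files."""
    return shutil.copytree(src, dst, ignore=ignore_empty_files)</pre>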



<h3 class="wp-block-heading"><em>shutil move</em></h3>



<p>move() works in a similar way to <code>copy2()</code>&nbsp; but lets you transfer a file to another location instead of copying it.&nbsp;</p>



<p>You can also move an entire directory by specifying a folder for it to be placed in:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> import shutil
 
 
 >>> source = r'C:\src_folder\diagrams'
 >>> destination = r'C:\dst_folder'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\diagrams'</pre>



<p>Alternatively, you can provide a new name for the directory as part of the process:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> source = r'C:\src_folder\diagrams'
 >>> destination = r'C:\dst_folder\layouts'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\layouts'</pre>



<p>Unlike <code>copy()</code> and <code>copy2()</code>, <code>move()</code> will raise an exception if you move a file into an existing folder that already contains a file with the same name. This behavior can also be observed when moving directories. Having moved our diagrams directory and renamed it layouts, if we now try to move another directory called layouts into the same location we will see the following:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">...
 >>> source = r'C:\src_folder\layouts'
 >>> destination = r'C:\dst_folder'
 >>> shutil.move(source, destination) 
 ...
 shutil.Error: Destination path 'C:\dst_folder\layouts' already exists
 
</pre>



<p><strong>WARNING:</strong> As with the copy functions, however, when moving individual files, if you include a destination file name and a file with that name already exists in the destination folder, <code>move()</code> will <strong>permanently overwrite it without warning you first</strong>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">...
 >>> source = r'C:\src_folder\sketch.jpg'
 >>> destination = r'C:\dst_folder\design.jpg'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\design.jpg'
 
 >>> source = r'C:\src_folder\different_sketch.jpg'
 >>> destination = r'C:\dst_folder\design.jpg'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\design.jpg'</pre>



<p><br>There is another subtle gotcha to look out for when using move() that has the potential to cause problems too:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">...
 >>> source = r'C:\src_folder\blueprint.jpg'
 >>> destination = r'C:\dst_folder\plan'
 >>> shutil.move(source, destination)
 
 'C:\\dst_folder\\plan'</pre>



<p>On this occasion we have tried to transfer a file into a folder that doesn’t exist. Instead of raising an exception, <code>move()</code> has completed the operation and given the file the name of the non-existent directory (plan) <strong>without a file extension</strong>. The file is still in JPEG format, but it won’t be called what we expect, and without an extension Windows will no longer know which program to open it with!</p>



<p>The same kind of problem could occur if we accidentally missed off the file extension from a destination file name as well.</p>



<p>This issue might also crop up when using the copy functions if you’re not careful. In that case you would at least have the original file for reference, but it could still lead to significant confusion.</p>
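
<p>One way to guard against this is to confirm that the destination folder really exists before moving anything into it. The following is a minimal sketch under that assumption; <code>move_into_folder</code> is a hypothetical helper, not something shutil provides.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import shutil
from pathlib import Path


def move_into_folder(source, folder):
    """Move source into folder, failing loudly if folder is missing.

    move_into_folder is a hypothetical helper, not part of shutil.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # Without this check, move() would rename the file to the folder name
        raise NotADirectoryError(f'{folder} is not an existing folder')
    return shutil.move(str(source), str(folder))</pre>

<p>With this check in place, the blueprint.jpg example above would raise <code>NotADirectoryError</code> and leave the file where it was.</p>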



<h3 class="wp-block-heading"><em>shutil rmtree</em></h3>



<p>If you want to delete an entire directory instead of moving or copying it, you can do this with <code>rmtree()</code>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> 
 import shutil
 >>> shutil.rmtree(r'C:\dst_folder\directory_copy')
</pre>



<p>By default, <code>rmtree()</code> will raise an exception and halt the process if an error is encountered when attempting to remove files. You can see an example of one of these error messages below:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 PermissionError: [WinError 32] The process cannot access the file because 
 it is being used by another process: 
 'C:\\dst_folder\\directory_copy\\blueprint.pdf'
</pre>



<p><br>However, this behavior can be overridden:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> shutil.rmtree(r'C:\dst_folder\directory_copy', ignore_errors=True)</pre>



<p>If you set the <code>ignore_errors</code> parameter to True, <code>rmtree()</code> will skip over anything it fails to remove and carry on deleting the rest of the directory tree instead of raising an exception.</p>



<p><strong>WARNING:</strong> Directory trees removed by rmtree() are permanently deleted, so you need to be very careful about how you use it. If you’re concerned by the potential risks (and I wouldn’t blame you if you were!), you might want to consider using a safer alternative such as <a href="https://pypi.org/project/Send2Trash/">Send2Trash</a>.</p>



<h3 class="wp-block-heading"><em>shutil archive</em></h3>



<p>You can use shutil to create directory archives as well:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> shutil.make_archive(
         r'C:\dst_folder\zipped_designs', 
         'zip', 
         r'C:\src_folder\designs',
         )
 
 'C:\\dst_folder\\zipped_designs.zip'</pre>



<p>As shown above, a simple way to do this is by passing three arguments to the <code>make_archive()</code> function:</p>



<ol class="wp-block-list"><li>The path where the new archive should be created, including its name but <em>without</em> the file extension.</li><li>The archive format to use when creating it.</li><li>The path of the directory to be archived.</li></ol>



<p>The directory will remain unaltered in its original place, and the archive will be created in the specified location.</p>



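<p><code>make_archive()</code> can also create archives in other formats, which are selected by passing 'tar', 'gztar', 'bztar' or 'xztar' as the format argument (the compressed variants depend on the relevant compression module being available).</p>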



<p>For operations more sophisticated than archiving an entire directory, like zipping selected files from a directory based on filters, you can use the <a href="https://thispointer.com/python-how-to-create-a-zip-archive-from-multiple-files-or-directory/">zipfile module</a> instead.</p>
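
<p>As a brief sketch of what that might look like, the function below zips only the files in a directory tree whose names match a pattern. The function name <code>zip_matching_files</code> and the default <code>'*.jpg'</code> pattern are illustrative choices of mine rather than anything defined by zipfile.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import zipfile
from pathlib import Path


def zip_matching_files(folder, archive_path, pattern='*.jpg'):
    """Zip only the files under folder whose names match pattern.

    zip_matching_files and its default pattern are illustrative choices.
    """
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(folder.rglob(pattern)):
            if file_path.is_file():
                # Store names relative to folder so the archive
                # does not include the full path on disk
                relative_name = str(file_path.relative_to(folder))
                archive.write(file_path, arcname=relative_name)
    return Path(archive_path)</pre>

<p>For instance, <code>zip_matching_files(r'C:\src_folder\designs', r'C:\dst_folder\jpg_designs.zip')</code> would archive just the JPEG files, assuming those folders exist.</p>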



<h2 class="wp-block-heading">shutil Limitations</h2>



<p>You can achieve a great deal with the shutil module, but, as mentioned at the start of this article, it does have a few limitations that you should know about.</p>



<h3 class="wp-block-heading"><em>Missing File Metadata</em></h3>



<p><code>copy2()</code> preserves as much metadata as possible and is used by <code>copytree()</code> and <code>move()</code>, so by default these functions will do the same. It’s not able to capture everything though.</p>



<p>On Windows: file owners, access control lists (ACLs) and alternate data streams are not copied.</p>



<p>File owners and ACLs are also lost on Linux and Mac, along with groups.</p>



<p>On Mac OS the resource fork and other metadata are not used either, resulting in the loss of resource data and incorrect creator and file type codes.</p>



<h3 class="wp-block-heading"><em>Speed</em></h3>



<p>A complaint often levelled at shutil in the past was that it could be very slow to use when working with large amounts of data, particularly on Windows.</p>



<p>Fortunately, this has been addressed in Python 3.8 with the introduction of the snappily titled “platform-dependent efficient copy operations”.</p>



<p>This “fast-copy” enhancement means that shutil’s copy and move operations are now optimized to occur within the relevant <a href="https://en.wikipedia.org/wiki/Kernel_(operating_system)">operating system kernel</a> instead of Python’s userspace buffers whenever possible.</p>



<p>Therefore, if you’re running into speed issues on an earlier version of Python and using 3.8 instead is an option, it’s likely to improve matters greatly.</p>



<p>You could also look into third-party packages such as <a href="https://pypi.org/project/pyfastcopy/">pyfastcopy</a>.</p>






<h2 class="wp-block-heading">Combining shutil With Other Standard Library Modules</h2>



<p>In the <code>copytree()</code> section of this article we saw how to exert greater control over shutil’s behavior by using the <code>ignore</code> parameter to exclude files with a particular name or type.</p>



<p>But what if you want to carry out more complex tasks such as accessing other file-related data so you can check it to determine which operations should be completed?&nbsp;</p>



<p>Using shutil in combination with some of Python’s other standard library modules is the answer.&nbsp;</p>



<p>This section is intended to provide an example of one use case for this kind of approach.</p>



<p>We will create a simple program that can spring clean a file directory by storing away old subdirectories if they haven’t been modified for a long time.</p>



<p>To do this we’ll use shutil.move() along with several other handy modules including: pathlib (which I mentioned at the start), os and time.</p>



<h3 class="wp-block-heading"><em>The Modules</em></h3>



<p>As well as making it much simpler to define cross platform compatible paths, pathlib’s Path class contains methods that really help with <a href="https://realpython.com/python-pathlib/">handling file paths efficiently</a>.&nbsp;</p>



<p>We’ll also be using the <a href="https://www.tutorialspoint.com/python/os_walk.htm">os module’s walk function</a>, which had no direct equivalent in pathlib when this article was written (<code>Path.walk()</code> was only added in Python 3.12). This will enable us to traverse our subdirectories to identify all the files they contain and extract their paths.</p>



<p>We will take advantage of the time module too, so we can calculate how long it’s been since the files in each subdirectory were last modified.</p>



<h3 class="wp-block-heading"><em>Preparing for the Move</em></h3>



<p>Having imported our modules:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> import os
 import pathlib
 import shutil
 import time</pre>



<p>The first thing we need to do is assign the number of seconds in a standard (non-leap) year to a constant:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SECONDS = 365 * 24 * 60 * 60</pre>



<p><br>This will help us to determine how long it’s been since the files in our subfolders were last modified (more on that later).</p>



<p>Next, we define our first function which will prepare the file operations that are necessary to complete the move:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def prepare_move(number, path, storage_folder):
     pass</pre>



<p><br>Our function takes three arguments:</p>



<ol class="wp-block-list"><li>number – the number of years since any file in a subfolder was last modified (this could also be a <a href="https://blog.finxter.com/wp-content/uploads/2019/02/CheatSheet-Python-2_-Data-Structures.docx.pdf">float</a> such as 1.5).</li><li>path – the file path of the main directory that contains the subdirectories we want to tidy up.</li><li>storage_folder – the name of the folder where we want the old directories to be placed. Once the operation is complete, this storage folder will be put in the main directory alongside the subdirectories that haven’t been moved.</li></ol>



<p>We now need to assign some objects to variables that will play important roles in the preparation process:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())</pre>



<ol class="wp-block-list"><li>length –&nbsp; is the result of multiplying the SECONDS constant we previously defined by the number of years passed into the function.</li><li>now – is the current time in seconds provided by the time module. This is calculated based on what’s known as the <a href="https://www.programiz.com/python-programming/time">epoch</a>.</li><li>my_directory –&nbsp; stores the main directory path we passed to the function as a pathlib.Path object.</li><li>my_subdirectories – is a <a href="https://blog.finxter.com/python-one-line-generator/">generator</a> containing the paths of our subdirectories produced by iterating through my_directory.</li></ol>



<p>Our next step is to create a for loop to iterate through the subdirectories yielded by our generator and append the details of any that have not been modified during the period we specified to a list of file operations:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())
     file_operations = []
     for subdirectory in my_subdirectories:
         time_stats = _get_stats(subdirectory)
</pre>



<p><br>The first task carried out by the loop is to create a list of all the file modified times in a subdirectory.&nbsp;</p>



<p>This is handled by a separate function which uses the <code>os.walk()</code> function mentioned earlier and the last modified value in seconds (<code>st_mtime</code>) available via the <code>Path.stat()</code> method:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def _get_stats(subdirectory):
     time_stats = []
     for folder, _, files in os.walk(subdirectory):
         for file in files:
             file_path = pathlib.Path(folder) / file
             time_stat = file_path.stat().st_mtime
             time_stats.append(time_stat)
     return time_stats
</pre>
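
<p>If you would rather stay within pathlib, a similar list can be built with <code>Path.rglob()</code>. This is only a sketch of an alternative (<code>get_stats_with_rglob</code> is a made-up name), and it may differ from the <code>os.walk()</code> version in edge cases such as how symbolic links are treated.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pathlib


def get_stats_with_rglob(subdirectory):
    """Return the last-modified times of every file below subdirectory."""
    return [
        file_path.stat().st_mtime
        # rglob('*') yields every file and folder at any depth
        for file_path in pathlib.Path(subdirectory).rglob('*')
        if file_path.is_file()
    ]</pre>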



<p>The loop then checks these file modified stats to see whether they all precede the specified point in time (with the calculation being done in seconds).</p>



<p>If so, the necessary source and destination paths are constructed and appended to the file_operations list.</p>



<p>Once the loop has iterated through all our subdirectories, the function returns the list of file operations that need to be completed:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())
     file_operations = []
     for subdirectory in my_subdirectories:
         time_stats = _get_stats(subdirectory)
         if all(time_stat &lt; (now - length) for time_stat in time_stats):
             *_, subdirectory_name = subdirectory.parts
             source = subdirectory
             destination = my_directory / storage_folder / subdirectory_name
             file_operations.append((source, destination))
     return file_operations</pre>
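
<p>Two edge cases are worth keeping in mind here. Python’s <code>all()</code> returns True for an empty list, so a subdirectory that contains no files at all will be treated as old. And once the program has been run, the storage folder is itself a subdirectory of the main directory, so a later run could select it and try to move it into itself. The sketch below shows one way the candidates might be filtered first; <code>candidate_subdirectories</code> is a hypothetical helper, not part of the program in this article.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pathlib


def candidate_subdirectories(path, storage_folder):
    """Yield the subdirectories of path that are worth checking.

    candidate_subdirectories is a hypothetical helper that skips two cases
    prepare_move() could otherwise select: the storage folder itself and
    subdirectories that contain no files.
    """
    my_directory = pathlib.Path(path)
    for item in my_directory.iterdir():
        if not item.is_dir():
            continue
        if item.name == storage_folder:
            # Never try to move the storage folder into itself
            continue
        if not any(entry.is_file() for entry in item.rglob('*')):
            # all() is True for an empty list, so a folder without
            # files would otherwise always count as old
            continue
        yield item</pre>

<p>The generator assigned to <code>my_subdirectories</code> could then be swapped for <code>candidate_subdirectories(path, storage_folder)</code> if this behavior suits your needs.</p>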



<h3 class="wp-block-heading"><em>Moving the Subdirectories</em></h3>



<p>Now we need to define the function that will actually move the subdirectories:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def move_files(file_operations):
     for operation in file_operations:
         source, destination = operation
         shutil.move(source, destination)</pre>



<p>Because all the preparation work has already been done, this function simply accepts the file operations and passes them to <code>shutil.move()</code> via a for loop so each old subdirectory can be placed in the specified storage_folder.</p>
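
<p>Since moving directories is not easily undone, you may want to preview the planned operations before committing to them. Here is a minimal sketch of that idea; <code>move_files_with_preview</code> and its <code>dry_run</code> parameter are hypothetical additions of mine, not part of the program described in this article.</p>

<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import shutil


def move_files_with_preview(file_operations, dry_run=True):
    """Move each (source, destination) pair, or only report it if dry_run."""
    planned = []
    for source, destination in file_operations:
        planned.append(f'{source} to {destination}')
        if dry_run:
            # Report the operation without touching the file system
            print(f'Would move {source} to {destination}')
        else:
            shutil.move(str(source), str(destination))
    return planned</pre>

<p>Running it once with the default <code>dry_run=True</code> lets you check the list, and calling it again with <code>dry_run=False</code> carries out the moves.</p>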



<h3 class="wp-block-heading"><em>Executing the Program</em></h3>



<p>Lastly, we define a <code>main()</code> function to execute the program and call it with our arguments:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 def main(number, path, storage_folder):
     file_operations = prepare_move(number, path, storage_folder)
     move_files(file_operations)
 
 main(1, r"F:\my_directory", "old_stuff")</pre>



<p><br>Here’s the whole program:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> 
 import os
 import pathlib
 import shutil
 import time
 
 
 SECONDS = 365 * 24 * 60 * 60
 
 
 def prepare_move(number, path, storage_folder):
     length = SECONDS * number
     now = time.time()
     my_directory = pathlib.Path(path)
     my_subdirectories = (item for item in my_directory.iterdir() if item.is_dir())
     file_operations = []
     for subdirectory in my_subdirectories:
         time_stats = _get_stats(subdirectory)
         if all(time_stat &lt; (now - length) for time_stat in time_stats):
             *_, subdirectory_name = subdirectory.parts
             source = subdirectory
             destination = my_directory / storage_folder / subdirectory_name
             file_operations.append((source, destination))
     return file_operations
 
 
 def _get_stats(subdirectory):
     time_stats = []
     for folder, _, files in os.walk(subdirectory):
         for file in files:
             file_path = pathlib.Path(folder) / file
             time_stat = file_path.stat().st_mtime
             time_stats.append(time_stat)
     return time_stats
 
 
 def move_files(file_operations):
     for operation in file_operations:
         source, destination = operation
         shutil.move(source, destination)
 
 
 def main(number, path, storage_folder):
     file_operations = prepare_move(number, path, storage_folder)
     move_files(file_operations)
 
 main(1, r"F:\my_directory", "old_stuff")</pre>



<p>Below, you can see how the directory structure looked before running the program:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> F:\my_directory>tree /F
 ...
 F:.
 ├───new_files_1
 │   │   new_file.jpg
 │   │
 │   ├───second_level_folder_1
 │   │       really_new_file.txt
 │   │
 │   └───second_level_folder_2
 │           very_new_file.txt
 │
 ├───new_files_2
 │       fairly_new_file.txt
 │
 ├───old_files_1
 │   │   old_file.txt
 │   │
 │   └───second_level_folder_1
 │       │   old_file_as_well.txt
 │       │
 │       └───third_level_folder
 │               really_old_file.jpg
 │
 └───old_files_2
     │   another_old_file.txt
     │
     └───old_second_level_folder
             oldest_file.jpg
             old_file_2.txt
</pre>



<p>And this is what it looks like afterwards:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> 
 F:\my_directory>tree /F
 ...
 F:.
  ├───new_files_1
  │   │   new_file.jpg
  │   │
  │   ├───second_level_folder_1
  │   │       really_new_file.txt
  │   │
  │   └───second_level_folder_2
  │           very_new_file.txt
  │
  ├───new_files_2
  │       fairly_new_file.txt
  │
  └───old_stuff
      ├───old_files_1
      │   │   old_file.txt
      │   │
      │   └───second_level_folder_1
      │       │   old_file_as_well.txt
      │       │
      │       └───third_level_folder
      │               really_old_file.jpg
      │
      └───old_files_2
          │   another_old_file.txt
          │
          └───old_second_level_folder
                  oldest_file.jpg
                  old_file_2.txt 
</pre>



<p><br>Obviously, if you had a directory this small or one where all the subdirectories were labelled as either old or new already, you would be unlikely to need such a program! But hopefully this basic example helps to illustrate how the process would work with a larger, less intuitive directory.</p>



<p>The program shown in this section has been greatly simplified for demonstration purposes. If you would like to see a more complete version, structured as a command line application that summarizes changes before you decide whether to apply them, and enables you to tidy files based on creation and last accessed times as well, you can view it <a href="https://github.com/jonboland/oldfolder/blob/master/oldfolder.py">here</a>.</p>
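<p>The idea of summarizing changes before applying them can be sketched in a few lines. This is only an illustration of the concept, not the code from the linked project: the names <code>summarize()</code> and <code>apply_if_confirmed()</code> are hypothetical, and in a real script the answer would typically come from <code>input()</code>.</p>

```python
import pathlib


def summarize(file_operations):
    """Return one readable line per planned move without touching the disk."""
    return [f"{source} -> {destination}" for source, destination in file_operations]


def apply_if_confirmed(file_operations, mover, answer):
    """Call mover(file_operations) only when the supplied answer is 'y'."""
    if answer.strip().lower() == "y":
        mover(file_operations)
        return True
    return False


# Hypothetical planned operations; PurePosixPath keeps the output identical on every OS.
planned = [
    (
        pathlib.PurePosixPath("my_directory/old_files_1"),
        pathlib.PurePosixPath("my_directory/old_stuff/old_files_1"),
    )
]

for line in summarize(planned):
    print(line)

performed = []  # stands in for move_files() so nothing is really moved here
applied = apply_if_confirmed(planned, performed.extend, answer="n")
print(applied, performed)
```

<p>Keeping the preparation step separate from the move step, as the program in this section does, is what makes a preview like this straightforward to add.</p>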



<h2 class="wp-block-heading">Final Thoughts</h2>



<p>As we’ve seen, the shutil module provides some excellent utilities for working with files and directories, and you can greatly enhance their power and precision by combining them with other tools from the standard library and beyond.</p>



<p>Care should be taken to avoid permanently overwriting or deleting existing files and directories by accident though, so please check out the warnings included in the relevant sections of this article if you haven’t already.</p>



<p>The example program described above is just one of many uses to which shutil’s tools could be put. Here’s hoping you find some ingenious ways to apply them in your own projects soon.</p>
<p>The post <a href="https://blog.finxter.com/python-shutil-high-level-file-operations-demystified/">Python shutil: High-Level File Operations Demystified</a> appeared first on <a href="https://blog.finxter.com">Be on the Right Side of Change</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Python List: Remove Duplicates and Keep the Order</title>
		<link>https://blog.finxter.com/python-list-remove-duplicates-and-keep-the-order/</link>
		
		<dc:creator><![CDATA[Jonathan Boland]]></dc:creator>
		<pubDate>Wed, 09 Sep 2020 18:57:19 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://blog.finxter.com/?p=12864</guid>

					<description><![CDATA[<p>Removing duplicates from a list is pretty simple. You can do it with a Python one-liner:  Python set elements have to be unique so converting a list into a set and back again achieves the desired result. What if the original order of the list is important though? That makes things a bit more complicated ... <a title="Python List: Remove Duplicates and Keep the Order" class="read-more" href="https://blog.finxter.com/python-list-remove-duplicates-and-keep-the-order/" aria-label="Read more about Python List: Remove Duplicates and Keep the Order">Read more</a></p>
<p>The post <a href="https://blog.finxter.com/python-list-remove-duplicates-and-keep-the-order/">Python List: Remove Duplicates and Keep the Order</a> appeared first on <a href="https://blog.finxter.com">Be on the Right Side of Change</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Removing duplicates from a list is pretty simple. You can do it with a <a href="https://blog.finxter.com/python-one-liners-the-ultimate-collection/" target="_blank" rel="noreferrer noopener">Python one-liner</a>: </p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> initial = [1, 1, 9, 1, 9, 6, 9, 7]
>>> result = list(set(initial))
>>> result
[1, 7, 9, 6]
</pre>



<p><a href="https://blog.finxter.com/sets-in-python/">Python set</a> elements have to be unique so converting a list into a set and back again achieves the desired result.</p>



<p>What if the original order of the list is important though? That makes things a bit more complicated because sets are unordered, so once you’ve finished the conversion the order of the list will be lost.</p>



<p>Fortunately, there are several ways to overcome this issue. In this article we’ll look at a range of different solutions to the problem and consider their relative merits.&nbsp;</p>



<h2 class="wp-block-heading">Method 1 – For Loop</h2>



<p>A basic way to achieve the required result is with a <a href="https://blog.finxter.com/python-loops/" target="_blank" rel="noreferrer noopener">for loop</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> initial = [1, 1, 9, 1, 9, 6, 9, 7]
 >>> result = []
 >>> for item in initial:
 ...     if item not in result:
 ...         result.append(item)
 ...
 >>> result
 
 [1, 9, 6, 7]</pre>



<p><br>This approach does at least have the advantage of being easy to read and understand. It’s quite inefficient though, as the <code>not in</code> check has to scan the <code>result</code> list for every element of the <code>initial</code> <a href="https://blog.finxter.com/python-lists/" target="_blank" rel="noreferrer noopener" title="The Ultimate Guide to Python Lists">list</a>, so the number of comparisons grows roughly with the square of the list’s length when most items are unique.</p>



<p>That might not be a problem with this simple example, but the <a href="https://blog.finxter.com/runtime-complexity-of-python-list-methods-easy-table-lookup/" target="_blank" rel="noreferrer noopener" title="Runtime Complexity of Python List Methods [Easy Table Lookup]">time overhead</a> will become increasingly evident if the list gets very large.</p>
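<p>One common variation on the for loop keeps a separate set of items already seen, so that each membership check is a hash lookup rather than a scan of the result list. The sketch below assumes every item is hashable, and the function name <code>dedupe_with_seen_set()</code> is just a label chosen for this example.</p>

```python
def dedupe_with_seen_set(items):
    seen = set()   # fast membership checks
    result = []    # keeps the original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


initial = [1, 1, 9, 1, 9, 6, 9, 7]
result = dedupe_with_seen_set(initial)
print(result)  # [1, 9, 6, 7]
```

<p>The trade-off is a little extra memory for the set in exchange for far fewer comparisons on long lists.</p>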



<h2 class="wp-block-heading">Method 2 – List Comprehension</h2>



<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe title="How to Remove Duplicates From a Python List?" width="937" height="527" src="https://www.youtube.com/embed/GXL23jfNk1Y?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p>One alternative is to use a <a href="https://blog.finxter.com/list-comprehension/">list comprehension</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> initial = [1, 1, 9, 1, 9, 6, 9, 7]
 >>> result = []
 >>> [result.append(item) for item in initial if item not in result]
 [None, None, None, None]
 >>> result
 
 [1, 9, 6, 7]
</pre>



<p>List comprehensions are handy and very powerful Python tools that enable you to combine variables, <a href="https://blog.finxter.com/daily-python-puzzle-control-flow-statements-for-loop/" target="_blank" rel="noreferrer noopener" title="Python Control Flow Statements">for loops and if statements</a>. They make it possible to create a list with a single line of code (but you can split them into multiple lines to improve readability too!).</p>



<p>Although shorter and still fairly clear, using a list comprehension in this instance is not a very good idea.</p>



<p>That’s because it takes the same inefficient approach to membership testing that we saw in <strong>Method 1</strong>. It also relies on the side effects of the comprehension to build the result list, which many consider to be bad practice.</p>



<p>To explain further, even if it’s not assigned to a variable for later use, a list comprehension still creates a list object. So, in the process of <a href="https://blog.finxter.com/python-list-append/" target="_blank" rel="noreferrer noopener" title="Python List append() Method">appending</a> items from the initial list to the <code>result</code> list, our code is also creating a third list containing the return value of each <code>result.append(item)</code> call.</p>



<p>Python functions return the value <code>None</code> if no other return value is specified, meaning that (as you can see above) the output from the third list is: </p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">[None, None, None, None]</pre>



<p>A for loop is clearer and does not rely on side effects so is the better method of the two on this occasion.</p>



<h2 class="wp-block-heading">Method 3 – Sorted Set</h2>



<p>We can’t simply convert our list to a <a href="https://blog.finxter.com/sets-in-python/" title="The Ultimate Guide to Python Sets – with Harry Potter Examples" target="_blank" rel="noreferrer noopener">set</a> to <a href="https://blog.finxter.com/how-to-remove-duplicates-from-a-python-list-of-lists/" title="How to Remove Duplicates From a Python List of Lists?" target="_blank" rel="noreferrer noopener">remove duplicates</a> if we want to preserve order. However, using this approach in conjunction with the <a href="https://blog.finxter.com/python-list-sort-key/">sorted function</a> is another potential way forward:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> initial = [1, 1, 9, 1, 9, 6, 9, 7]
 >>> result = sorted(set(initial), key=initial.index)
 >>> result
 
 [1, 9, 6, 7]
</pre>



<p><br>As you can see, this method uses the <a href="https://blog.finxter.com/daily-python-puzzle-list-indexing/" target="_blank" rel="noreferrer noopener" title="List Indexing">index</a> of each value’s first occurrence in the initial list as the key to <a href="https://blog.finxter.com/python-list-sort/" target="_blank" rel="noreferrer noopener" title="Python List sort() – The Ultimate Guide">sort</a> the set of unique values back into the original order.</p>



<p>The problem is that although it’s pretty easy to understand, it’s not much faster than the basic for loop shown in <strong>Method 1</strong>, because <code>initial.index</code> has to search the list from the start for every unique value.</p>



<h2 class="wp-block-heading">Method 4 – Dictionary fromkeys()</h2>



<figure class="wp-block-image"><img decoding="async" src="https://blog.finxter.com/wp-content/uploads/2020/04/removeDupsPython-1024x576.jpg" alt=""/></figure>



<p>A seriously quick approach is to use a <a href="https://blog.finxter.com/python-dictionary/">dictionary</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> initial = [1, 1, 9, 1, 9, 6, 9, 7]
 >>> result = list(dict.fromkeys(initial))
 >>> result
 
 [1, 9, 6, 7]</pre>



<p><br>Like sets, dictionaries use <a href="https://blog.finxter.com/python-dictionary/#What_is_Hashing_in_Python" target="_blank" rel="noreferrer noopener">hash tables</a>, which means they are extremely fast.</p>



<p>Python dictionary keys are unique by default so converting our list into a dictionary will remove duplicates automatically.</p>



<p>The <code>dict.fromkeys()</code> method creates a new dictionary using the elements from an iterable as the keys. </p>



<p>Once this has been done with our initial list, converting the dictionary back to a list gives the result we’re looking for.</p>



<p>Dictionaries only became guaranteed to preserve insertion order in all Python implementations when <a href="https://blog.finxter.com/how-to-check-your-python-version/" target="_blank" rel="noreferrer noopener" title="How to Check Your Python Version? A Helpful Guide">Python 3.7</a> was released (in CPython 3.6 this was an implementation detail).</p>



<p>So, if you’re using an older version of Python, you will need to import the <code>OrderedDict</code> class from the <code>collections</code> module in the standard library instead:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> from collections import OrderedDict
 >>> initial = [1, 1, 9, 1, 9, 6, 9, 7]
 >>> result = list(OrderedDict.fromkeys(initial))
 >>> result
 
 [1, 9, 6, 7]</pre>



<p>This approach might not be as fast as using a standard dictionary, but it’s still very speedy!</p>
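<p>If you would like to check the relative speeds on your own machine, a rough timing sketch using the standard library’s <code>timeit</code> module might look like this. The test data and the wrapper function names are assumptions made for the example, and the numbers you get will vary with your hardware and Python version.</p>

```python
import timeit
from collections import OrderedDict


def with_for_loop(items):
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def with_sorted_set(items):
    return sorted(set(items), key=items.index)


def with_dict(items):
    return list(dict.fromkeys(items))


def with_ordered_dict(items):
    return list(OrderedDict.fromkeys(items))


# Hypothetical test data: 2,000 integers drawn from 500 distinct values.
data = [(i * 7919) % 500 for i in range(2000)]

candidates = (with_for_loop, with_sorted_set, with_dict, with_ordered_dict)
results = [func(data) for func in candidates]

for func in candidates:
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__:20} {seconds:.4f} s for 10 runs")
```

<p>All four functions return the same list, so any difference you see is purely down to how each one tracks the items it has already encountered.</p>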



<figure><iframe src="https://repl.it/repls/GrandioseRemarkableBackups?lite=true" allowfullscreen="true" width="100%" height="400px"></iframe></figure>



<p><strong><em>Exercise: </em></strong><em>Run the code. Does it work?</em></p>



<h2 class="wp-block-heading">Method 5 – more-itertools</h2>



<p>Up to this point, we’ve only looked at lists containing <a href="https://medium.com/@meghamohan/mutable-and-immutable-side-of-python-c2145cf72747#:~:text=Simple%20put%2C%20a%20mutable%20object,Custom%20classes%20are%20generally%20mutable.">immutable items</a>. But what if your list contains mutable data types such as lists, sets or dictionaries?</p>



<p>It’s still possible to use the basic for loop shown in <strong>Method 1</strong>, but that won’t cut the mustard if speed is of the essence.</p>



<p>Also, if we try to use <code>dict.fromkeys()</code> we’ll receive a <code>TypeError</code> because dictionary keys must be hashable.</p>
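<p>A quick way to see this for yourself is to wrap the call in a try block. The exact wording of the error message differs between Python versions, so the sketch below simply prints whatever message is raised.</p>

```python
mutables = [[1, 2, 3], [2, 3, 4], [1, 2, 3]]

try:
    result = list(dict.fromkeys(mutables))
except TypeError as error:
    # Lists cannot be hashed, so they cannot be used as dictionary keys.
    result = None
    message = str(error)
    print(message)
```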



<p>A great answer to this conundrum comes in the form of a library called <a href="https://more-itertools.readthedocs.io/en/stable/index.html">more-itertools</a>. It’s not part of the Python standard library so you’ll need to <a href="https://blog.finxter.com/the-complete-python-library-guide/">pip install it</a>.</p>



<p>With that done, you can import and use its <code>unique_everseen()</code> function like so:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> from more_itertools import unique_everseen
 >>> mutables = [[1, 2, 3], [2, 3, 4], [1, 2, 3]]
 >>> result = list(unique_everseen(mutables))
 >>> result
 
 [[1, 2, 3], [2, 3, 4]]</pre>



<p>The library <code>more-itertools</code> is designed specifically for working with Python’s iterable data types in efficient ways (it complements itertools which IS part of the standard library).</p>



<p>The function <code>unique_everseen()</code> yields unique elements while preserving order and crucially it can handle mutable data types, so it’s exactly what we’re looking for.</p>



<p>The function also provides a way to remove duplicates even more quickly from a <a href="https://blog.finxter.com/python-list-of-lists/" title="Python List of Lists – A Helpful Illustrated Guide to Nested Lists in Python" target="_blank" rel="noreferrer noopener">list of lists</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> result = list(unique_everseen(mutables, key=tuple))
 >>> result
 
 [[1, 2, 3], [2, 3, 4]]
</pre>



<p>This works well because it converts the unhashable lists into hashable tuples to speed things up further.</p>
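<p>To make the reasoning concrete, here is a simplified pure-Python sketch of the general technique: remember hashable markers in a set and fall back to a list for anything unhashable. The function name <code>unique_in_order()</code> is hypothetical and this is an illustration of the idea only; the library’s actual source may differ in its details.</p>

```python
def unique_in_order(iterable, key=None):
    seen_hashable = set()    # fast lookups for hashable markers
    seen_unhashable = []     # slower fallback for everything else
    for element in iterable:
        marker = element if key is None else key(element)
        try:
            if marker not in seen_hashable:
                seen_hashable.add(marker)
                yield element
        except TypeError:
            # The marker is unhashable, so it has to be compared against a list.
            if marker not in seen_unhashable:
                seen_unhashable.append(marker)
                yield element


mutables = [[1, 2, 3], [2, 3, 4], [1, 2, 3]]
without_key = list(unique_in_order(mutables))
with_key = list(unique_in_order(mutables, key=tuple))
print(without_key)  # [[1, 2, 3], [2, 3, 4]]
print(with_key)     # [[1, 2, 3], [2, 3, 4]]
```

<p>Both calls give the same result, but with <code>key=tuple</code> every marker is hashable, so the slower list-based fallback is never needed.</p>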



<p>If you want to apply this trick to a list of sets, you can use <a href="https://www.programiz.com/python-programming/methods/built-in/frozenset">frozenset</a> as the key:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> mutables = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
 >>> result = list(unique_everseen(mutables, key=frozenset))
 >>> result
 
 [{1, 2, 3}, {2, 3, 4}]
</pre>



<p>Specifying a key with a <a href="https://blog.finxter.com/how-to-create-a-list-of-dictionaries-in-python/" title="How to Create a List of Dictionaries in Python?" target="_blank" rel="noreferrer noopener">list of dictionaries</a> is a little more complicated, but can still be achieved with the help of a <a href="https://blog.finxter.com/daily-python-puzzle-lambda-functions/">lambda function</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> ...
 >>> mutables = [{'one': 1}, {'two': 2}, {'one': 1}]
 >>> result = list(
     unique_everseen(mutables, key=lambda x: frozenset(x.items()))
     )
 >>> result
 
 [{'one': 1}, {'two': 2}]
</pre>



<p>The function <code>unique_everseen()</code> can also be used with lists containing a mix of iterable and non-iterable items (think integers and floats), which is a real bonus. Attempting to provide a key such as <code>tuple</code> in this instance will result in a <code>TypeError</code> though, because the key function cannot be applied to the non-iterable items.</p>



<h2 class="wp-block-heading">Method 6 – NumPy unique()</h2>



<p>If you’re working with numerical data, the third-party library <a href="https://blog.finxter.com/numpy-tutorial/">numpy</a> is an option too:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> import numpy as np
 >>> initial = np.array([1, 1, 9, 1, 9, 6, 9, 7])
 >>> _, idx = np.unique(initial, return_index=True)
 >>> result = initial[np.sort(idx)]
 >>> result
 
 array([1, 9, 6, 7])
</pre>



<p>The index values of the unique items can be stored by using the <code>np.unique()</code> function with the <code>return_index</code> parameter set to <code>True</code>.</p>



<p>These indices mark the first occurrence of each unique value. Sorting them with <code><a href="https://blog.finxter.com/how-to-sort-in-one-line/" title="The Ultimate Introduction to Sorting in NumPy">np.sort()</a></code> and using the result to index the original array produces a correctly ordered array with duplicates removed.</p>



<p>Technically this method could be applied to a standard list by first converting it into a <a href="https://blog.finxter.com/numpy-tutorial/" target="_blank" rel="noreferrer noopener" title="NumPy Tutorial – Everything You Need to Know to Get Started">numpy array</a> and then converting it back to list format at the end. However, this would be an overcomplicated and inefficient way of achieving the result.</p>



<p>Using these kinds of techniques only really makes sense if you are also utilizing some of numpy’s powerful features for other reasons.</p>



<h2 class="wp-block-heading">Method 7 – pandas unique()</h2>



<p>Another third-party library we could use is <a href="https://blog.finxter.com/pandas-cheat-sheets/">pandas</a>:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> >>> import pandas as pd
 >>> initial = pd.Series([1, 1, 9, 1, 9, 6, 9, 7])
 >>> result = pd.unique(initial)
 >>> result
 
 array([1, 9, 6, 7])
</pre>



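<p><code>pandas</code> is better suited to the task because <code>pd.unique()</code> is hash table-based and returns values in order of appearance without sorting them, and the pandas documentation describes it as significantly faster than <code>np.unique()</code> for long enough sequences.</p>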



<p>As with the numpy method, it would be perfectly possible to convert the result to a standard list at the end.</p>



<p>Again though, unless you’re employing the amazing data analysis tools provided by pandas for another purpose, there is no obvious reason to choose this approach over the even faster option utilizing Python’s built-in dictionary data type (<strong>Method 4</strong>).</p>



<h2 class="wp-block-heading">Summary</h2>



<p>As we’ve seen, there are a wide range of ways to solve this problem and the decision about which one to select should be driven by your particular circumstances.&nbsp;</p>



<p>If you’re writing a quick script and your list isn’t huge, you may opt to use a simple for loop for the sake of clarity.</p>



<p>However, if efficiency is a factor and your lists don’t contain mutable items then going with <code>dict.fromkeys()</code> is an excellent option. It’s great that this method uses one of Python’s built-in data types and retains a good level of readability while massively improving on the for loop’s speed.</p>



<p>Alternatively, if you’re using an older version of Python, <code>OrderedDict.fromkeys()</code> is a really good choice as it’s still very fast.</p>



<p>If you need to work with lists that contain mutable items, importing more-itertools so you can take advantage of the brilliant <code>unique_everseen()</code> function makes a lot of sense.</p>



<p>Lastly, if you’re doing some serious number crunching with numpy or manipulating data with pandas, it would probably be wise to go with the methods built into those tools for this purpose.&nbsp;</p>



<p>The choice is of course yours, and I hope this article has provided some useful insights that will help you pick the right approach for the job at hand.</p>
<p>The post <a href="https://blog.finxter.com/python-list-remove-duplicates-and-keep-the-order/">Python List: Remove Duplicates and Keep the Order</a> appeared first on <a href="https://blog.finxter.com">Be on the Right Side of Change</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Python String Formatting: How to Become a String Wizard with the Format Specification Mini-Language</title>
		<link>https://blog.finxter.com/python-strings-format-specification-mini-language/</link>
		
		<dc:creator><![CDATA[Jonathan Boland]]></dc:creator>
		<pubDate>Mon, 31 Aug 2020 10:41:04 +0000</pubDate>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Data Structures]]></category>
		<category><![CDATA[Pandas Library]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Python String]]></category>
		<guid isPermaLink="false">https://blog.finxter.com/?p=12462</guid>

					<description><![CDATA[<p>Python provides fantastic string formatting options, but what if you need greater control over how values are presented? That’s where format specifiers come in.&#160; This article starts with a brief overview of the different string formatting approaches. We’ll then dive straight into some examples to whet your appetite for using Python’s Format Specification Mini-Language in ... <a title="Python String Formatting: How to Become a String Wizard with the Format Specification Mini-Language" class="read-more" href="https://blog.finxter.com/python-strings-format-specification-mini-language/" aria-label="Read more about Python String Formatting: How to Become a String Wizard with the Format Specification Mini-Language">Read more</a></p>
<p>The post <a href="https://blog.finxter.com/python-strings-format-specification-mini-language/">Python String Formatting: How to Become a String Wizard with the Format Specification Mini-Language</a> appeared first on <a href="https://blog.finxter.com">Be on the Right Side of Change</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Python provides fantastic string formatting options, but what if you need greater control over how values are presented? That’s where format specifiers come in.&nbsp;</p>



<p>This article starts with a brief <strong><em>overview of the different string formatting approaches</em></strong>. We’ll then dive straight into some examples to whet your appetite for using <strong><em>Python’s Format Specification Mini-Language </em></strong>in your <a href="https://blog.finxter.com/how-real-freelancers-earn-money-in-2019-10-practical-python-projects/" target="_blank" rel="noreferrer noopener" title="How Real Freelancers Earn Money in 2020: 10 Practical Python Projects">own projects</a>.</p>



<p>But before all that, try out string formatting yourself in the <strong>interactive Python shell</strong>:</p>



<iframe loading="lazy" height="400px" width="100%" src="https://repl.it/@finxter/HumongousLargeServer?lite=true" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>



<p><em><strong>Exercise</strong>: Create another variable <code>tax</code> and calculate the tax amount to be paid on your income (30%). Now, add both values <code>income</code> and <code>tax</code> in the string&#8212;by using the format specifier <code>%s</code>!</em></p>



<p>Don&#8217;t worry if you struggle with this exercise. After reading this tutorial, you won&#8217;t! Let&#8217;s learn everything you need to know to get started with string formatting in Python.</p>



<h2 class="wp-block-heading">String Formatting Options</h2>



<p><a href="https://blog.finxter.com/python-crash-course/" target="_blank" rel="noreferrer noopener" title="Python Programming Tutorial [+Cheat Sheets]">Python</a>’s string formatting tools have evolved considerably over the years. </p>



<p>The oldest approach is to use the <code>%</code> operator:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> number = 1 + 2
>>> 'The magic number is %s' % number
'The magic number is 3'</pre>



<p><br><em>(The above code snippet already includes a kind of format specifier. More on that later…)</em></p>



<p>The <code>str.format()</code> method was then added:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> 'The magic number is {}'.format(number)
'The magic number is 3'</pre>



<p>Most recently, formatted string literals (otherwise known as <strong>f-strings</strong>) were introduced in Python 3.6. F-strings are easier to use and lead to cleaner code, because their syntax enables the value of an expression to be placed directly inside a string:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> f'The magic number is {number}'
'The magic number is 3'</pre>



<p><br>Other options include creating template strings by importing the Template class from Python’s string module, or manually formatting strings (which we’ll touch on in the next section).</p>
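<p>For completeness, here is a minimal sketch of the template string approach, reusing the magic number from the examples above. Placeholders are marked with a <code>$</code> sign and filled in with the <code>substitute()</code> method.</p>

```python
from string import Template

number = 1 + 2
template = Template('The magic number is $number')
message = template.substitute(number=number)
print(message)  # The magic number is 3
```

<p>Template strings are deliberately simple and do not support format specifiers, which is one reason the rest of this article concentrates on the other approaches.</p>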



<p>If this is all fairly new to you and some more detail would be helpful before moving on, an in-depth explanation of the main string formatting approaches can be found <a href="https://realpython.com/python-string-formatting/" target="_blank" rel="noreferrer noopener">here</a>.<br></p>



<h2 class="wp-block-heading">Format Specifiers</h2>



<p>With that quick summary out of the way, let’s move on to the real focus of this post &#8211; explaining how format specifiers can help you control the presentation of values in strings.</p>



<p><strong><em>F-strings are the clearest and fastest approach to string formatting</em></strong>, so I will be using them to illustrate the use of format specifiers throughout the rest of this article. Please bear in mind though, that specifiers can also be used with the <code>str.format()</code> method. Also, strings using the old <code>%</code> operator actually require a kind of format specification – for example, in the <code>%s</code> example shown in the previous section the letter <code>s</code> is known as a conversion type and it indicates that the standard string representation of the object should be used. </p>
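<p>As a quick illustration of that point, the sketch below applies the same two-decimal-place specification using all three approaches. The value and variable names are made up for the example. Note that with the <code>%</code> operator the specification follows the percent sign directly, without a colon.</p>

```python
value = 3.14159

with_f_string = f'{value:.2f}'
with_format = '{:.2f}'.format(value)
with_percent = '%.2f' % value

print(with_f_string, with_format, with_percent)  # 3.14 3.14 3.14
```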



<p><strong><em>So, what exactly are format specifiers and what options do they provide?</em></strong></p>



<p>Simply put, format specifiers allow you to tell Python how you would like expressions embedded in strings to be displayed.</p>



<h3 class="wp-block-heading"><em>Percentage Format and Other Types</em></h3>



<p>For example, if you want a value to be displayed as a percentage you can specify that in the following way:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> asia_population = 4_647_000_000
>>> world_population = 7_807_000_000
>>> percent = asia_population / world_population
>>> f'Proportion of global population living in Asia: {percent:.0%}'
'Proportion of global population living in Asia: 60%'</pre>



<p><br><em>What’s going on here? How has this formatting been achieved?</em></p>



<p>Well, the first thing to note is the colon <code>:</code> directly after the variable <code>percent</code> embedded in the f-string. This colon tells Python that what follows is a format specifier which should be applied to that expression’s value.</p>



<p>The <code>%</code> symbol defines that the value should be treated as a percentage (it is multiplied by 100 and followed by a percent sign), and the <code>.0</code> indicates the level of precision which should be used to display it. In this case the percentage has been rounded to the nearest whole number, but if <code>.1</code> had been specified instead the value would have been rounded to one decimal place and displayed as 59.5%; using <code>.2</code> would have resulted in 59.52% and so on.</p>



<p>If no format specifier had been included with the expression at all the value would have been displayed as 0.5952350454720123, which is far too precise!</p>



<p>(The % symbol applied in this context should not be confused with the % operator used in old-style string formatting syntax.)</p>
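
<p>If you would like to check the effect of the precision value for yourself, the short script below is one way to do it. It reuses the population figures from the example above, and it also shows the same specifier being passed to <code>str.format()</code> and to the built-in <code>format()</code> function, both of which accept it as well.</p>

```python
asia_population = 4_647_000_000
world_population = 7_807_000_000
percent = asia_population / world_population

# The % type multiplies by 100 and appends a percent sign;
# the number after the dot sets how many decimal places are shown.
whole = f"{percent:.0%}"
one_place = f"{percent:.1%}"
two_places = f"{percent:.2%}"
print(whole, one_place, two_places)

# The same specifier is accepted by str.format() and the built-in format().
via_method = "{:.1%}".format(percent)
via_builtin = format(percent, ".1%")
print(via_method, via_builtin)
```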



<p>Percentage is just the tip of the iceberg as far as type values are concerned. There is a range of other types that can be applied to integer and float values.</p>



<p>For example, you can display integers in binary, octal or hex formats using the <code>b</code>, <code>o</code> and <code>x</code> type values respectively:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> binary, octal, hexadecimal = [90, 90, 90]
>>> f'{binary:b} - {octal:o} - {hexadecimal:x}'
'1011010 - 132 - 5a'</pre>



<p><strong><br></strong>For a full list of options see the link to the relevant area of the official Python documentation in the <strong><em>Further Reading</em></strong> section at the end of the article.<br></p>
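
<p>To give a flavor of a few more of those options, here is a small sketch. The <code>#</code> flag, the <code>X</code> type and the <code>e</code> type are not covered elsewhere in this article, so treat them as a taster only; the value 1234.5 is an arbitrary one chosen for illustration.</p>

```python
number = 90

# A "#" before the type adds the 0b / 0o / 0x prefix.
prefixed = f"{number:#b} {number:#o} {number:#x}"

# "X" is the upper-case counterpart of "x".
upper_hex = f"{number:X}"

# "e" shows a number in scientific notation (six decimal places by default).
scientific = f"{1234.5:e}"

print(prefixed)
print(upper_hex)
print(scientific)
```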



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/fRRg7P7dvxF1BA38ATba9RQ9sNyb_yeY4ePRCmLLlFJ2O2p0EJ8X0jEppZm2ySyhXzD9lXNqA_seaApYT_C76cqXWlEi90v-oG5gY8nKlUykEFxbXjNOKl4oNPgeYabdPdhImzIe" alt="A close up of a reptile"/></figure>



<h3 class="wp-block-heading"><em>Width Format, Alignment and Fill</em></h3>



<p>Another handy format specification feature is the ability to define the minimum width that values should take up when they’re displayed in strings.</p>



<p>To illustrate how this works, if you were to print the elements of the list shown below in columns without format specification, you would get the following result:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> python, java, p_num, j_num = ["Python Users", "Java Users", 8.2, 7.5]
>>> print(f"|{python}|{java}|\n|{p_num}|{j_num}|")
|Python Users|Java Users|
|8.2|7.5|</pre>



<p><br>Not great, but with the inclusion of some width values matters start to improve:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> print(f"|{python:16}|{java:16}|\n|{p_num:16}|{j_num:16}|")
|Python Users    |Java Users      |
|             8.2|             7.5|
</pre>



<p><br>As you can see, width is specified by adding a number after the colon.</p>



<p>The new output is better, but it seems a bit strange that the titles are aligned to the left while the numbers are aligned to the right. What could be causing this?</p>



<p>Well, it’s actually to do with Python’s default approach for different <a href="https://blog.finxter.com/python-cheat-sheets/" target="_blank" rel="noreferrer noopener" title="Python Cheat Sheets">data types</a>. String values are aligned to the left as standard, while numeric values are aligned to the right. (This might seem slightly odd, but it’s consistent with the approach taken by Microsoft Excel and other spreadsheet packages.)</p>



<p>Fortunately, you don’t have to settle for the default settings. If you want to change this behavior you can use one of the alignment options. For example, focusing on the first column only now for the sake of simplicity, if we want to align the number to the left this can be done by adding the <code>&lt;</code> symbol before the <code>p_num</code> variable’s width value:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> print(f"|{python:16}|\n|{p_num:&lt;16}|")
|Python Users    |
|8.2             |</pre>



<p><br>And the reverse can just as easily be achieved by adding a <code>></code> symbol in front of the width specifier associated with the title value:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> print(f"|{python:>16}|\n|{p_num:16}|")
|    Python Users|
|             8.2|
</pre>



<p><br>But what if you want the rows to be centered? Luckily, Python’s got you covered on that front too. All you need to do is use the <code>^</code> symbol instead:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> print(f"|{python:^16}|\n|{p_num:^16}|")
|  Python Users  |
|      8.2       |</pre>



<p><br>Python’s default fill character is a space, and that’s what has so far been used when expanding the width of our values. We can use almost any character we like though. It just needs to be placed in front of the alignment option. For example, this is what the output looks like when an <a href="https://blog.finxter.com/underscore-in-python/" target="_blank" rel="noreferrer noopener" title="The Single and Double Underscore in Python [“_” vs “__”]">underscore </a>is used to fill the additional space in the title row of our column:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> print(f"|{python:_^16}|\n|{p_num:^16}|")
|__Python Users__|
|      8.2       |</pre>



<p><br>It’s worth noting that the same output can be achieved manually by using the <code><a href="https://blog.finxter.com/python-list-to-string/" title="Python List to String: A Helpful Guide with Interactive Shell">str()</a></code> function along with the appropriate string method (in this case <code>str.center()</code>):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> print("|", python.center(16, "_"), "|\n|", str(p_num).center(16), "|", sep="")
|__Python Users__|
|      8.2       |</pre>
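
<p>The other alignment options have manual counterparts too. As a rough sketch, right alignment can be reproduced with <code>str.rjust()</code>; the dot used as a fill character here is an arbitrary choice for illustration.</p>

```python
python, p_num = "Python Users", 8.2  # same values as in the examples above

# Right-align with a dot as the fill character, first manually...
manual_right = python.rjust(16, ".")
# ...and then with the equivalent format specifier.
spec_right = f"{python:.>16}"

# Numbers have to be converted with str() before a string method can be used.
manual_num = str(p_num).rjust(16)
spec_num = f"{p_num:16}"

print(manual_right, spec_right, manual_num, spec_num, sep="\n")
```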



<p><br>But the f-string approach is much more succinct and typically <a href="https://blog.finxter.com/python-profilers-how-to-speed-up-your-python-app/" target="_blank" rel="noreferrer noopener" title="Python cProfile – 7 Strategies to Speed Up Your App">faster</a> to evaluate at run time.</p>



<p>Of course, outputting data formatted into rows and columns is just one example of how specifying width, alignment and fill characters can be used.</p>



<p>Also, in reality if you are looking to output a table of information you aren’t likely to be using a single <code><a href="https://blog.finxter.com/the-separator-and-end-arguments-of-the-python-print-function/" title="Python Print Function [And Its SECRET Separator &amp; End Arguments]">print()</a></code> statement. You will probably have several rows and columns to display, which may be constructed with a loop or comprehension, perhaps using <code><a href="https://blog.finxter.com/python-join-list/" title="Python Join List [Ultimate Guide]">str.join()</a></code> to insert separators etc.</p>
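
<p>As a minimal sketch of that idea, the rows below are built with generator expressions and <code>str.join()</code>, reusing the values from the earlier examples. The variable names are only illustrative, and a real table would probably have many more rows.</p>

```python
# Values taken from the examples above.
labels = ["Python Users", "Java Users"]
values = [8.2, 7.5]

# Build each row from fixed-width cells, then join the cells with "|".
header_row = "|".join(f"{label:^16}" for label in labels)
value_row = "|".join(f"{value:^16.1f}" for value in values)

# Join the finished rows with newlines, adding the outer borders.
table = "\n".join(f"|{row}|" for row in (header_row, value_row))
print(table)
```

<p>Because every cell uses the same width, the column borders line up no matter how long each individual value is (as long as it fits within 16 characters).</p>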



<p>However, regardless of the application, in most instances using f-strings with format specifiers instead of taking a manual approach will result in more readable and efficient code.<br></p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/Y2uoC1houVz0QRgHWMOl2iWn3yq9JS4tIM8JCYRcijM3RBVz0NOymmnxkFsbPd_cVz8Uy8z2k6LA4jGrbuUgd0HLYcipt2bA-Bg7sGIRmSBYmC3lgSkSg9H1cV8chygdqkz2TKjI" alt="A picture containing camera"/></figure>



<h3 class="wp-block-heading"><em>24-Hour Clock Display</em></h3>



<p>As another example, let’s say we want to calculate what the time of day will be after a given number of hours and minutes has elapsed (starting at midnight):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> hours = 54
>>> minutes = 128
>>> quotient, minute = divmod(minutes, 60)
>>> hour = (hours + quotient) % 24
>>> f'{hour}:{minute}'
'8:8'</pre>



<p><br>So far so good. Our program is correctly telling us that after 54 hours and 128 minutes the time of day will be 8 minutes past 8 in the morning, but the problem is that the output isn’t very easy to read. Confusion could arise about whether it’s actually 8 o’clock in the morning or evening, and having a single digit to represent the number of minutes just looks odd.</p>



<p>To fix this we need to insert <a href="https://blog.finxter.com/python-pad-zeros-to-a-string/" target="_blank" rel="noreferrer noopener" title="Python How to Pad Zeros to a String?">leading zeros</a> when the hour or minute value is a single digit, which can be achieved using something called sign-aware zero padding. This sounds pretty complicated, but in essence we just need to use a 0 instead of one of the alignment values we saw earlier when defining the f-string, along with a width value of 2:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> f'{hour:02}:{minute:02}'
'08:08'</pre>



<p><br>Hey presto! The time is now in a clear 24-hour clock format. This approach will work perfectly for times with double-digit hours and minutes as well, because the width value is a minimum and the zero padding will not be used if the value of either expression already occupies the entire space:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> hours = 47
>>> minutes = 59
...
>>> f'{hour:02}:{minute:02}'
'23:59'</pre>
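
<p>Two edge cases are worth a quick check: what happens when a value needs more characters than the width provides, and what “sign-aware” means for negative numbers. The values below are arbitrary ones picked for illustration.</p>

```python
# A value wider than the requested width is shown in full, never truncated.
wide = f"{123:02}"

# "Sign-aware" means the zeros are inserted after the sign, not before it.
negative = f"{-5:03}"

# For comparison, using 0 as an ordinary fill character with right alignment.
naive = f"{-5:0>3}"

print(wide, negative, naive)
```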



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/YeBIxkRrF7QlqChrIjtCMREEUzQoqkApGtFTHVmyTbpJ2mD9eBiXd4Vd4mNNqrzP-0sXBr4m5_9OdLpvK4avcLWC67Nbq0swYb0Ky4T7BkstRR6BcKaRrt389LFA0zy3w_i2XtYG" alt="A picture of stars in the sky"/></figure>



<h3 class="wp-block-heading"><em>Grouping Options</em></h3>



<p>The longer numbers get, the harder they can be to read without thousands separators. If you need to insert them, this can be done using a grouping option:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> proxima_centauri = 40208000000000
>>> f'The closest star to our own is {proxima_centauri:,} km away.'
'The closest star to our own is 40,208,000,000,000 km away.'</pre>



<p><br>You can also use an underscore as the separator if you prefer:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> f'The closest star to our own is {proxima_centauri:_} km away.'
'The closest star to our own is 40_208_000_000_000 km away.'</pre>
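
<p>Grouping options can also be combined with a precision and type. In the sketch below (which uses arbitrary numbers for illustration), note that with the <code>b</code>, <code>o</code> and <code>x</code> types the underscore option groups digits in fours rather than threes.</p>

```python
# Grouping can be combined with a precision and type for floats.
price = f"{1234567.891:,.2f}"

# With the b, o and x types, the underscore option groups digits in fours.
hex_grouped = f"{4294967295:_x}"
binary_grouped = f"{90:_b}"

print(price, hex_grouped, binary_grouped)
```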



<h3 class="wp-block-heading"><em>Putting It All Together</em></h3>



<p>You probably won’t need to use a wide variety of format specification values with a single expression that often, but if you do want to put several together the order is important.</p>



<p>Staying with the astronomical theme, for demonstration purposes we’ll now show the distance between the Sun and Neptune in millions of kilometers:</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> neptune = "Neptune"
>>> n_dist = 4_498_252_900 / 1_000_000
>>> print(f"|{neptune:^15}|\n|{n_dist:~^15,.1f}|")
|    Neptune    |
|~~~~4,498.3~~~~|</pre>



<p><br>As you can see, reading from right to left we need to place the <code>n_dist</code> format specification values in the following order:</p>



<ol class="wp-block-list"><li><strong>Type </strong>– <code>f</code> defines that the value should be displayed using fixed-point notation</li><li><strong>Precision </strong>– <code>.1</code> indicates that a single decimal place should be used</li><li><strong>Grouping </strong>– <code>,</code> denotes that a comma should be used as the thousands separator</li><li><strong>Width </strong>– <code>15</code> is set as the minimum number of characters</li><li><strong>Align </strong>– <code>^</code> defines that the value should be centered</li><li><strong>Fill </strong>– <code>~</code> indicates that a tilde should occupy any unused space</li></ol>
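
<p>One way to see how the pieces fit together is to build the specifier up one option at a time, in the order listed above. The sketch below uses the built-in <code>format()</code> function so that each specifier can be stored as an ordinary string; the list name <code>specs</code> is just an illustrative choice.</p>

```python
n_dist = 4_498_252_900 / 1_000_000

# Build the specifier up one option at a time, starting from the type.
specs = ["f", ".1f", ",.1f", "15,.1f", "^15,.1f", "~^15,.1f"]
results = [format(n_dist, spec) for spec in specs]

# Show each specifier next to the text it produces.
for spec, result in zip(specs, results):
    print(f"{spec:>10} -> |{result}|")
```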



<p>In general, format values that are not required can simply be omitted. However, if a fill value is specified without a corresponding alignment option a <code>ValueError</code> will be raised (the <code>0</code> used for zero padding being the one exception).<br></p>
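
<p>Here is a small sketch of that error in action, again using <code>format()</code> so the exception can be caught easily. The exact wording of the error message may vary between Python versions, so it is stored rather than compared against a fixed string.</p>

```python
n_dist = 4_498_252_900 / 1_000_000

try:
    # "~" is given as a fill character, but no alignment option follows it.
    format(n_dist, "~15")
except ValueError as error:
    outcome = f"ValueError: {error}"
else:
    outcome = "no error"

# Adding an alignment option makes the same fill character valid.
fixed = format(n_dist, "~>15.1f")
print(outcome)
print(fixed)
```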



<h2 class="wp-block-heading">Final Thoughts and Further Reading</h2>



<p>The examples shown in this article have been greatly simplified to demonstrate features in a straightforward way, but I hope they have provided some food for thought, enabling you to envisage ways that the Format Specification Mini-Language could be applied in real world projects.</p>



<p>Basic columns have been used to demonstrate aspects of format specification, and displaying tabular information as part of a command-line application is one example of the ways this kind of formatting could be employed.</p>



<p>If you want to work with and display larger volumes of data in table format though, you would do well to check out the excellent tools provided by the pandas library, which you can read about in these <a href="https://blog.finxter.com/category/pandas-library/" target="_blank" rel="noreferrer noopener">Finxter articles</a>.</p>



<p>Also, if you would like to see the full list of available format specification values they can be found in this section of the <a href="https://docs.python.org/3/library/string.html#format-specification-mini-language" target="_blank" rel="noreferrer noopener">official Python documentation</a>.</p>



<p>The best way to really get the hang of how format specifiers work is to do some experimenting with them yourself. Give it a try – I’m sure you’ll have some fun along the way!</p>
<p>The post <a href="https://blog.finxter.com/python-strings-format-specification-mini-language/">Python String Formatting: How to Become a String Wizard with the Format Specification Mini-Language</a> appeared first on <a href="https://blog.finxter.com">Be on the Right Side of Change</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
