When working with textual data files in Python, it is quite common to encounter a file that contains a list of dictionaries stored in a format that is nearly, but not quite, JSON. This usually happens because the data was persisted with Python’s str() function rather than with a proper serialization method.
In this article, we’ll look at multiple ways you can read such data back into a Python list of dictionaries.
Problem Formulation
Given a text file whose content represents a list of dictionaries, though not in strict JSON format, how can we read the content of this file and convert it back into actual Python data structures? The challenge is to convert this string representation into usable Python types safely, without executing potentially unsafe code.
Let’s say there’s a text file named file.txt that contains the following content:
[{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
This file is meant to hold a list of dictionaries, each representing a person with their name, age, and city. The problem is to read this file back into Python in such a way that you regain a list of dictionaries that you can work with programmatically.
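To make the problem concrete, here is a minimal sketch of how such a file might come about and why the json module rejects it. The variable names and the use of a temporary directory are illustrative assumptions, not part of the original setup.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data matching the file content shown above
people = [
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'file.txt'
    # Persisting with str() writes Python's repr syntax (single quotes), not JSON
    path.write_text(str(people))
    text = path.read_text()

# json.loads expects double-quoted strings, so it rejects this content
try:
    json.loads(text)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

print(text)
print(parsed_as_json)
```

The text read back is a plain string, and strict JSON parsing fails on it, which is why the methods below are needed.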
Method 1: Using ast.literal_eval
ast.literal_eval safely evaluates a string containing a Python literal expression and converts it to the corresponding Python data type. It is considered safe because it accepts only literal structures such as strings, numbers, tuples, lists, and dictionaries, and rejects anything else, including function calls and other potentially dangerous code.
import ast

# Read the whole file as text and parse it as a Python literal
with open('file.txt') as f:
    data = ast.literal_eval(f.read())
In the code snippet above, we open the file 'file.txt' for reading, use the read() method to return its content as a string, and then pass this string to ast.literal_eval. The result, data, is the Python data structure equivalent to the string representation in the file.
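In practice the file content may be malformed or may not be a list of dictionaries at all. The following sketch wraps ast.literal_eval with basic error handling and a shape check; the helper name parse_records is hypothetical and the checks are one possible choice, not a required pattern.

```python
import ast

def parse_records(text):
    """Parse text as a Python literal and check that it is a list of dicts."""
    try:
        data = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # literal_eval raises these for non-literal or syntactically invalid input
        raise ValueError(f'not a valid Python literal: {exc}') from exc
    if not isinstance(data, list) or not all(isinstance(d, dict) for d in data):
        raise ValueError('expected a list of dictionaries')
    return data

records = parse_records("[{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]")
print(records[0]['name'])

# A function call is not a literal, so it is rejected rather than executed
try:
    parse_records("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```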
Method 2: Using json.loads with Replacing
If your data is mostly JSON, with only a few Python-specific differences (e.g., single quotes instead of double quotes), you might consider fixing these with string replacement and then using the json module.
import json

with open('file.txt') as f:
    content = f.read()

# Swap single quotes for double quotes so the text becomes valid JSON
corrected_content = content.replace("'", '"')
data = json.loads(corrected_content)
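The replace() method is called on the file content to replace every single quote with a double quote, making the string JSON-compliant. Then json.loads is used to parse the string into a data structure. Note that this blanket replacement breaks if any string value itself contains a quote character (for example, an apostrophe in a name), and it does not convert Python-only literals such as True, False, and None.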
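The limits of the quote-replacement approach are easy to demonstrate. The sketch below uses a hypothetical helper, naive_load, and made-up sample strings to show one input that works and two that fail.

```python
import json

def naive_load(text):
    """Swap every single quote for a double quote, then parse as JSON."""
    return json.loads(text.replace("'", '"'))

# Works when no string value contains a quote character
ok = naive_load("[{'name': 'Alice', 'age': 30}]")

# Fails when a value contains an apostrophe: the replacement ends the string early
try:
    naive_load("""[{'name': "O'Brien", 'age': 41}]""")
    apostrophe_ok = True
except json.JSONDecodeError:
    apostrophe_ok = False

# Fails on Python-only literals such as True, which JSON spells as true
try:
    naive_load("[{'name': 'Bob', 'active': True}]")
    bool_ok = True
except json.JSONDecodeError:
    bool_ok = False

print(ok, apostrophe_ok, bool_ok)
```

If either failure case can occur in your data, ast.literal_eval is likely the more robust choice.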
Method 3: Using pickle
If the data was originally serialized using Python’s pickle module, then you would use pickle to deserialize it. Pickle can serialize and deserialize complex Python objects, but be warned: unpickling can execute arbitrary code, so it should never be used on untrusted data.
import pickle

# Pickle files are binary, so open in 'rb' mode
with open('file.pkl', 'rb') as f:
    data = pickle.load(f)
Here we assume the file has a .pkl extension, indicating pickle serialization. The file is opened in binary read mode ('rb') and pickle.load() is used to deserialize the contents directly into a Python object.
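For completeness, here is a small round-trip sketch showing both the writing and the reading side. The sample data and the temporary directory are assumptions made so the example is self-contained.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical sample data
people = [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'file.pkl'
    # Serialize: pickle writes bytes, so the file is opened in 'wb' mode
    with open(path, 'wb') as f:
        pickle.dump(people, f)
    # Deserialize: read the bytes back in 'rb' mode
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == people)
```

The restored object is an equal but separate copy of the original list.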
Method 4: Using eval (Not Recommended)
While Python’s built-in eval() function can convert a string representation of a list of dictionaries back to Python objects, it is generally discouraged due to security risks. eval will execute any code included in the string, which is a significant security concern if the data source is not entirely trustworthy.
# WARNING: Only use this method if you completely trust the data source
with open('file.txt') as f:
    data = eval(f.read())
The eval function takes a string and evaluates it as a Python expression. However, this method can be dangerous and should only be used with completely trusted data sources.
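The difference between eval and ast.literal_eval can be shown with a harmless stand-in for malicious content. In this sketch, record_call is a hypothetical function that only records that it ran; a real attack could call anything reachable from the evaluated expression.

```python
import ast

calls = []

def record_call():
    """Harmless stand-in for arbitrary code hidden in a data file."""
    calls.append('executed')
    return []

# Imagine this string was read from an untrusted file
untrusted = "record_call()"

# eval runs the call, so the side effect happens
result = eval(untrusted, {'record_call': record_call})
print(calls)

# ast.literal_eval refuses the same text without running anything
try:
    ast.literal_eval(untrusted)
    literal_ok = True
except ValueError:
    literal_ok = False
print(literal_ok)
```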
Summary/Discussion
Converting a string-represented list of dictionaries back into actual Python data structures is a common task when dealing with files written using the str(list) approach.
The safest and most commonly recommended method is to use ast.literal_eval, though the json module might also be helpful if the data is close to valid JSON.
The pickle module works for data originally serialized with pickle, but like eval, it can be unsafe if the data source is not trusted.
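When you are unsure whether a file holds strict JSON or a Python repr, the two safe methods can be combined. This is only a sketch under that assumption; the function name load_list_of_dicts is hypothetical.

```python
import ast
import json

def load_list_of_dicts(text):
    """Try strict JSON first, then fall back to Python literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON; parse it as a Python literal instead
        return ast.literal_eval(text)

from_json = load_list_of_dicts('[{"name": "Alice", "age": 30}]')
from_repr = load_list_of_dicts("[{'name': 'Bob', 'active': True, 'note': None}]")

print(from_json)
print(from_repr)
```

Neither branch executes code from the input, so this keeps the safety property of the recommended methods.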