5 Best Ways to Validate Data Using Cerberus in Python

💡 Problem Formulation: When working with data in Python, ensuring its validity against a pre-defined schema is crucial. This avoids errors and inconsistencies in processing and storing data. Cerberus is a lightweight and extensible data validation library that can help with this challenge. For instance, given a dictionary with user information, we need to validate it against criteria such as the existence of a name, age being an integer, and email being in the correct format.

Method 1: Basic Schema Validation

Basic schema validation is the most direct way to validate data with Cerberus. You define a schema, a dictionary whose keys are the field names to check and whose values are dictionaries of validation rules for each field.

Here’s an example:

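from cerberus import Validator

# Each key is a field name; each value is a dict of rules for that field.
schema = {'name': {'type': 'string'}, 'age': {'type': 'integer', 'min': 18}}
v = Validator(schema)

document = {'name': 'John Doe', 'age': 30}
is_valid = v.validate(document)  # True only if every rule passes
print(is_valid)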

Output: True

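This snippet defines a schema for two fields: ‘name’ must be a string, and ‘age’ must be an integer with a minimum value of 18. Calling validate() on the Validator object checks the document against these rules and returns True when all of them pass. Note that fields are optional by default; a field only becomes mandatory if its rules include 'required': True.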

Method 2: Handling Nested Structures

Cerberus can also validate hierarchical data. For a field whose value is itself a dictionary, set its type to ‘dict’ and describe the inner fields with the ‘schema’ rule.

Here’s an example:

from cerberus import Validator

schema = {
    'product': {
        'type': 'dict',
        'schema': {
            'id': {'type': 'integer'},
            'name': {'type': 'string'}
        }
    }
}

v = Validator(schema)
document = {'product': {'id': 1, 'name': 'Computer'}}
is_valid = v.validate(document)  # inner fields are checked against the nested schema
print(is_valid)

Output: True

Here, the document contains a ‘product’ dictionary whose ‘id’ and ‘name’ satisfy the nested schema, so validation succeeds. The same pattern extends Cerberus to more deeply nested data.

Method 3: Custom Validators

Custom validators are user-defined functions that add validation rules Cerberus does not provide out of the box. The function receives the field name, the value, and an error callback, and calls that callback to report a problem.

Here’s an example:

from cerberus import Validator

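def is_even(field, value, error):
    # Report a problem through the error callback instead of returning a value.
    if value % 2 != 0:
        error(field, "must be an even number")

# Cerberus 1.3+ calls this rule 'check_with'; older releases used 'validator'.
# The 'type' rule keeps non-integers from ever reaching is_even.
schema = {'number': {'type': 'integer', 'check_with': is_even}}
v = Validator(schema)

document = {'number': 10}
is_valid = v.validate(document)
print(is_valid)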

Output: True

This example defines a custom validator function called is_even and references it in the schema. During validation, Cerberus calls the function with the ‘number’ value; the function reports an error only if the value is odd. Since 10 is even, no error is recorded and validate() returns True.

Method 4: Error Handling

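Error handling in Cerberus means inspecting the errors collected during validation. Cerberus does not raise an exception for an invalid document; validate() returns False and the details are stored on the Validator. This is important for debugging and for giving feedback to whoever supplied the data.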

Here’s an example:

from cerberus import Validator

schema = {'name': {'type': 'string'}, 'age': {'type': 'integer', 'min': 18}}
v = Validator(schema)

document = {'name': 'Jane Doe', 'age': 'young'}
v.validate(document)  # returns False; no exception is raised
errors = v.errors     # maps each failing field to a list of messages
print(errors)

Output: {'age': ['must be of integer type']}

This code validates a document whose ‘age’ field is a string rather than an integer, so validation fails. The errors property on the Validator object holds the messages from the most recent validate() call, keyed by field name.

Bonus One-Liner Method 5: Validator Shortcuts

The shortcut here is not a separate Cerberus feature: it simply creates the Validator and calls validate() in a single expression. This is handy for quick checks and inline validation tasks.

Here’s an example:

from cerberus import Validator

# Build the validator and validate in one expression; the instance is discarded.
is_valid = Validator({'name': {'type': 'string'}}).validate({'name': 'John'})
print(is_valid)

Output: True

This one-liner creates a Validator object with a schema and immediately validates the document against it. It’s a quick and easy check for simple validation scenarios where you don’t need to reuse the schema or Validator instance.

Summary/Discussion

Method 1: Basic Schema Validation. Easy to understand and implement. Suitable for simple flat data structures. Not ideal for complex nested data or custom validation rules.
Method 2: Handling Nested Structures. Supports complex data scenarios. Requires more detailed schema definitions, which can get cumbersome with very deeply nested structures.
Method 3: Custom Validators. Provides flexibility to define specific behaviors beyond built-in validation. Requires extra effort to create custom functions and may increase complexity.
Method 4: Error Handling. Essential for understanding why validation failed. Requires an extra step after each validate() call to read and act on the errors property.
Method 5: Validator Shortcuts. Offers a concise way to validate data. Great for quick checks, but because the Validator instance is discarded, the error details are not available and the schema cannot be reused.