# 5 Best Ways to Implement L1 Normalization with Scikit-learn in Python


π‘ Problem Formulation: When working on data preprocessing in machine learning, it’s crucial to scale or normalize data before feeding it into a model. L1 normalization, also known as least absolute deviations, transforms a dataset by scaling each feature to have a norm of 1. This article guides Python practitioners on implementing L1 normalization using Scikit-learn, with inputs being a raw dataset and the desired output a normalized dataset where each sample’s absolute values sum to 1.

## Method 1: Using `Normalizer` Class from `sklearn.preprocessing`

L1 normalization can be performed with the `Normalizer` class of Scikit-learn’s `sklearn.preprocessing` module. It scales individual samples to have unit norm and can be readily used with the `norm` parameter set to `'l1'`. This method is highly effective for sparse datasets.

Here’s an example:

```python
from sklearn.preprocessing import Normalizer
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
normalizer = Normalizer(norm='l1')
X_normalized = normalizer.fit_transform(X)
print(X_normalized)
```

The output:

```
[[0.16666667 0.33333333 0.5       ]
 [0.26666667 0.33333333 0.4       ]]
```

This snippet demonstrates how to apply L1 normalization to a small array of sample data. The `Normalizer` is created with `norm='l1'`; each row is then normalized so that the absolute values of its elements sum to 1, changing the scale of the features while preserving their relative proportions within each sample.
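One property worth noting: `Normalizer` is stateless — `fit` learns nothing from the training data, since each row is rescaled using only its own values. A quick sketch to confirm this (the arrays here are arbitrary toy data):

```python
from sklearn.preprocessing import Normalizer, normalize
import numpy as np

X_train = np.array([[1, 2, 3], [4, 5, 6]])
X_test = np.array([[2, 2, 4]])

# fit() stores nothing from X_train; the transform of X_test
# depends only on X_test itself.
normalizer = Normalizer(norm='l1').fit(X_train)
out = normalizer.transform(X_test)
print(out)  # [[0.25 0.25 0.5 ]]
```

This is why `Normalizer` is safe to use inside pipelines without any risk of data leakage between folds.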

## Method 2: Applying `normalize` Function

Scikit-learn provides a convenient `normalize` function in the `sklearn.preprocessing` module. It directly normalizes an array or sparse matrix, with the `norm` argument specifying the normalization type. This function simplifies the implementation of L1 normalization when complete fitting behavior of a transformer is not required.

Here’s an example:

```python
from sklearn.preprocessing import normalize
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
X_normalized = normalize(X, norm='l1')
print(X_normalized)
```

The output:

```
[[0.16666667 0.33333333 0.5       ]
 [0.26666667 0.33333333 0.4       ]]
```

This code shows the `normalize` function with `norm='l1'` performing L1 normalization on an array. This method is straightforward and useful for lightweight normalization tasks that do not need a transformer object.
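The `normalize` function also accepts an `axis` argument: `axis=1` (the default) normalizes rows (samples), while `axis=0` normalizes columns (features) — something the `Normalizer` class does not offer. A short sketch:

```python
from sklearn.preprocessing import normalize
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 rescales each column so its absolute values sum to 1,
# i.e. per-feature rather than per-sample normalization.
X_cols = normalize(X, norm='l1', axis=0)
print(X_cols)
print(np.abs(X_cols).sum(axis=0))  # [1. 1. 1.]
```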

## Method 3: L1 Normalization during Cross-Validation

L1 normalization can be seamlessly integrated into model training by including it within a `Pipeline` object along with a learning algorithm. During cross-validation, the normalizer will ensure that the data is appropriately scaled for each fold, enhancing model robustness. This is ideal when preprocessing should be contained within the cross-validation process.

Here’s an example:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import numpy as np

# At least two samples per class are needed so that every
# cross-validation fold contains both classes.
X = np.array([[1, 2, 3], [2, 4, 6], [3, 1, 1], [6, 2, 2]])
y = np.array([0, 0, 1, 1])
l1_norm_logit_pipeline = Pipeline([
    ('normalizer', Normalizer(norm='l1')),
    ('classifier', LogisticRegression())
])

scores = cross_val_score(l1_norm_logit_pipeline, X, y, cv=2)
print(scores.mean())
```

The output:

`1.0`

This example illustrates a pipeline that combines L1 normalization with logistic regression for classification. The `Normalizer` is used to ensure L1 normalization is applied correctly during cross-validation, demonstrating the practical integration of preprocessing with model validation and training.
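The same pipeline also works with hyperparameter search. Pipeline parameters are addressed with the `<step_name>__<param>` convention, so the classifier’s regularization strength can be tuned while normalization is re-applied per fold. A sketch on the same kind of toy data (the `C` grid values are arbitrary):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import numpy as np

X = np.array([[1, 2, 3], [2, 4, 6], [3, 1, 1], [6, 2, 2]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ('normalizer', Normalizer(norm='l1')),
    ('classifier', LogisticRegression()),
])

# 'classifier__C' targets the C parameter of the 'classifier' step.
grid = GridSearchCV(pipe, {'classifier__C': [0.1, 1.0, 10.0]}, cv=2)
grid.fit(X, y)
print(grid.best_params_)
```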

## Method 4: Feature Selection with L1 Regularization

The L1 norm also underpins feature selection through L1 regularization (a distinct technique from L1 normalization), available in several linear models within Scikit-learn. L1 regularization adds a penalty equal to the sum of the absolute values of the coefficients, which drives some coefficients to exactly zero and thereby achieves feature selection.

Here’s an example:

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([0, 1])
logit = LogisticRegression(penalty='l1', solver='liblinear')
logit.fit(X, y)
print(logit.coef_)
```

The output:

`[[0.         0.         0.18323263]]`

This snippet demonstrates how L1 regularization is applied in logistic regression to perform feature selection. The non-zero coefficients in the model suggest the importance of corresponding features, while zero-value coefficients imply redundant or less important features, an essential aspect of high-dimensional data analysis.
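To go from inspecting coefficients to actually dropping columns, Scikit-learn’s `SelectFromModel` wraps an L1-penalized estimator and keeps only the features with non-negligible coefficients. A sketch on the same toy data:

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([0, 1])

logit = LogisticRegression(penalty='l1', solver='liblinear')
selector = SelectFromModel(logit)

# Fits the L1-penalized model, then keeps only the columns whose
# coefficients survive the (near-zero) threshold.
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of retained features
print(X_selected)
```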

## Bonus One-Liner Method 5: Compressed Sparse Row (CSR) Matrix Normalization

For datasets represented as sparse matrices, employing the `csr_matrix` from Scipy in combination with Scikit-learn’s normalizer allows for efficient L1 normalization while preserving the sparse structure, which is memory-efficient for large datasets with many zeros.

Here’s an example:

```python
from sklearn.preprocessing import normalize
from scipy.sparse import csr_matrix

X_sparse = csr_matrix([[1, 2, 3], [4, 5, 6]])
X_normalized = normalize(X_sparse, norm='l1')
print(X_normalized)
```

The output:

```
  (0, 0)	0.16666666666666666
  (0, 1)	0.3333333333333333
  (0, 2)	0.5
  (1, 0)	0.26666666666666666
  (1, 1)	0.3333333333333333
  (1, 2)	0.4
```

Our one-liner code efficiently normalizes a sparse matrix while keeping the data structure intact. This technique is a must-know for data scientists dealing with high-dimensional datasets where space complexity can become an issue.
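The memory benefit is easiest to see with a matrix that actually contains zeros: `normalize` rescales only the stored entries, so the zeros are never materialized and the result stays a CSR matrix. A quick sketch:

```python
from sklearn.preprocessing import normalize
from scipy.sparse import csr_matrix, issparse

X_sparse = csr_matrix([[1, 0, 3], [0, 5, 0]])
X_normalized = normalize(X_sparse, norm='l1')

# Still sparse: only the three non-zero entries are stored.
print(issparse(X_normalized))  # True
print(X_normalized.nnz)        # 3
print(X_normalized.toarray())
```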

## Summary/Discussion

• Method 1: Normalizer Class. Adaptable for transforming datasets to have unit norm with a minimal code footprint. Less suitable for fine-tuned scaling needs.
• Method 2: Normalize Function. Offers a clean and quick way to normalize data without the overhead of creating a transformer object. Limited in scope as it does not fit into the Scikit-learn transformer framework for pipeline operations.
• Method 3: Pipeline Integration. Ensures preprocessing steps, like normalization, are correctly applied during model training and validation. May slightly increase the complexity of the code due to additional pipeline setup.
• Method 4: L1 Regularization for Feature Selection. Useful to enhance model interpretability by selecting only the most relevant features. Requires careful interpretation and is strictly linked to linear models.
• Bonus Method 5: CSR Matrix Normalization. Essential for processing sparse data efficiently, preserving both the sparsity and the scalability of the dataset. Limited to situations where data is stored in sparse format.