**💡 Problem Formulation:** In data analysis, you may need to clean up your data by replacing each negative value with the last observed non-negative value in the sequence. For instance, given the column [1, -1, -2, 3, -5, 6] in a pandas DataFrame, you want to transform it into [1, 1, 1, 3, 3, 6]. Each negative number is replaced by the most recent non-negative (positive or zero) number that precedes it. If a negative value appears before any non-negative one, there is nothing to copy. Depending on the method below, such a value is either left unchanged or becomes NaN.
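To pin down the intended behaviour before reaching for pandas, here is a minimal plain-Python reference. It assumes, as Methods 1 and 2 below do, that a leading negative value with no earlier non-negative value is kept as-is. The function name `fill_negatives` is hypothetical.

```python
def fill_negatives(values):
    """Replace each negative number with the most recent non-negative one.

    Assumption: negatives that appear before any non-negative value are kept
    unchanged (the vectorized methods below turn them into NaN instead).
    """
    result = []
    last_valid = None  # most recent non-negative value seen so far
    for v in values:
        if v >= 0:
            last_valid = v
            result.append(v)
        elif last_valid is not None:
            result.append(last_valid)
        else:
            result.append(v)  # no earlier non-negative value to copy
    return result


print(fill_negatives([1, -1, -2, 3, -5, 6]))  # [1, 1, 1, 3, 3, 6]
print(fill_negatives([-1, 2, -3, 4]))         # [-1, 2, 2, 4]
print(fill_negatives([0, -7, 5]))             # [0, 0, 5]
```

Zero counts as a valid value here, so it is both kept and propagated.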

## Method 1: Loop with .iloc

Iterating over the column with a for-loop and overwriting each negative value with the last non-negative value seen is a straightforward approach that's easy to understand. The `.iloc` indexer accesses DataFrame elements by integer position, so values can be read and replaced one at a time.

Here’s an example:

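```python
import pandas as pd

df = pd.DataFrame({'A': [1, -1, -2, 3, -5, 6]})

last_pos_val = None  # most recent non-negative value seen so far
for i in range(len(df)):
    value = df.iloc[i, 0]
    if value >= 0:
        last_pos_val = value          # remember it for later rows
    elif last_pos_val is not None:
        df.iloc[i, 0] = last_pos_val  # overwrite the negative value in place

print(df)
```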

The output of this code snippet:

```
   A
0  1
1  1
2  1
3  3
4  3
5  6
```

This code snippet replaces each negative value in column 'A' with the last non-negative value, which is tracked in the `last_pos_val` variable. It modifies the DataFrame in place, visiting each row and checking the sign of its value. A negative value in the first row would be left unchanged, because there is nothing to copy yet.
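A common way to cut the per-element overhead of `.iloc` is to run the same loop over a plain NumPy copy of the column and write the result back in one assignment. The sketch below keeps Method 1's logic unchanged and assumes the column holds numbers.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -1, -2, 3, -5, 6]})

values = df['A'].to_numpy(copy=True)  # work on a NumPy copy, not the DataFrame
last_pos_val = None                   # most recent non-negative value seen so far
for i, value in enumerate(values):
    if value >= 0:
        last_pos_val = value
    elif last_pos_val is not None:
        values[i] = last_pos_val      # plain array write, no .iloc call per element

df['A'] = values  # write the whole column back once
print(df)
```

The printed DataFrame matches the output of Method 1.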

## Method 2: pandas.DataFrame.apply() with a Custom Function

`DataFrame.apply()` can call a custom function on each column. That function keeps track of the last non-negative number seen and replaces negative numbers accordingly. This packages the logic in a reusable function, although the work inside it is still a Python-level loop.

Here’s an example:

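```python
import pandas as pd

df = pd.DataFrame({'A': [1, -1, -2, 3, -5, 6]})

def replace_neg_with_last_pos(series):
    """Return a copy of `series` with negatives replaced by the last non-negative value."""
    last_pos_val = None
    result = []
    for v in series:
        if v >= 0:
            last_pos_val = v
            result.append(v)
        elif last_pos_val is not None:
            result.append(last_pos_val)
        else:
            result.append(v)  # leading negative: nothing to copy yet
    return pd.Series(result, index=series.index)

# DataFrame.apply passes each column to the function as a whole Series;
# Series.apply would instead call it once per scalar value.
df = df.apply(replace_neg_with_last_pos)
print(df)
```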

The output of this code snippet:

```
   A
0  1
1  1
2  1
3  3
4  3
5  6
```

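This code snippet defines `replace_neg_with_last_pos`, which walks through a Series and builds a new one in which every negative number is replaced by the last non-negative value. Passing the function to `DataFrame.apply()` runs it once per column, because the default is `axis=0`. Calling `df['A'].apply(...)` would not work here: `Series.apply()` hands the function one scalar at a time, so there would be no sequence to iterate over.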
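Because `DataFrame.apply()` hands over one column at a time, the same function can clean several columns in one call. The two-column DataFrame below is made-up example data, and each column is processed independently.

```python
import pandas as pd

def replace_neg_with_last_pos(series):
    """Return a copy of `series` with negatives replaced by the last non-negative value."""
    last_pos_val = None
    result = []
    for v in series:
        if v >= 0:
            last_pos_val = v
            result.append(v)
        elif last_pos_val is not None:
            result.append(last_pos_val)
        else:
            result.append(v)  # leading negative: nothing to copy yet
    return pd.Series(result, index=series.index)

# Hypothetical two-column data; column 'B' starts with a negative value.
df = pd.DataFrame({'A': [1, -1, -2, 3], 'B': [-4, 5, -6, -7]})
cleaned = df.apply(replace_neg_with_last_pos)  # axis=0: one call per column
print(cleaned)
```

Column 'A' becomes [1, 1, 1, 3]. Column 'B' becomes [-4, 5, 5, 5], and its leading -4 stays because there is no earlier value to copy.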

## Method 3: Using pandas.DataFrame.mask() and .ffill()

The `mask()` method replaces values where a condition is True, with NaN by default. Once `mask()` has turned the negative values into NaN, the `ffill()` method (forward fill) fills these NaNs with the last observed non-NaN value.

Here’s an example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -1, -2, 3, -5, 6]})
# mask() turns negatives into NaN; ffill() copies the last valid value forward
df['A'] = df['A'].mask(df['A'] < 0).ffill()
print(df)
```

The output of this code snippet:

```
     A
0  1.0
1  1.0
2  1.0
3  3.0
4  3.0
5  6.0
```

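This method uses pandas' `mask()` to replace negative values with NaN and then applies `ffill()` to forward-fill the NaNs with the last non-negative value. NaN is a floating-point value, so the column is converted to `float64`. A negative value in the first row would also remain NaN, because there is nothing before it to copy.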
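One way to handle both side effects is sketched below. It assumes you want leading negatives kept as they were, matching Method 1. `fillna()` accepts a Series and fills the remaining gaps by index label from the original column. After that no NaN is left, so the column can go back to integers.

```python
import pandas as pd

df = pd.DataFrame({'A': [-1, 2, -3, 4]})  # starts with a negative value

filled = df['A'].mask(df['A'] < 0).ffill()  # NaN, 2.0, 2.0, 4.0
# Restore leading negatives from the original column, then go back to integers.
df['A'] = filled.fillna(df['A']).astype(int)
print(df)
```

The result is [-1, 2, 2, 4] as an integer column. If you would rather drop such rows, `filled.dropna()` is the alternative to the `fillna()` step.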

## Method 4: Combination of .loc and .ffill()

Another pandas-centric approach is to use `.loc` with a Boolean mask to overwrite the negative values with NaN, and then forward-fill them with `.ffill()`. Forward filling carries a value across any number of missing rows, so runs of consecutive negative values are handled as well.

Here’s an example:

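```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, -1, -2, 3, -5, 6]})

df['A'] = df['A'].astype(float)       # NaN can only be stored in a float column
positive_mask = df['A'] >= 0          # True for the values we keep
df.loc[~positive_mask, 'A'] = np.nan  # blank out the negative values
df['A'] = df['A'].ffill()             # replaces the deprecated fillna(method='ffill')
print(df)
```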

The output of this code snippet:

```
     A
0  1.0
1  1.0
2  1.0
3  3.0
4  3.0
5  6.0
```

This method sets negative values to NaN using the `.loc` accessor with a Boolean mask. It then forward-fills those NaNs with the last valid (non-negative) value. `.ffill()` is the current spelling of `fillna(method='ffill')`; the `method` argument of `fillna()` has been deprecated since pandas 2.1.
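A `.shift()`-based variant, which replaces each negative value with the value one row above it, can look tempting. It only looks back a single row, though. The sketch below shows it failing on a run of two negatives, which is why a forward fill is used instead.

```python
import pandas as pd

s = pd.Series([1, -1, -2, 3, -5, 6])

# Keep non-negative values; otherwise take the value from the previous row.
one_step = s.where(s >= 0, s.shift())
print(one_step.tolist())  # [1.0, 1.0, -1.0, 3.0, 3.0, 6.0]: index 2 is still negative

# Forward filling carries the last valid value across the whole run.
full = s.mask(s < 0).ffill()
print(full.tolist())      # [1.0, 1.0, 1.0, 3.0, 3.0, 6.0]
```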

## Bonus One-Liner Method 5: Using numpy.where()

The `numpy.where()` function performs fast element-wise conditional selection. Combined with pandas' `ffill()`, it can replace negative numbers in a single line of code. `np.where()` returns a plain NumPy array, which has no `ffill()` method, so the result has to be wrapped in a `pd.Series` first. Though concise, this method may be less straightforward for pandas beginners.

Here’s an example:

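```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, -1, -2, 3, -5, 6]})
# np.where returns a plain ndarray, so wrap it in a Series to get .ffill()
df['A'] = pd.Series(np.where(df['A'] < 0, np.nan, df['A']), index=df.index).ffill()
print(df)
```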

The output of this code snippet:

```
     A
0  1.0
1  1.0
2  1.0
3  3.0
4  3.0
5  6.0
```

This one-liner uses `numpy.where()` to build a new float array in which negative numbers are NaN. It wraps that array in a Series aligned to the DataFrame's index, then applies `ffill()` to propagate the last valid value forward into the NaNs.
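The same idea extends to a whole DataFrame, because `np.where()` works element-wise on a 2-D array. The sketch below assumes all columns are numeric and uses made-up data. The array is wrapped back into a DataFrame with the original index and column labels before forward-filling.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric data with two columns.
df = pd.DataFrame({'A': [1, -1, -2, 3], 'B': [5, -6, 7, -8]})

values = df.to_numpy()                         # 2-D integer array
arr = np.where(values < 0, np.nan, values)     # negatives -> NaN, result is float
cleaned = pd.DataFrame(arr, index=df.index, columns=df.columns).ffill()
print(cleaned)
```

Each column is forward-filled on its own, so 'A' becomes [1.0, 1.0, 1.0, 3.0] and 'B' becomes [5.0, 5.0, 7.0, 7.0].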

## Summary/Discussion

**Method 1: Loop with .iloc**. Easy to understand but slow on large datasets, because every element is read, and possibly written, through `.iloc` inside an explicit Python loop.

**Method 2: pandas.DataFrame.apply() with a Custom Function**. Packages the logic as a reusable function that `DataFrame.apply()` can run over every column, and avoids per-element `.iloc` access. The work inside the function is still a Python loop, however, which can be slow for large DataFrames.

**Method 3: Using pandas.DataFrame.mask() and .ffill()**. Both readable and efficient thanks to vectorized operations. It strikes a good balance between performance and ease of comprehension, at the cost of turning the column into floats.

**Method 4: Combination of .loc and .ffill()**. Also vectorized, and forward filling handles runs of consecutive negative values. It is slightly more verbose than Method 3 because the column has to be cast to float before NaN can be assigned.

**Bonus Method 5: Using numpy.where()**. Concise and vectorized like Methods 3 and 4. It might be less readable to those not familiar with NumPy, and the resulting array must be wrapped in a `pd.Series` before `ffill()` can be called.
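To check these performance trade-offs on your own data, a rough timing harness with `timeit` might look like the sketch below. The column size, repeat count and helper names are arbitrary assumptions. No timings are quoted, because they depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2,000 random integers between -5 and 5.
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.integers(-5, 6, size=2_000)})


def loop_iloc(frame):
    """Method 1: per-element loop with .iloc (works on a copy)."""
    frame = frame.copy()
    last = None
    for i in range(len(frame)):
        value = frame.iloc[i, 0]
        if value >= 0:
            last = value
        elif last is not None:
            frame.iloc[i, 0] = last
    return frame


def mask_ffill(frame):
    """Method 3: vectorized mask + forward fill."""
    return frame.mask(frame < 0).ffill()


for name, func in [('loop_iloc', loop_iloc), ('mask_ffill', mask_ffill)]:
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f'{name}: {seconds / 3:.4f} s per run')
```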