How to randomly delete 10% of attribute values from a DataFrame in pandas

Question

I have an example dataset with 2000 rows and 15 columns. The last column will be used as the decision class in classification.

I need to randomly delete 10% of the attribute values, so 10% of the values in columns 0-13 should be NA.

I wrote a for loop that picks a random colNumber (0-13) and rowNumber (0-2000) and replaces that value with NA. But I think (and I can see) that it's not a fast solution. I tried to find something in pandas rather than core Python, but couldn't find anything.

Maybe someone has a better idea? A more pandas-style solution? Or maybe something completely different?

Chris · Accepted Answer

You can make use of pandas' sample method.

Imports and set up data

import string

import numpy as np
import pandas as pd

n = 100
data = {
    'a': np.random.random(size=n),
    'b': np.random.choice(list(string.ascii_lowercase), size=n),
    'c': np.random.random(size=n),
}

df = pd.DataFrame(data)

Solution

for col in df.columns:
    df.loc[df.sample(frac=0.1).index, col] = np.nan
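
The loop above blanks values in every column, while the question wants the last column (the decision class) left untouched. Here is a minimal sketch of that variant. The 2000 x 15 random frame, the seed, and the names feature_cols and n_missing are assumptions made up for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the asker's data: 14 features + 1 decision class
df = pd.DataFrame(rng.random((2000, 15)))
df[14] = rng.integers(0, 2, size=len(df))

feature_cols = df.columns[:-1]         # every column except the last one
n_missing = int(round(0.1 * len(df)))  # 10% of the rows, per column

for col in feature_cols:
    # Draw a fresh set of row labels for each column, without replacement
    rows = rng.choice(df.index, size=n_missing, replace=False)
    df.loc[rows, col] = np.nan
```

Each feature column ends up with exactly 10% missing values, and the decision column keeps all of its values.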

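Solution without for loop:

def delete_10(col):
    # Work on a copy so the result does not rely on in-place modification
    col = col.copy()
    col.loc[col.sample(frac=0.1).index] = np.nan
    return col

# apply returns a new DataFrame, so assign the result back
df = df.apply(delete_10, axis=0)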
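
If the goal is 10% of all cells in columns 0-13 taken together (rather than exactly 10% in each column), one possible alternative is to build a single boolean mask with NumPy and apply it in one step. This is only a sketch under assumed data: the random 2000 x 15 frame, the seed, and the names features and mask_df are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the asker's data; the last column is the class
df = pd.DataFrame(rng.random((2000, 15)))

features = df.columns[:-1]
n_rows, n_cols = len(df), len(features)
n_cells = n_rows * n_cols
n_missing = int(round(0.1 * n_cells))

# Mark exactly n_missing distinct cells in a flat array, then reshape it
flat = np.zeros(n_cells, dtype=bool)
flat[rng.choice(n_cells, size=n_missing, replace=False)] = True
mask_df = pd.DataFrame(flat.reshape(n_rows, n_cols),
                       index=df.index, columns=features)

# DataFrame.mask replaces values with NaN wherever the condition is True
df[features] = df[features].mask(mask_df)
```

With this approach individual columns will usually deviate a little from 10%, because the missing cells are spread over the whole block at random.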

Check

Check to see proportion of NaN values:

df.isnull().sum() / len(df)

Output:

a    0.1
b    0.1
c    0.1
dtype: float64

Tags: python, pandas

martin
