Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas overwrite values in multiple columns at once based on condition of values in one column

I have such DataFrame:

df = pd.DataFrame(data={
    'col0': [11, 22,1, 5]
    'col1': ['aa:a:aaa', 'a:a', 'a', 'a:aa:a:aaa'],
    'col2': ["foo", "foo", "foobar", "bar"],
    'col3': [True, False, True, False],
    'col4': ['elo', 'foo', 'bar', 'dupa']})

I want to get length of the list after split on ":" in col1, then I want to overwrite the values if length > 2 OR not overwrite the values if length <= 2.

Ideally, in one line as fast as possible.

Currently, I try but it returns ValueError.

df[['col1', 'col2', 'col3']] = df.loc[df['col1'].str.split(":").apply(len) > 2], ("", "", False), df[['col1', 'col2', 'col3']])

EDIT: condition on col1. EDIT2: thank you for all the great and quickly provided answers. amazing! EDIT3: timing on 10^6 rows:

@ansev 3.2657s

@jezrael 0.8922s

@anky_91 1.9511s

like image 695
Dariusz Krynicki Avatar asked Jan 20 '20 14:01

Dariusz Krynicki


People also ask

How replace values in column based on multiple conditions in pandas?

You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.

Can I use ILOC and LOC together?

loc and iloc are interchangeable when labels are 0-based integers.


3 Answers

You need series.str.len() after splitting to determining the length of the list , then you can compare and using .loc[] , assign the the list wherever condition matches:

df.loc[df['col1'].str.split(":").str.len()>2,['col1','col2','col3']]=["", "", False]
print(df)

   col0 col1    col2   col3  col4
0    11               False   elo
1    22  a:a     foo  False   foo
2     1    a  foobar   True   bar
3     5               False  dupa
like image 178
anky Avatar answered Oct 17 '22 00:10

anky


Use Series.str.count, add 1, compare by Series.gt and assign list to filtered columns in list:

df.loc[df['col1'].str.count(":").add(1).gt(2), ['col1','col2','col3']] = ["", "", False]
print (df)
   col0 col1    col2   col3  col4
0    11               False   elo
1    22  a:a     foo  False   foo
2     1    a  foobar   True   bar
3     5               False  dupa
like image 8
jezrael Avatar answered Oct 17 '22 02:10

jezrael


Another approach is Series.str.split with expand = True and DataFrame.count with axis=1.

df.loc[df['col1'].str.split(":",expand = True).count(axis=1).gt(2),['col1','col2','col3']]=["", "", False]
print(df)
   col0 col1    col2   col3  col4
0    11               False   elo
1    22  a:a     foo  False   foo
2     1    a  foobar   True   bar
3     5               False  dupa
like image 5
ansev Avatar answered Oct 17 '22 02:10

ansev