I have such DataFrame:
df = pd.DataFrame(data={
    'col0': [11, 22,1, 5]
    'col1': ['aa:a:aaa', 'a:a', 'a', 'a:aa:a:aaa'],
    'col2': ["foo", "foo", "foobar", "bar"],
    'col3': [True, False, True, False],
    'col4': ['elo', 'foo', 'bar', 'dupa']})
I want to get length of the list after split on ":" in col1, then I want to overwrite the values if length > 2 OR not overwrite the values if length <= 2.
Ideally, in one line as fast as possible.
Currently, I try but it returns ValueError.
df[['col1', 'col2', 'col3']] = df.loc[df['col1'].str.split(":").apply(len) > 2], ("", "", False), df[['col1', 'col2', 'col3']])
EDIT: condition on col1. EDIT2: thank you for all the great and quickly provided answers. amazing! EDIT3: timing on 10^6 rows:
@ansev 3.2657s
@jezrael 0.8922s
@anky_91 1.9511s
You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
loc and iloc are interchangeable when labels are 0-based integers.
You need series.str.len() after splitting to determining the length of the list , then you can compare and using .loc[] , assign the the list wherever condition matches:
df.loc[df['col1'].str.split(":").str.len()>2,['col1','col2','col3']]=["", "", False]
print(df)
   col0 col1    col2   col3  col4
0    11               False   elo
1    22  a:a     foo  False   foo
2     1    a  foobar   True   bar
3     5               False  dupa
                        Use Series.str.count, add 1, compare by Series.gt and assign list to filtered columns in list:
df.loc[df['col1'].str.count(":").add(1).gt(2), ['col1','col2','col3']] = ["", "", False]
print (df)
   col0 col1    col2   col3  col4
0    11               False   elo
1    22  a:a     foo  False   foo
2     1    a  foobar   True   bar
3     5               False  dupa
                        Another approach is Series.str.split with expand = True and DataFrame.count with axis=1.
df.loc[df['col1'].str.split(":",expand = True).count(axis=1).gt(2),['col1','col2','col3']]=["", "", False]
print(df)
   col0 col1    col2   col3  col4
0    11               False   elo
1    22  a:a     foo  False   foo
2     1    a  foobar   True   bar
3     5               False  dupa
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With