I have a simple dataframe as such:
df = [ {'col1' : 'A', 'col2': 'B', 'col3': 'C', 'col4':'0'},
{'col1' : 'M', 'col2': '0', 'col3': 'M', 'col4':'0'},
{'col1' : 'B', 'col2': 'B', 'col3': '0', 'col4':'B'},
{'col1' : 'X', 'col2': '0', 'col3': 'Y', 'col4':'0'}
]
df = pd.DataFrame(df)
df = df[['col1', 'col2', 'col3', 'col4']]
df
Which looks like this:
| col1 | col2 | col3 | col4 |
|------|------|------|------|
| A | B | C | 0 |
| M | 0 | M | 0 |
| B | B | 0 | B |
| X | 0 | Y | 0 |
I just want to replace repeated characters with the character '0', across the rows. It boils down to keeping the first duplicate value we come across, as like this:
| col1 | col2 | col3 | col4 |
|------|------|------|------|
| A | B | C | 0 |
| M | 0 | 0 | 0 |
| B | 0 | 0 | 0 |
| X | 0 | Y | 0 |
This seems so simple but I'm stuck. Any nudges in the right direction would be really appreciated.
You can use the duplicated
method to return a boolean indexer of whether elements are duplicates or not:
In [214]: pd.Series(['M', '0', 'M', '0']).duplicated()
Out[214]:
0 False
1 False
2 True
3 True
dtype: bool
Then you could create a mask by mapping this across the rows of your dataframe, and using where
to perform your substitution:
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
df.where(~is_duplicate, 0)
col1 col2 col3 col4
0 A B C 0
1 M 0 0 0
2 B 0 0 0
3 X 0 Y 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With