I have a pandas DataFrame as follows:
mail = DataFrame({'mail' : ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']})
that looks like:
mail
0 [email protected]
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
6 [email protected]
What I want to do is filter out (eliminate) all rows in which the value in the column mail ends with '@gmail.com'.
You can use str.endswith and negate the resulting boolean Series with ~:
mail[~mail['mail'].str.endswith('@gmail.com')]
Which produces:
mail
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
6 [email protected]
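Since the addresses in the question are obscured, here is a minimal self-contained sketch of the same approach. The addresses below are made up purely for illustration; only the str.endswith call and the ~ negation come from the answer above.

```python
import pandas as pd

# Hypothetical addresses, invented for this example only.
mail = pd.DataFrame(
    {'mail': ['a@gmail.com', 'b@gmail.com', 'c@yahoo.com', 'd@example.org']}
)

# Boolean Series: True where the address ends with '@gmail.com'.
is_gmail = mail['mail'].str.endswith('@gmail.com')

# ~ flips the mask, so only the non-gmail rows are kept.
filtered = mail[~is_gmail]
print(filtered)
```

Note that the original index labels are kept on the surviving rows; call reset_index(drop=True) on the result if you want them renumbered.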
Pandas has many other vectorised string operations, which are accessible through the .str accessor. Many of these are instantly familiar from Python's own string methods, but come with built-in handling of NaN values.
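To illustrate the NaN handling, here is a small sketch using made-up values (the Series below is not from the question). str.endswith accepts an na argument that sets the result for missing entries; passing na=False treats them as non-matches, which keeps the mask strictly boolean so that ~ can be applied safely.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series(['a@gmail.com', np.nan, 'c@yahoo.com'])

# na=False makes the missing entry count as "does not end with".
is_gmail = s.str.endswith('@gmail.com', na=False)

# Keep everything that is not a gmail address (the NaN row is kept too).
kept = s[~is_gmail]
print(kept)
```

If missing rows should be dropped as well, one option is to call dropna() on the column before filtering.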
A column that holds strings has a .str accessor, through which you can call the standard methods defined for a single str:
[6]: mail['mail'].str.endswith('gmail.com')
Out[6]:
0 True
1 True
2 False
3 False
4 False
5 False
6 False
Name: mail, dtype: bool
Then you can filter using this Series:
[7]: mail[~mail['mail'].str.endswith('gmail.com')]
Out[7]:
mail
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
6 [email protected]
A similar accessor, .dt, exists for accessing date/time-related properties of a column that contains datetime data.
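As a rough sketch of the .dt accessor, assuming a column that has already been converted with pd.to_datetime (the column name when and the dates are invented for this example):

```python
import pandas as pd

# Hypothetical datetime column.
df = pd.DataFrame(
    {'when': pd.to_datetime(['2021-01-15', '2021-06-30', '2022-03-01'])}
)

# .dt exposes per-element datetime properties, much as .str does for strings.
years = df['when'].dt.year

# The resulting Series can be used to build a filter in the same way.
in_2021 = df[years == 2021]
print(in_2021)
```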