I have a pandas DataFrame as follows:
from pandas import DataFrame
mail = DataFrame({'mail' : ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']})
that looks like:
                    mail
0          [email protected]
1        [email protected]
2       [email protected]
3   [email protected]
4  [email protected]
5  [email protected]
6       [email protected]
What I want to do is filter out (eliminate) all rows in which the value in the column mail ends with '@gmail.com'.
You can use str.endswith and negate the result of the boolean Series with ~:
mail[~mail['mail'].str.endswith('@gmail.com')]
Which produces:
                    mail
2       [email protected]
3   [email protected]
4  [email protected]
5  [email protected]
6       [email protected]
Pandas has many other vectorised string operations, all accessible through the .str accessor. Many of these are instantly familiar from Python's own string methods, but they come with built-in handling of NaN values.
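To illustrate the NaN handling, here is a minimal sketch. The addresses and the names df, is_gmail and filtered are made up for the example and are not from the question. It relies on the na parameter of str.endswith, which sets the result used for missing values, so the mask stays purely boolean and can be negated with ~:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): one entry is missing.
df = pd.DataFrame({'mail': ['a@gmail.com', 'b@example.org', np.nan, 'c@gmail.com']})

# na=False treats missing values as "does not end with the suffix",
# so the mask contains only True/False and can be safely negated.
is_gmail = df['mail'].str.endswith('@gmail.com', na=False)

# Keep every row that is not a gmail address (the NaN row is kept too).
filtered = df[~is_gmail]
print(filtered)
```

Whether rows with a missing address should be kept or dropped depends on the use case; passing na=True instead would drop them along with the gmail rows.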
A column that holds strings has a .str accessor, through which you can call the standard string methods on every element at once:
In [6]: mail['mail'].str.endswith('gmail.com')
Out[6]:
0     True
1     True
2    False
3    False
4    False
5    False
6    False
Name: mail, dtype: bool
Then you can filter using this Series:
In [7]: mail[~mail['mail'].str.endswith('gmail.com')]
Out[7]:
                    mail
2       [email protected]
3   [email protected]
4  [email protected]
5  [email protected]
6       [email protected]
A similar accessor, .dt, exposes date/time related properties of a column that contains datetime data.
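As a rough sketch of the .dt accessor, filtering works the same way as with .str. The events frame, the when column and the dates are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical data: a column of datetimes built with pd.to_datetime.
events = pd.DataFrame({'when': pd.to_datetime(['2021-01-15', '2021-02-20', '2022-03-05'])})

# .dt.year extracts the year of every element as an integer Series.
years = events['when'].dt.year

# The comparison yields a boolean Series that can be used as a filter.
in_2021 = events[years == 2021]
print(in_2021)
```

The column must already have a datetime dtype; on a column of plain strings, .dt raises an AttributeError, so convert with pd.to_datetime first.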