How do I filter out rows of a pandas DataFrame where the first letter of a value is 'Z' (or any other character)?
I have the following pandas DataFrame, df, which has more than 25,000 rows.
   TIME_STAMP               Activity    Action  Quantity  EPIC  Price   Sub-activity  Venue
0  2017-08-30 08:00:05.000  Allocation  BUY     50        RRS   77.6    CPTY          066
1  2017-08-30 08:00:05.000  Allocation  BUY     50        RRS   77.6    CPTY          066
3  2017-08-30 08:00:09.000  Allocation  BUY     91        BATS  47.875  CPTY          PXINLN
4  2017-08-30 08:00:10.000  Allocation  BUY     43        PNN   8.07    CPTY          WCAPD
5  2017-08-30 08:00:10.000  Allocation  BUY     270       SGE   6.93    CPTY          PROBDMAD
I am trying to remove all the rows where the 1st letter of the Venue is 'Z'.
For example, my usual filter code would be something like this (filtering out all rows where the Venue is '066'):
df = df[df.Venue != '066']
I can see that the following list comprehension picks out the values I need, but I am not sure how to express it as a DataFrame filter:
[k for k in df.Venue if 'Z' not in k]
Use str[0] to select the first character, or use startswith, or contains with the regex ^ for the start of the string. To invert the boolean mask, use ~:
df1 = df[df.Venue.str[0] != 'Z']
df1 = df[~df.Venue.str.startswith('Z')]
df1 = df[~df.Venue.str.contains('^Z')]
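As a quick check that the three masks above select the same rows, here is a minimal sketch on a small made-up frame. The Venue values are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical sample data; only the Venue column matters here.
df = pd.DataFrame({"Venue": ["066", "ZEUS", "PXINLN", "ZX1", "WCAPD"]})

# Three equivalent "does not start with Z" masks.
keep_a = df.Venue.str[0] != "Z"            # compare the first character
keep_b = ~df.Venue.str.startswith("Z")     # invert the startswith mask
keep_c = ~df.Venue.str.contains("^Z")      # invert a regex match anchored at the start

df1 = df[keep_a]
print(df1.Venue.tolist())  # ['066', 'PXINLN', 'WCAPD']
```

All three masks are True exactly where the Venue does not begin with 'Z', so any of them can be used inside df[...].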
If there are no NaN values, a list comprehension is faster:
df1 = df[[x[0] != 'Z' for x in df.Venue]]
df1 = df[[not x.startswith('Z') for x in df.Venue]]
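The list comprehensions above assume every value is a string; a NaN is a float, so x[0] or x.startswith would raise an error on it. If the column might contain NaN, one possible workaround is to guard the test with isinstance. This is only a sketch: the helper name starts_with_z and the sample data are made up, and here missing values are kept rather than dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing Venue.
df = pd.DataFrame({"Venue": ["066", np.nan, "ZEUS", "WCAPD"]})


def starts_with_z(value):
    """Return True only for strings whose first character is 'Z'."""
    return isinstance(value, str) and value.startswith("Z")


# Rows with a missing Venue are kept, because NaN is not a string.
keep = [not starts_with_z(x) for x in df.Venue]
df1 = df[keep]
print(df1.index.tolist())  # [0, 1, 3]
```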
For the case where you do not have NaN values, you can convert the NumPy representation of a series to type '<U1' and test equality:
df1 = df[df['A'].values.astype('<U1') != 'Z']
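To see why this works: '<U1' is a NumPy Unicode string type of length 1, so casting to it truncates each string to its first character. A minimal sketch using NumPy alone, with made-up values:

```python
import numpy as np

# Hypothetical values, stored as Python objects the way a pandas object column would be.
venues = np.array(["066", "ZEUS", "PXINLN", "ZX1"], dtype=object)

# Casting to a length-1 Unicode dtype keeps only the first character of each string.
first_chars = venues.astype("<U1")
print(first_chars.tolist())  # ['0', 'Z', 'P', 'Z']

# Element-wise comparison gives a boolean mask usable for filtering.
keep = first_chars != "Z"
print(keep.tolist())  # [True, False, True, False]
```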
# Benchmark setup: 100,000 random 10-letter uppercase strings
import pandas as pd
from string import ascii_uppercase
from random import choice
L = [''.join(choice(ascii_uppercase) for _ in range(10)) for i in range(100000)]
df = pd.DataFrame({'A': L})
%timeit df['A'].values.astype('<U1') != 'Z' # 4.05 ms per loop
%timeit [x[0] != 'Z' for x in df['A']] # 11.9 ms per loop
%timeit [not x.startswith('Z') for x in df['A']] # 23.7 ms per loop
%timeit ~df['A'].str.startswith('Z') # 53.6 ms per loop
%timeit df['A'].str[0] != 'Z' # 53.7 ms per loop
%timeit ~df['A'].str.contains('^Z') # 127 ms per loop