I have a dataset that looks like this:
data="""cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Calanus_finmarchicus 2340
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Metridia_lucens 468
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Evadne_spp. 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Salpa 0
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Oithona_spp. 468
"""
# Write the sample to disk; the with-block closes the file automatically
with open('data.txt', 'w') as datafile:
    datafile.write(data)
I read it into pandas with :
import datetime as dt
import pandas as pd
parse = lambda x: dt.datetime.strptime(x, '%d-%m-%Y')  # defined here but never passed to read_csv
df = pd.read_csv('data.txt', index_col=0, header=0, parse_dates={"Datetime": [1, 3, 4]}, skipinitialspace=True, sep=' ', skiprows=0)
How can I generate a subset of this dataframe containing all the records from April where the taxon is 'Calanus_finmarchicus' or 'Gastropoda'?
I can query the dataframe where taxon is equal to 'Calanus_finmarchicus' or 'Gastropoda' using
df[(df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda')]
But I'm having trouble querying by time. Something similar in numpy would look like this:
import numpy as np
data = np.genfromtxt('data.txt', dtype=[('cruiseid','S6'), ('year','i4'), ('station','i4'), ('month','i4'), ('day','i4'), ('date','S9'), ('lat','f8'), ('lon','f8'), ('depth_w','i8'), ('taxon','S60'), ('count','i8')], skip_header=1)
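# & binds tighter than |, so the taxon test needs its own parentheses;
# the b'...' literals match the bytes ('S') dtype of the taxon field
selection = ((data['taxon'] == b'Calanus_finmarchicus') | (data['taxon'] == b'Gastropoda')) & ((data['month'] == 4) | (data['month'] == 3))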
data[selection]
Here's a link with a notebook to reproduce the example
You can refer to the datetime index's month attribute:
>>> df.index.month
array([4, 4, 4, 7, 7, 8, 8, 8], dtype=int32)
>>> df[((df.taxon == 'Calanus_finmarchicus') | (df.taxon == 'Gastropoda'))
... & (df.index.month == 4)]
cruiseid station date lat lon depth_w \
Datetime
1987-04-13 AA8704 1 13-APR-87 35.85 -75.48 18
1987-04-13 AA8704 1 13-APR-87 35.85 -75.48 18
taxon count Unnamed: 11
Datetime
1987-04-13 Gastropoda 0 NaN
1987-04-13 Calanus_finmarchicus 2340 NaN
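For anyone who wants to try this without data.txt, here is a minimal self-contained sketch of the same filter. It assumes a recent pandas and builds the DatetimeIndex with pd.to_datetime from the year/month/day columns instead of the parse_dates dictionary used in the question; the names sample, is_taxon, in_april and april_subset are mine, chosen only for illustration.

```python
import io

import pandas as pd

# A few rows copied from the question's sample data
sample = """cruiseid year station month day date lat lon depth_w taxon count
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Centropages_typicus 75343
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Gastropoda 0
AA8704 1987 1 04 13 13-APR-87 35.85 -75.48 18 Calanus_finmarchicus 2340
AA8704 1987 1 07 13 13-JUL-87 35.85 -75.48 18 Acartia_spp. 5616
AA8704 1987 1 08 13 13-AUG-87 35.85 -75.48 18 Oithona_spp. 468
"""

df = pd.read_csv(io.StringIO(sample), sep=" ")

# Assemble the dates from the year/month/day columns and use them as the index
dates = pd.to_datetime(df[["year", "month", "day"]])
df = df.set_index(pd.DatetimeIndex(dates, name="Datetime"))

# Combine the taxon condition with the month of the index
is_taxon = (df.taxon == "Calanus_finmarchicus") | (df.taxon == "Gastropoda")
in_april = df.index.month == 4
april_subset = df[is_taxon & in_april]

print(april_subset[["taxon", "count"]])
```

Note that the count column has to be accessed as df["count"], because df.count is already a DataFrame method.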
As others said, you can use df.index.month to filter by month, but I also suggest using pandas.Series.isin() to check your taxon condition:
>>> df[df.taxon.isin(['Calanus_finmarchicus', 'Gastropoda']) & (df.index.month == 4)]
cruiseid station date lat lon depth_w \
Datetime
1987-04-13 AA8704 1 13-APR-87 35.85 -75.48 18
1987-04-13 AA8704 1 13-APR-87 35.85 -75.48 18
taxon count Unnamed: 11
Datetime
1987-04-13 Gastropoda 0 NaN
1987-04-13 Calanus_finmarchicus 2340 NaN
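The same idea extends to several months at once, much like the numpy query in the question. The sketch below is only an illustration under a few assumptions: the rows are taken from the question's sample with the dates rewritten in ISO form, the taxa and months in wanted_taxa and wanted_months are arbitrary picks, and np.isin is used for the month test so that it works whether df.index.month comes back as an array or as an Index.

```python
import io

import numpy as np
import pandas as pd

# Rows from the question's sample, reduced to three columns
sample = """Datetime taxon count
1987-04-13 Centropages_typicus 75343
1987-04-13 Gastropoda 0
1987-04-13 Calanus_finmarchicus 2340
1987-07-13 Acartia_spp. 5616
1987-07-13 Metridia_lucens 468
1987-08-13 Oithona_spp. 468
"""

df = pd.read_csv(io.StringIO(sample), sep=" ",
                 index_col="Datetime", parse_dates=True)

# Illustrative choices, not part of the original question
wanted_taxa = ["Calanus_finmarchicus", "Gastropoda", "Metridia_lucens"]
wanted_months = [4, 7]

# Membership tests for both conditions, combined element-wise
taxon_mask = df.taxon.isin(wanted_taxa).to_numpy()
month_mask = np.isin(df.index.month, wanted_months)
subset = df[taxon_mask & month_mask]

print(subset)
```

Keeping the wanted values in lists means adding another taxon or month is a one-word change rather than another chained comparison.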