I've some data that I group by occurrence in one second intervals. I'm having issues finding the correct way to filter out counts below a certain threshold, e.g. I would not want to show anything below a count of 100. I've tried various versions of filter / lambda constructs but I was not able to filter before or after the .count() method is called.
df = pd.DataFrame({
'Date': timestamps,
'Path': paths,
})
y = df.groupby([pd.Grouper(freq='1s',key='Date'), 'Path'])
print(y.count())
The output currently resembles this:
Date Path
2015-12-26 06:08:27 rawdata/file3 1
rawdata/file2 118
2015-12-26 06:08:28 rawdata/file1 85
rawdata/file2 9796
I would like it to look like this:
Date Path
2015-12-26 06:08:27 rawdata/file2 118
2015-12-26 06:08:28 rawdata/file2 9796
You can try rename column Date to Count and then subset of rows, where column Count is < 3 (you can change it to value 100):
print df
Date Path
0 2015-12-26 06:08:27 rawdata/file3
1 2015-12-26 06:08:27 rawdata/file2
2 2015-12-26 06:08:27 rawdata/file2
3 2015-12-26 06:08:27 rawdata/file2
4 2015-12-26 06:08:27 rawdata/file2
5 2015-12-26 06:08:27 rawdata/file2
6 2015-12-26 06:08:27 rawdata/file2
7 2015-12-26 06:08:28 rawdata/file1
8 2015-12-26 06:08:28 rawdata/file1
9 2015-12-26 06:08:28 rawdata/file1
10 2015-12-26 06:08:28 rawdata/file1
11 2015-12-26 06:08:28 rawdata/file1
12 2015-12-26 06:08:28 rawdata/file2
y = df.groupby([pd.Grouper(freq='1s', key='Date'), 'Path']).count().rename(columns={'Date':'Count'})
print(y)
Count
Date Path
2015-12-26 06:08:27 rawdata/file2 6
rawdata/file3 1
2015-12-26 06:08:28 rawdata/file1 5
rawdata/file2 1
print y[y.Count < 3]
Count
Date Path
2015-12-26 06:08:27 rawdata/file3 1
2015-12-26 06:08:28 rawdata/file2 1
Next approach is set column name:
y = df.groupby([pd.Grouper(freq='1s', key='Date'), 'Path']).count()
print(y)
y.columns = ['Count']
print y[y.Count < 3]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With