I have a Pandas DataFrame like this one:
import numpy as np
import pandas as pd
np.random.seed(1234)
midx = pd.MultiIndex.from_product([['a', 'b', 'c'], pd.date_range('20130101', periods=6)], names=['letter', 'date'])
df = pd.DataFrame(np.random.randn(len(midx), 1), index=midx)
That dataframe looks like this:
                          0
letter date
a      2013-01-01  0.471435
       2013-01-02 -1.190976
       2013-01-03  1.432707
       2013-01-04 -0.312652
       2013-01-05 -0.720589
       2013-01-06  0.887163
b      2013-01-01  0.859588
       2013-01-02 -0.636524
       2013-01-03  0.015696
       2013-01-04 -2.242685
       2013-01-05  1.150036
       2013-01-06  0.991946
c      2013-01-01  0.953324
       2013-01-02 -2.021255
       2013-01-03 -0.334077
       2013-01-04  0.002118
       2013-01-05  0.405453
       2013-01-06  0.289092
What I want to do is keep only the rows whose date satisfies a condition that depends on the letter. For instance, the conditions could be stored in a dictionary that maps each letter to a single date or a range of dates:
dictionary = {"a": slice("20130102", "20130105"),
"b": "20130103",
"c": slice("20130103", "20130105")}
Is there an easy way to compute this with pandas? I did not find any information about such filtering.
You could use query, which is designed for this kind of selection criteria.
If you slightly modify your dictionary so that each value is a (start, end) tuple, you can generate your desired query with the help of a generator expression:
In : dictionary
Out:
{'a': ('20130102', '20130105'),
'b': ('20130103', '20130103'),
'c': ('20130103', '20130105')}
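The modified dictionary above does not have to be typed by hand; it can be derived from the one in the question. A minimal sketch, assuming every value is either a single date string or a slice of two date strings (the helper name to_bounds and the variable name original are hypothetical):

```python
def to_bounds(value):
    """Return a (start, end) tuple for a slice of date strings or a single date string."""
    if isinstance(value, slice):
        return (value.start, value.stop)
    # A single date is treated as a range that starts and ends on that date.
    return (value, value)


# The dictionary as written in the question.
original = {"a": slice("20130102", "20130105"),
            "b": "20130103",
            "c": slice("20130103", "20130105")}

# The tuple form used to build the query string.
dictionary = {k: to_bounds(v) for k, v in original.items()}
```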
In : df.query(
' or '.join("('{}' <= date <= '{}' and letter == '{}')".format(*(v + (k,)))
for k, v in dictionary.items())
)
Out:
                          0
letter date
a      2013-01-02 -1.190976
       2013-01-03  1.432707
       2013-01-04 -0.312652
       2013-01-05 -0.720589
b      2013-01-03  0.015696
c      2013-01-03 -0.334077
       2013-01-04  0.002118
       2013-01-05  0.405453
To see what the query statement is actually doing, here is the string that the join over the dictionary items produces:
In : (' or '.join("('{}' <= date <= '{}' and letter == '{}')".format(*(v + (k,)))
for k, v in dictionary.items()))
Out: "('20130102' <= date <= '20130105' and letter == 'a') or
('20130103' <= date <= '20130105' and letter == 'c') or
('20130103' <= date <= '20130103' and letter == 'b')"
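If you would rather keep the dictionary exactly as written in the question, one alternative (not the query approach above) is to build a boolean mask from the index levels. This is only a sketch, assuming each value is either a single date string or a slice of two date strings with inclusive bounds; the names mask and result are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
midx = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], pd.date_range('20130101', periods=6)],
    names=['letter', 'date'])
df = pd.DataFrame(np.random.randn(len(midx), 1), index=midx)

dictionary = {"a": slice("20130102", "20130105"),
              "b": "20130103",
              "c": slice("20130103", "20130105")}

letters = df.index.get_level_values('letter')
dates = df.index.get_level_values('date')

# Start with no rows selected, then OR in the rows matching each letter's condition.
mask = np.zeros(len(df), dtype=bool)
for letter, cond in dictionary.items():
    # Treat a single date as a range that starts and ends on that date.
    start, stop = (cond.start, cond.stop) if isinstance(cond, slice) else (cond, cond)
    mask |= ((letters == letter)
             & (dates >= pd.Timestamp(start))
             & (dates <= pd.Timestamp(stop)))

result = df[mask]
```

Because the mask is applied to the original frame, the rows come back in their original order.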