I'm relatively new to Python & pandas and am struggling with (hierachical) indexes. I've got the basics covered, but am lost with more advanced slicing and cross-sectioning.
For example, with the following dataframe
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(9).reshape((3, 3)),
index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))
I want to select everything except the row with index 'Colorado'. For a small dataset I could do:
data.ix[['Ohio','New York']]
But if the number of unique index values is large, that's impractical. Naively, I would expect a syntax like
data.ix[['state' != 'Colorado']]
However, this only returns the first record 'Ohio' and doesn't return 'New York'. This works, but is cumbersome
filter = list(set(data.index.get_level_values(0).unique()) - set(['Colorado']))
data[filter]
Surely there's a more Pythonic, verbose way of doing this?
This is a Python issue, not a pandas
one: 'state' != 'Colorado'
is True, so what pandas
gets is data.ix[[True]]
.
You could do
>>> data.loc[data.index != "Colorado"]
number one two three
state
Ohio 0 1 2
New York 6 7 8
[2 rows x 3 columns]
or use DataFrame.query
:
>>> data.query("state != 'New York'")
number one two three
state
Ohio 0 1 2
Colorado 3 4 5
[2 rows x 3 columns]
if you don't like the duplication of data
. (Quoting the expression passed to the .query()
method is one of the only ways around the fact that otherwise Python would evaluate the comparison before pandas
ever saw it.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With