I have a DataFrame, and I want to select certain rows and columns from it. I know how to do this using loc. However, I want to be able to specify each criterion individually, rather than in one go.
import numpy as np
import pandas as pd
idx = pd.IndexSlice
index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['a', 'b', 'a', 'b'])]
columns = ["A", "B"]
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)
print(df)
print(df.loc[idx['foo', :], idx['A':'B']])
              A         B
foo a  0.676649 -1.638399
    b -0.417915  0.587260
qux a  0.294555 -0.573041
    b  1.592056  0.237868
              A         B
foo a -0.470195 -0.455713
    b  1.750171 -0.409216
Requirement
I want to achieve the same result with something like the following code, where I specify each criterion one by one. It's also important that I can use a slice_list to allow dynamic behaviour (i.e. the syntax should work whether there are two, three or ten different criteria in the slice_list).
slice_1 = 'foo'
slice_2 = ':'
slice_list = [slice_1, slice_2]
column_slice = "'A':'B'"
print(df.loc[idx[slice_list], idx[column_slice]])
You can achieve this using the built-in slice function. You can't build slices from strings, because inside a string ':' is just a literal character, not slicing syntax. slice(None) is the equivalent of a bare ':'.
slice_1 = 'foo'
slice_2 = slice(None)
column_slice = slice('A', 'B')
df.loc[idx[slice_1, slice_2], idx[column_slice]]
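To cover the "dynamic" part of the question, here is a minimal sketch that assumes the criteria arrive as a plain Python list with one entry per index level. pd.IndexSlice simply returns whatever is passed inside the brackets, so idx['foo', :] is the tuple ('foo', slice(None)) and the list can be converted with tuple(). The select helper and the fixed sample values are made up for illustration; they are not part of pandas or of the question.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "qux", "qux"], ["a", "b", "a", "b"]]
)
# Fixed values instead of random ones so the result is reproducible.
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"])


def select(frame, row_criteria, column_criterion=slice(None)):
    """Hypothetical helper: apply one row criterion per index level."""
    # Levels without a criterion are selected in full.
    missing = frame.index.nlevels - len(row_criteria)
    row_key = tuple(row_criteria) + (slice(None),) * missing
    return frame.loc[row_key, column_criterion]


slice_list = ["foo", slice(None)]  # one entry per index level
column_slice = slice("A", "B")     # same object that idx["A":"B"] returns

result = select(df, slice_list, column_slice)
print(result)
```

The same call should work for any number of entries in slice_list, up to the number of index levels. Label slices such as slice('a', 'b') on a MultiIndex generally need the index to be sorted.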
You might have to build your "slice lists" a little differently than you intended, but here's a relatively compact method using df.merge() and df.loc[]:
# Build a "query" dataframe
slice_df = pd.DataFrame(index=[['foo','qux','qux'],['a','a','b']])
# Explicitly name columns
column_slice = ['A','B']
slice_df.merge(df, left_index=True, right_index=True, how='inner').loc[:,column_slice]
Out[]:
              A         B
foo a  0.442302 -0.949298
qux a  0.425645 -0.233174
    b -0.041416  0.229281
This method also requires you to be explicit about your second index and columns, unfortunately. But computers are great at making long tedious lists for you if you ask nicely.
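For comparison, the same kind of per-level lists can also be used without a merge. This is a different technique from the one above, shown only as a sketch: the lists are transposed into one tuple per wanted row and passed straight to .loc. The fixed sample values are an assumption made so the output is reproducible.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "qux", "qux"], ["a", "b", "a", "b"]]
)
# Fixed values instead of random ones so the result is reproducible.
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"])

# Same layout as the "query" index: one list per index level.
slice_list = [["foo", "qux", "qux"], ["a", "a", "b"]]
column_slice = ["A", "B"]

# Transpose into one (level0, level1) tuple per wanted row.
row_keys = list(zip(*slice_list))
result = df.loc[row_keys, column_slice]
print(result)
```

One difference to keep in mind: an inner merge silently drops keys that are not in the DataFrame, whereas .loc with a list of keys raises a KeyError in recent pandas versions if any key is missing.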
EDIT - Example of a method to dynamically build a slice list that could be used as above.
Here's a function that takes a dataframe and returns a list that can then be used to create a "query" dataframe to slice the original by. It only works with dataframes whose index has 1 or 2 levels. Let me know if that's an issue.
import ast

def make_df_slice_list(df):
    slice_list = []
    if df.index.nlevels == 1:
        # Only one level of index
        for dex in df.index.unique():
            # The answer is parsed as a Python literal, e.g. "Y"
            answer = ast.literal_eval(input("DF index: " + str(dex) + " - Include? Y/N: "))
            if answer == "Y":
                # Add to slice list
                slice_list.append(dex)
    if df.index.nlevels > 1:
        slice_list = [[] for _ in range(df.index.nlevels)]
        # Multi level
        for i in df.index.levels[0]:
            print("DF index:", i, "has subindexes:", list(df.loc[i].index))
            # The answer is parsed as a Python literal, e.g. ["a", "b"] or []
            sublist = ast.literal_eval(input("Enter a the indexes you'd like as a list: "))
            # If the list is empty, fall back to the first entry
            if len(sublist) == 0:
                sublist = [df.loc[i].index[0]]
            # Add an entry to the first index list for each sub item passed
            slice_list[0].extend(i for _ in sublist)
            # Add each of the second index list items
            slice_list[1].extend(sublist)
    return slice_list
I'm not advising this as a way to communicate with your user, just an example. When you use it you have to pass strings (e.g. "Y" and "N"), lists of strings (["a","b"]) and empty lists [] at the prompts. Example:
In [115]: slice_list = make_df_slice_list(df)
DF index: foo has subindexes: ['a', 'b']
Enter a the indexes you'd like as a list: []
DF index: qux has subindexes: ['a', 'b']
Enter a the indexes you'd like as a list: ['a','b']
In [116]:slice_list
Out[116]: [['foo', 'qux', 'qux'], ['a', 'a', 'b']]
# Back to my original solution, but now passing the list:
slice_df = pd.DataFrame(index=slice_list)
column_slice = ['A','B']
slice_df.merge(df, left_index=True, right_index=True, how='inner').loc[:,column_slice]
Out[117]:
              A         B
foo a -0.249547  0.056414
qux a  0.938710 -0.202213
    b  0.329136 -0.465999
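If prompting is not wanted at all, the same list can be built from a plain mapping. The sketch below assumes a two-level index and a dict from first-level label to the wanted second-level labels. build_slice_list and the wanted mapping are hypothetical names, and the fallback to the first sub-index mirrors the interactive function above.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "qux", "qux"], ["a", "b", "a", "b"]]
)
# Fixed values instead of random ones so the result is reproducible.
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"])


def build_slice_list(frame, wanted):
    """Hypothetical non-interactive variant for a two-level index."""
    outer_labels, inner_labels = [], []
    for outer in frame.index.levels[0]:
        available = list(frame.loc[outer].index)
        # A missing or empty selection falls back to the first sub-index.
        chosen = wanted.get(outer) or available[:1]
        for inner in chosen:
            outer_labels.append(outer)
            inner_labels.append(inner)
    return [outer_labels, inner_labels]


wanted = {"foo": [], "qux": ["a", "b"]}
slice_list = build_slice_list(df, wanted)
print(slice_list)
```

The returned list has the same layout as the one produced at the prompts, so it can be passed to pd.DataFrame(index=slice_list) in the same way.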