
How can I create a DataFrame slice object piece by piece?

Tags: python, pandas

I have a DataFrame, and I want to select certain rows and columns from it. I know how to do this using loc. However, I want to be able to specify each criterion individually, rather than all in one go.

import numpy as np
import pandas as pd
idx = pd.IndexSlice

index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['a', 'b', 'a', 'b'])]
columns = ["A",  "B"]
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)
print(df)
print(df.loc[idx['foo', :], idx['A':'B']])

              A         B
foo a  0.676649 -1.638399
    b -0.417915  0.587260
qux a  0.294555 -0.573041
    b  1.592056  0.237868


              A         B
foo a -0.470195 -0.455713
    b  1.750171 -0.409216

Requirement

I want to achieve the same result with something like the following code, where I specify each criterion one by one. It's also important that I can use a slice_list to allow dynamic behaviour, i.e. the syntax should work whether there are two, three or ten different criteria in the slice_list.

slice_1 = 'foo'
slice_2 = ':'
slice_list = [slice_1, slice_2]

column_slice = "'A':'B'"
print(df.loc[idx[slice_list], idx[column_slice]])
bluprince13 asked Mar 23 '17 16:03




2 Answers

You can achieve this using the built-in slice function. You can't build slices from strings, because a ':' inside a string is just a literal character, not slicing syntax.

slice_1 = 'foo'
slice_2 = slice(None)
column_slice = slice('A', 'B')
df.loc[idx[slice_1, slice_2], idx[column_slice]]
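
To cover the dynamic slice_list requirement from the question, one option is to keep the criteria in a list and convert it to a tuple just before indexing. The sketch below is an illustration rather than part of the original answer: it assumes fixed integer data in place of the random values, and row_key is a name introduced here.

```python
import numpy as np
import pandas as pd

# Same index and columns as in the question, with fixed values for reproducibility.
index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['a', 'b', 'a', 'b'])]
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['A', 'B'])

# One criterion per index level; slice(None) plays the role of a bare ':'.
slice_list = ['foo', slice(None)]
column_slice = slice('A', 'B')

# A list would be read as "a list of labels", so convert it to a tuple:
# .loc treats a tuple as one key per MultiIndex level.
row_key = tuple(slice_list)
result = df.loc[row_key, column_slice]

# For comparison, the hard-coded form from the question.
expected = df.loc[pd.IndexSlice['foo', :], pd.IndexSlice['A':'B']]
print(result.equals(expected))
```

Because pd.IndexSlice simply returns whatever key it is given, idx[tuple(slice_list)] and tuple(slice_list) are interchangeable here. The same pattern should extend to more levels by appending one entry per level to slice_list.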
Ted Petrou answered Oct 02 '22 10:10



You might have to build your "slice lists" a little differently than you intended, but here's a relatively compact method using df.merge() and .loc:

# Build a "query" dataframe whose index lists the rows to keep
slice_df = pd.DataFrame(index=[['foo','qux','qux'],['a','a','b']])
# Explicitly name columns
column_slice = ['A','B']

# An inner join on the index keeps only the rows named in slice_df
slice_df.merge(df, left_index=True, right_index=True, how='inner').loc[:, column_slice]

Out[]: 
              A         B
foo a  0.442302 -0.949298
qux a  0.425645 -0.233174
    b -0.041416  0.229281

This method also requires you to be explicit about your second index and columns, unfortunately. But computers are great at making long tedious lists for you if you ask nicely.
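
As a possible alternative to the merge, the same explicit row selection can be sketched with .loc and a list of tuples. This is an illustration and not part of the original answer: it assumes fixed integer data instead of random values, and row_keys and picked are names introduced here.

```python
import numpy as np
import pandas as pd

index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['a', 'b', 'a', 'b'])]
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['A', 'B'])

# Same "slice list" layout as the query dataframe: one list per index level.
slice_list = [['foo', 'qux', 'qux'], ['a', 'a', 'b']]
column_slice = ['A', 'B']

# Pair the levels up into full row keys: ('foo', 'a'), ('qux', 'a'), ('qux', 'b').
row_keys = list(zip(*slice_list))

# A list of tuples selects exactly those rows of the MultiIndex.
picked = df.loc[row_keys, column_slice]
print(picked)
```

Unlike the inner merge, .loc raises a KeyError when a requested row key is missing from the index, which may or may not be the behaviour you want.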

EDIT - Example of a method to dynamically build a slice list that could be used as above.

Here's a function that takes a dataframe and returns a list that can then be used to create a "query" dataframe to slice the original by. It only works with dataframes that have one or two index levels. Let me know if that's an issue.

import ast

def make_df_slice_list(df):
    if df.index.nlevels == 1:
        # Only one level of index: ask about each label in turn
        slice_list = []
        for dex in df.index.unique():
            if input("DF index: {} - Include? Y/N: ".format(dex)) == "Y":
                # Add to slice list
                slice_list.append(dex)
    else:
        # Multi level: one list per index level
        slice_list = [[] for _ in range(df.index.nlevels)]
        for i in df.index.levels[0]:
            print("DF index:", i, "has subindexes:", list(df.loc[i].index))
            # Parse the typed text as a Python list literal, e.g. ['a','b'] or []
            sublist = ast.literal_eval(input("Enter the indexes you'd like as a list: "))
            # If the list is empty, fall back to the first entry
            if len(sublist) == 0:
                sublist = [df.loc[i].index[0]]
            # Add an entry to the first index list for each sub item passed
            slice_list[0].extend([i] * len(sublist))
            # Add each of the second index list items
            slice_list[1].extend(sublist)
    return slice_list

I'm not advising this as a way to communicate with your user; it's just an example. At the Y/N prompts, type Y or N. At the list prompts, type a Python list of strings (["a","b"]) or an empty list []. Example:

In [115]: slice_list = make_df_slice_list(df)

DF index: foo has subindexes: ['a', 'b']
Enter the indexes you'd like as a list: []
DF index: qux has subindexes: ['a', 'b']
Enter the indexes you'd like as a list: ['a','b']

In [116]: slice_list
Out[116]: [['foo', 'qux', 'qux'], ['a', 'a', 'b']]

# Back to my original solution, but now passing the list:
slice_df = pd.DataFrame(index=slice_list)
column_slice = ['A','B']

slice_df.merge(df, left_index=True, right_index=True, how='inner').loc[:, column_slice]
Out[117]: 
              A         B
foo a -0.249547  0.056414
qux a  0.938710 -0.202213
    b  0.329136 -0.465999
Jammeth_Q answered Oct 02 '22 10:10
