
Pandas - Groupby and create new DataFrame?

This is my situation -

In[1]: data
Out[1]: 
     Item                    Type
0  Orange           Edible, Fruit
1  Banana           Edible, Fruit
2  Tomato       Edible, Vegetable
3  Laptop  Non Edible, Electronic

In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame

What I want to do is create a DataFrame of only fruits, so I think I need to group the rows in such a way that "Fruit" appears in Type.

I've tried doing this:

grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I don't know if that's the right way to do it; I'm having a tough time understanding groupby. How do I get a new DataFrame of only fruits?

ComputerFellow asked Jan 06 '14 14:01



2 Answers

You could use

data[data['Type'].str.contains('Fruit')]

import pandas as pd

data = pd.DataFrame({'Item':['Orange', 'Banana', 'Tomato', 'Laptop'],
                     'Type':['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable', 'Non Edible, Electronic']})
print(data[data['Type'].str.contains('Fruit')])

yields

     Item           Type
0  Orange  Edible, Fruit
1  Banana  Edible, Fruit
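If the Type column might contain missing values, or the search text should be matched literally rather than as a regular expression, the same mask can be hardened a little. This is only a minimal sketch: the extra 'Unknown' row with a missing Type is a made-up addition for illustration and is not part of the original data. na and regex are existing parameters of Series.str.contains.

```python
import pandas as pd

# The 'Unknown' row with a missing Type is hypothetical, added only for illustration.
data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop', 'Unknown'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic', None],
})

# na=False counts missing values as "no match";
# regex=False does a plain substring search instead of a regex search.
mask = data['Type'].str.contains('Fruit', na=False, regex=False)
fruits = data[mask]
print(fruits)
```

Without na=False, a missing Type produces a missing value in the mask, and indexing with that mask can raise an error.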
unutbu answered Sep 23 '22 18:09


groupby does something else entirely. It creates groups for aggregation. Basically, it goes from something like:

['a', 'b', 'a', 'c', 'b', 'b']

to something like:

[['a', 'a'], ['b', 'b', 'b'], ['c']]
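The list-to-groups picture above can be reproduced in pandas roughly as follows. This is just an illustrative sketch; the variable names s and groups are chosen here and do not come from the question.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Group the values by themselves, then collect each group into a list.
groups = s.groupby(s).apply(list)
print(groups.tolist())
```

Each group is keyed by its value ('a', 'b', 'c'), which is why groupby is suited to per-group aggregation rather than to filtering rows.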

What you want is apply, called on the Type column.

In newer versions of pandas there's a query method that makes this a bit more efficient and easier.

However, one way of doing what you want is to make a boolean array by using

mask = df.Type.apply(lambda x: 'Fruit' in x)

And then selecting the relevant portions of the data frame with df[mask]. Or, as a one-liner:

df[df.Type.apply(lambda x: 'Fruit' in x)]

As a full example:

import pandas as pd

# Build the example frame from a list of [Item, Type] rows.
data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])

# Keep only the rows whose Type string contains 'Fruit'.
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
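For the query method mentioned earlier, a rough sketch might look like the following. It assumes a pandas version in which query can call string methods on a column when engine='python' is passed; this may behave differently on other versions, so treat it as an illustration rather than the definitive form.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# engine='python' evaluates the expression in Python,
# which allows calling .str methods on the Type column.
fruits = df.query("Type.str.contains('Fruit')", engine='python')
print(fruits)
```

The result should be the same two rows that the boolean-mask version selects.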
Joe Kington answered Sep 24 '22 18:09