Pandas - Groupby and create new DataFrame?

This is my situation:

In[1]: data
Out[1]: 
     Item                    Type
0  Orange           Edible, Fruit
1  Banana           Edible, Fruit
2  Tomato       Edible, Vegetable
3  Laptop  Non Edible, Electronic

In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame

What I want to do is create a DataFrame of only fruits, so I need to group by in such a way that "Fruit" appears in Type.

I've tried doing this:

grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I don't know if that's the right way of doing it; I'm having a bit of a tough time understanding groupby. How do I get a new DataFrame of only fruits?

How do I create a new DataFrame with groupby?

Step 1: split the data into groups by creating a groupby object from the original DataFrame.
Step 2: apply a function to each group; in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step).
Step 3: combine the results into a new DataFrame.
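A minimal sketch of those three steps, assuming the question's data is rebuilt by hand and using a per-group count as the aggregation (the column name `n_items` is hypothetical):

```python
import pandas as pd

# The question's data, rebuilt by hand
data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# Step 1 (split): group the rows by the exact value of 'Type'
grouped = data.groupby('Type')

# Step 2 (apply): count the items in each group
# Step 3 (combine): reset_index turns the per-group counts into a new DataFrame
counts = grouped['Item'].count().reset_index(name='n_items')
print(counts)
```

Note that this groups on the whole `Type` string, so "Edible, Fruit" and "Edible, Vegetable" end up in different groups.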

What does pandas groupby() return?

It returns a GroupBy object (for a DataFrame, a DataFrameGroupBy) that contains information about the groups. See the pandas user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.
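To see what that object holds, here is a short sketch on the question's data (rebuilt by hand; grouping on `Type` is just an illustrative choice) that inspects the groups, selects one, and iterates over all of them:

```python
import pandas as pd

data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# groupby() returns a DataFrameGroupBy object, not a DataFrame
grouped = data.groupby('Type')
print(type(grouped).__name__)  # DataFrameGroupBy

# .groups maps each group key to the row labels in that group
print(grouped.groups)

# get_group pulls one group out as an ordinary DataFrame
fruits = grouped.get_group('Edible, Fruit')
print(fruits)

# Iterating yields (key, sub-DataFrame) pairs
for key, group in grouped:
    print(key, len(group))
```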

What are the three phases of the pandas groupby() function?

The “group by” process is split-apply-combine: (1) splitting the data into groups, (2) applying a function to each group independently, and (3) combining the results into a data structure.
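The three phases can be spelled out by hand and compared with a single groupby call. This is a sketch assuming the question's data and, as a hypothetical "apply" function, joining the item names in each group:

```python
import pandas as pd

data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# (1) Split: one sub-DataFrame per distinct 'Type' value
pieces = {key: group for key, group in data.groupby('Type')}

# (2) Apply: run a function on each piece independently
applied = {key: ', '.join(group['Item']) for key, group in pieces.items()}

# (3) Combine: collect the per-group results into one Series
combined = pd.Series(applied, name='Item')

# groupby + agg performs all three phases in one call
builtin = data.groupby('Type')['Item'].agg(', '.join)

print(combined)
print(combined.tolist() == builtin.tolist())  # True
```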

You could use

data[data['Type'].str.contains('Fruit')]

import pandas as pd

data = pd.DataFrame({'Item':['Orange', 'Banana', 'Tomato', 'Laptop'],
                     'Type':['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable', 'Non Edible, Electronic']})
print(data[data['Type'].str.contains('Fruit')])

yields

     Item           Type
0  Orange  Edible, Fruit
1  Banana  Edible, Fruit

groupby does something else entirely. It creates groups for aggregation. Basically, it goes from something like:

['a', 'b', 'a', 'c', 'b', 'b']

to something like:

[['a', 'a'], ['b', 'b', 'b'], ['c']]
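groupby can still produce the fruits, just in a roundabout way: group the rows by a boolean key and pull out the `True` group. A sketch, assuming the question's data and a hypothetical key named `is_fruit`:

```python
import pandas as pd

data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# Hypothetical grouping key: True where 'Fruit' appears in Type
is_fruit = data['Type'].str.contains('Fruit')

# groupby splits the rows into two groups, False and True ...
grouped = data.groupby(is_fruit)
print(grouped.size())

# ... and get_group returns one of them as a DataFrame
fruits = grouped.get_group(True)
print(fruits)
```

Since only one group is wanted, indexing with the mask directly skips the grouping step entirely.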

What you want is not groupby but a boolean mask, which you can build with apply (here, Series.apply on the Type column).

In newer versions of pandas there's also a query method that can make this kind of filtering easier to write.
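A sketch of the query route, assuming the question's data: string methods such as `.str.contains` are not supported by query's default numexpr engine, so this passes `engine='python'` explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable',
             'Non Edible, Electronic'],
})

# engine='python' lets the expression call the .str accessor
fruits = df.query('Type.str.contains("Fruit")', engine='python')
print(fruits)
```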

However, one way of doing what you want is to make a boolean array by using

mask = df.Type.apply(lambda x: 'Fruit' in x)

And then selecting the relevant portions of the data frame with df[mask]. Or, as a one-liner:

df[df.Type.apply(lambda x: 'Fruit' in x)]
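The mask is an ordinary boolean Series, so it can also be used with `.loc` or inverted with `~` to get the complement. A short sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame([['Orange', 'Edible, Fruit'],
                   ['Banana', 'Edible, Fruit'],
                   ['Tomato', 'Edible, Vegetable'],
                   ['Laptop', 'Non Edible, Electronic']],
                  columns=['Item', 'Type'])

mask = df.Type.apply(lambda x: 'Fruit' in x)
print(mask.tolist())  # [True, True, False, False]

fruits = df.loc[mask]       # rows where the mask is True
not_fruits = df.loc[~mask]  # ~ flips the mask: everything that is not a fruit
print(fruits)
print(not_fruits)
```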

As a full example:

import pandas as pd

data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])

# Keep only the rows whose Type contains the substring 'Fruit'
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
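Both answers test for "Fruit" as a substring, so a tag that merely contains the word would also match. If the tags are always separated by exactly ", " (an assumption about the data), splitting first gives an exact tag match. A sketch with a hypothetical "Fruit Juice" row:

```python
import pandas as pd

# Hypothetical data: 'Orange Juice' is tagged 'Fruit Juice', not 'Fruit'
df = pd.DataFrame([['Orange', 'Edible, Fruit'],
                   ['Orange Juice', 'Edible, Fruit Juice'],
                   ['Laptop', 'Non Edible, Electronic']],
                  columns=['Item', 'Type'])

# Substring test: matches 'Fruit' anywhere in the string
substring = df[df.Type.apply(lambda x: 'Fruit' in x)]

# Tag test (assumes ', ' separators): a whole tag must equal 'Fruit'
exact = df[df.Type.apply(lambda x: 'Fruit' in x.split(', '))]

print(substring.Item.tolist())  # ['Orange', 'Orange Juice']
print(exact.Item.tolist())      # ['Orange']
```

Which behavior is right depends on how the Type strings are built.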

Tags: python, pandas, data-analysis, grouping

Asked by ComputerFellow

2 answers, by unutbu and Joe Kington
