This is my situation -
In[1]: data
Out[1]:
Item Type
0 Orange Edible, Fruit
1 Banana Edible, Fruit
2 Tomato Edible, Vegetable
3 Laptop Non Edible, Electronic
In[2]: type(data)
Out[2]: pandas.core.frame.DataFrame
What I want to do is create a DataFrame of only Fruits, so I need to groupby in such a way that Fruit exists in Type.
I've tried doing this:
grouped = data.groupby(lambda x: "Fruit" in x, axis=1)
I don't know if that's the right way to do it; I'm having a tough time understanding groupby. How do I get a new DataFrame of only Fruits?
Step 1: split the data into groups by creating a groupby object from the original DataFrame.
Step 2: apply a function, in this case an aggregation function that computes a summary statistic (you can also transform or filter your data in this step).
Step 3: combine the results into a new DataFrame.
groupby returns a groupby object that contains information about the groups.
The "group by" process is split-apply-combine: (1) splitting the data into groups, (2) applying a function to each group independently, and (3) combining the results into a data structure.
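To make the three steps concrete, here is a minimal sketch using the DataFrame from the question. Counting rows per Type is only an illustrative choice of aggregation; it is not what the question asks for, and the variable names grouped and counts are my own.

```python
import pandas as pd

data = pd.DataFrame({
    'Item': ['Orange', 'Banana', 'Tomato', 'Laptop'],
    'Type': ['Edible, Fruit', 'Edible, Fruit',
             'Edible, Vegetable', 'Non Edible, Electronic'],
})

# Split: one group per distinct value in the 'Type' column.
grouped = data.groupby('Type')

# Apply + combine: count the rows in each group.
# The result is a Series indexed by the distinct 'Type' values.
counts = grouped.size()
print(counts)
```

Note that this summarizes the rows per group rather than selecting rows, which is why groupby is not the natural tool for filtering out the Fruits.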
You could use
data[data['Type'].str.contains('Fruit')]
import pandas as pd
data = pd.DataFrame({'Item':['Orange', 'Banana', 'Tomato', 'Laptop'],
'Type':['Edible, Fruit', 'Edible, Fruit', 'Edible, Vegetable', 'Non Edible, Electronic']})
print(data[data['Type'].str.contains('Fruit')])
yields
Item Type
0 Orange Edible, Fruit
1 Banana Edible, Fruit
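One caveat worth keeping in mind: str.contains does substring matching and returns NaN for missing values by default. The sketch below uses made-up rows (not from the question) to show the na=False option and, as an assumed alternative, an exact match against the comma-separated tags.

```python
import pandas as pd

# Hypothetical rows chosen to expose the edge cases; not the question's data.
data = pd.DataFrame({
    'Item': ['Orange', 'Fruitcake tin', 'Unknown'],
    'Type': ['Edible, Fruit', 'Non Edible, Fruitcake Storage', None],
})

# na=False treats a missing Type as "no match" instead of returning NaN.
# This is still a substring match, so 'Fruitcake Storage' matches too.
substring_mask = data['Type'].str.contains('Fruit', na=False)

# Exact-tag match: split each Type on ', ' and test list membership.
tags = data['Type'].fillna('').str.split(', ')
exact_mask = tags.apply(lambda t: 'Fruit' in t)

print(data[substring_mask])
print(data[exact_mask])
```

For the data in the question both masks select the same two rows, so the simpler str.contains call is enough there.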
groupby does something else entirely. It creates groups for aggregation. Basically, it goes from something like:
['a', 'b', 'a', 'c', 'b', 'b']
to something like:
[['a', 'a'], ['b', 'b', 'b'], ['c']]
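A small sketch of that picture in pandas, assuming the letters are held in a Series (the names letters and groups are mine):

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'a', 'c', 'b', 'b'])

# Group the Series by its own values, then collect each group into a list.
# Iterating over a groupby object yields (key, sub-Series) pairs,
# with keys sorted by default.
groups = [group.tolist() for _, group in letters.groupby(letters)]
print(groups)  # [['a', 'a'], ['b', 'b', 'b'], ['c']]
```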
What you want is apply on the Type column (df.Type.apply).
In newer versions of pandas there's a query method that makes this a bit more efficient and easier.
However, one way of doing what you want is to make a boolean array by using:
mask = df.Type.apply(lambda x: 'Fruit' in x)
And then selecting the relevant portions of the data frame with df[mask]. Or, as a one-liner:
df[df.Type.apply(lambda x: 'Fruit' in x)]
As a full example:
import pandas as pd
data = [['Orange', 'Edible, Fruit'],
['Banana', 'Edible, Fruit'],
['Tomato', 'Edible, Vegetable'],
['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])
print(df[df.Type.apply(lambda x: 'Fruit' in x)])
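For completeness, the query method mentioned above could be used roughly like this. This is a sketch, not the only way to write it: passing engine='python' is my assumption to make sure the .str accessor is evaluated by pandas itself rather than by numexpr.

```python
import pandas as pd

data = [['Orange', 'Edible, Fruit'],
        ['Banana', 'Edible, Fruit'],
        ['Tomato', 'Edible, Vegetable'],
        ['Laptop', 'Non Edible, Electronic']]
df = pd.DataFrame(data, columns=['Item', 'Type'])

# The expression is evaluated against the DataFrame's columns;
# rows where it is True are kept.
fruits = df.query("Type.str.contains('Fruit')", engine='python')
print(fruits)
```

This selects the same rows as the boolean-mask version; which one reads better is largely a matter of taste.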