
Apply function to pandas DataFrame that can return multiple rows

I am trying to transform a DataFrame so that some of its rows are replicated a given number of times. For example:

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count':[1,0,2]})

  class  count
0     A      1
1     B      0
2     C      2

should be transformed to:

  class 
0     A   
1     C   
2     C 

This is the reverse of aggregation with the count function. Is there an easy way to achieve this in pandas (without using for loops or list comprehensions)?

One possibility might be to let the DataFrame.applymap function return multiple rows (akin to the apply method of GroupBy). However, I do not think that is possible in pandas at the moment.

btel asked Oct 24 '12 13:10




1 Answer

You could use groupby:

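def f(group):
    # Each class appears once here, so the group's first row carries its count
    row = group.iloc[0]
    # Repeat the class label `count` times (no rows when count is 0)
    return pd.DataFrame({'class': [row['class']] * row['count']})

df.groupby('class', group_keys=False).apply(f)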

so you get

In [25]: df.groupby('class', group_keys=False).apply(f)
Out[25]: 
  class
0     A
0     C
1     C

You can then fix the index of the result however you like, for example with reset_index(drop=True).
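As a hedged aside, and not part of the answer above: the same expansion can be sketched without groupby by repeating index labels with Index.repeat and selecting them with .loc. This is a different technique from the one the answer uses; the names repeated and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

# Repeat each index label `count` times; labels with count 0 drop out.
repeated = df.index.repeat(df['count'])

# Select the repeated rows, keep only the label column, and renumber from 0.
result = df.loc[repeated, ['class']].reset_index(drop=True)

print(result)
```

This relies on the index labels being unique, as they are in the example frame.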

Wes McKinney answered Oct 03 '22 05:10