Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groupby, transpose and append in Pandas?

I have a dataframe which looks like this:

enter image description here

Each user has 10 records. Now, I want to create a dataframe which looks like this:

userid  name1  name2  ... name10

which means I need to invert every 10 records of the column name and append to a new dataframe.

So, how do it do it? Is there any way I can do it in Pandas?

like image 380
Dawny33 Avatar asked Jul 14 '16 08:07

Dawny33


People also ask

How do I merge Groupby in pandas?

Step 1: split the data into groups by creating a groupby object from the original DataFrame; Step 2: apply a function, in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step); Step 3: combine the results into a new DataFrame.

How does the Groupby () method works in pandas?

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. Used to determine the groups for the groupby.

What is the difference between Groupby and Pivot_table in pandas?

What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.

What does .first do in Groupby?

Pandas dataframe has groupby([column(s)]). first() method which is used to get the first record from each group. The result of grouby.


2 Answers

groupby('userid') then reset_index within each group to enumerate consistently across groups. Then unstack to get columns.

df.groupby('userid')['name'].apply(lambda df: df.reset_index(drop=True)).unstack()

Demonstration

df = pd.DataFrame([
        [123, 'abc'],
        [123, 'abc'],
        [456, 'def'],
        [123, 'abc'],
        [123, 'abc'],
        [456, 'def'],
        [456, 'def'],
        [456, 'def'],
    ], columns=['userid', 'name'])

df.sort_values('userid').groupby('userid')['name'].apply(lambda df: df.reset_index(drop=True)).unstack()

enter image description here

if you don't want the userid as the index, add reset_index to the end.

df.sort_values('userid').groupby('userid')['name'].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()

enter image description here

like image 191
piRSquared Avatar answered Sep 19 '22 09:09

piRSquared


You may also be interested in pandas.DataFrame.pivot

See this example dataframe:

df
    userid  name  values
0   123     A     1    
1   123     B     2    
2   123     C     3    
3   456     A     4    
4   456     B     5    
5   456     C     6    

using df.pivot

df.pivot(index='userid', columns='name', values='values')
name  A   B   C
userid
123  1   2   3
456  4   5   6
like image 30
Jan_ewazz Avatar answered Sep 18 '22 09:09

Jan_ewazz