
pandas pivot table to data frame [duplicate]

I have a dataframe (df) that looks like this:

+---------+-------+------------+----------+
| subject | pills |    date    | strength |
+---------+-------+------------+----------+
|       1 |     4 | 10/10/2012 |      250 |
|       1 |     4 | 10/11/2012 |      250 |
|       1 |     2 | 10/12/2012 |      500 |
|       2 |     1 | 1/6/2014   |     1000 |
|       2 |     1 | 1/7/2014   |      250 |
|       2 |     1 | 1/7/2014   |      500 |
|       2 |     3 | 1/8/2014   |      250 |
+---------+-------+------------+----------+

When I use reshape in R, I get what I want:

    reshape(df, idvar = c("subject","date"), timevar = 'strength', direction = "wide")

+---------+------------+--------------+--------------+---------------+
| subject |    date    | strength.250 | strength.500 | strength.1000 |
+---------+------------+--------------+--------------+---------------+
|       1 | 10/10/2012 | 4            | NA           | NA            |
|       1 | 10/11/2012 | 4            | NA           | NA            |
|       1 | 10/12/2012 | NA           | 2            | NA            |
|       2 | 1/6/2014   | NA           | NA           | 1             |
|       2 | 1/7/2014   | 1            | 1            | NA            |
|       2 | 1/8/2014   | 3            | NA           | NA            |
+---------+------------+--------------+--------------+---------------+

Using pandas:

    pd.pivot_table(df, index=['subject', 'date'], columns='strength')

+---------+------------+-------+----+-----+
|         |            | pills            |
+---------+------------+-------+----+-----+
|         | strength   | 250   | 500| 1000|
+---------+------------+-------+----+-----+
| subject | date       |       |    |     |
+---------+------------+-------+----+-----+
| 1       | 10/10/2012 | 4     | NA | NA  |
|         | 10/11/2012 | 4     | NA | NA  |
|         | 10/12/2012 | NA    | 2  | NA  |
+---------+------------+-------+----+-----+
| 2       | 1/6/2014   | NA    | NA | 1   |
|         | 1/7/2014   | 1     | 1  | NA  |
|         | 1/8/2014   | 3     | NA | NA  |
+---------+------------+-------+----+-----+

How do I get exactly the same output as in R with pandas? I want only a single header row.

alma123 asked Mar 10 '17 00:03




1 Answer

After pivoting, convert the DataFrame to records and then back to a DataFrame:

    flattened = pd.DataFrame(pivoted.to_records())
    #   subject        date  ('pills', 250)  ('pills', 500)  ('pills', 1000)
    #0        1  10/10/2012             4.0             NaN              NaN
    #1        1  10/11/2012             4.0             NaN              NaN
    #2        1  10/12/2012             NaN             2.0              NaN
    #3        2    1/6/2014             NaN             NaN              1.0
    #4        2    1/7/2014             1.0             1.0              NaN
    #5        2    1/8/2014             3.0             NaN              NaN
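The snippet above uses `pivoted` without showing how it was built. As a minimal sketch, assuming `pivoted` is the pivot table from the question (default mean aggregation, dates kept as plain strings), the round trip can be reproduced like this. The data values are copied from the question; the exact text of the tuple-like column labels is produced by pandas, so it is worth inspecting `flattened.columns` before relying on it.

```python
import pandas as pd

# Data copied from the question; dates are left as strings (an assumption).
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 2],
    "pills": [4, 4, 2, 1, 1, 1, 3],
    "date": ["10/10/2012", "10/11/2012", "10/12/2012",
             "1/6/2014", "1/7/2014", "1/7/2014", "1/8/2014"],
    "strength": [250, 250, 500, 1000, 250, 500, 250],
})

# Passing values as a list keeps the two-level ('pills', strength) header.
pivoted = pd.pivot_table(df, values=["pills"],
                         index=["subject", "date"], columns="strength")

# Round trip through a record array: the index levels become ordinary
# columns and each column tuple is turned into a single string label.
flattened = pd.DataFrame(pivoted.to_records())
print(flattened.columns.tolist())
```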

You can now "repair" the column names, if you want:

    flattened.columns = [hdr.replace("('pills', ", "strength.").replace(")", "")
                         for hdr in flattened.columns]
    flattened
    #   subject        date  strength.250  strength.500  strength.1000
    #0        1  10/10/2012           4.0           NaN            NaN
    #1        1  10/11/2012           4.0           NaN            NaN
    #2        1  10/12/2012           NaN           2.0            NaN
    #3        2    1/6/2014           NaN           NaN            1.0
    #4        2    1/7/2014           1.0           1.0            NaN
    #5        2    1/8/2014           3.0           NaN            NaN

It's awkward, but it works.
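A possible alternative, not part of the answer above, avoids the string replacement entirely. This is a sketch under the assumption that only the `pills` column is being pivoted: passing `values="pills"` as a scalar makes `pivot_table` return single-level columns (250, 500, 1000), which can then be renamed and the index reset. The data is again copied from the question.

```python
import pandas as pd

# Data copied from the question; dates are left as strings (an assumption).
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 2],
    "pills": [4, 4, 2, 1, 1, 1, 3],
    "date": ["10/10/2012", "10/11/2012", "10/12/2012",
             "1/6/2014", "1/7/2014", "1/7/2014", "1/8/2014"],
    "strength": [250, 250, 500, 1000, 250, 500, 250],
})

# A scalar `values` yields a single header level holding the strength values.
pivoted = pd.pivot_table(df, values="pills",
                         index=["subject", "date"], columns="strength")

flat = pivoted.copy()
# Build R-style names such as "strength.250" from the strength values.
flat.columns = [f"strength.{c}" for c in flat.columns]
# Turn the (subject, date) index back into ordinary columns.
flat = flat.reset_index()
print(flat)
```

The result has one header row with the columns `subject`, `date`, `strength.250`, `strength.500` and `strength.1000`. Values are floats because the default aggregation is the mean and missing combinations are filled with NaN.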

DYZ answered Sep 20 '22 19:09
