
Expand pandas DataFrame column into multiple rows

If I have a DataFrame such that:

    pd.DataFrame({"name": "John",
                  "days": [[1, 3, 5, 7]]})

gives this structure:

               days  name
    0  [1, 3, 5, 7]  John

How do I expand it to the following?

       days  name
    0     1  John
    1     3  John
    2     5  John
    3     7  John
gozzilli asked Jul 05 '16 12:07


2 Answers

You could use df.itertuples to iterate through each row, and use a list comprehension to reshape the data into the desired form:

    import pandas as pd

    df = pd.DataFrame({"name": ["John", "Eric"],
                       "days": [[1, 3, 5, 7], [2, 4]]})
    result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
    print(result)

yields

       0     1
    0  1  John
    1  3  John
    2  5  John
    3  7  John
    4  2  Eric
    5  4  Eric
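The result above has the default column labels 0 and 1. A minimal sketch of the same comprehension with the labels from the question, assuming the only change needed is passing `columns` to the `pd.DataFrame` constructor:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

# Same list comprehension as in the answer; the `columns` argument
# is the only addition and labels the two tuple positions.
result = pd.DataFrame(
    [(d, tup.name) for tup in df.itertuples() for d in tup.days],
    columns=["days", "name"],
)
print(result)
```

Each tuple is built as `(day, name)`, so the labels have to be given in that order.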

Divakar's solution, using_repeat, is fastest:

    In [48]: %timeit using_repeat(df)
    1000 loops, best of 3: 834 µs per loop

    In [5]: %timeit using_itertuples(df)
    100 loops, best of 3: 3.43 ms per loop

    In [7]: %timeit using_apply(df)
    1 loop, best of 3: 379 ms per loop

    In [8]: %timeit using_append(df)
    1 loop, best of 3: 3.59 s per loop

Here is the setup used for the above benchmark:

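    import numpy as np
    import pandas as pd

    N = 10**3
    df = pd.DataFrame({"name": np.random.choice(list('ABCD'), size=N),
                       "days": [np.random.randint(10, size=np.random.randint(5))
                                for i in range(N)]})

    def using_itertuples(df):
        # Build one (day, name) tuple per list element
        return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

    def using_repeat(df):
        # Repeat each name once per element of its list, then flatten the lists
        lens = [len(item) for item in df['days']]
        return pd.DataFrame({"name": np.repeat(df['name'].values, lens),
                             "days": np.concatenate(df['days'].values)})

    def using_apply(df):
        # Expand each list into its own row of a wide frame, then stack to long form
        return (df.apply(lambda x: pd.Series(x.days), axis=1)
                .stack()
                .reset_index(level=1, drop=True)
                .to_frame('day')
                .join(df['name']))

    def using_append(df):
        # Grow the result one row at a time (quadratic, hence the slowest)
        df2 = pd.DataFrame(columns=df.columns)
        for i, r in df.iterrows():
            for e in r.days:
                new_r = r.copy()
                new_r.days = e
                # The timings above used df2.append(new_r); DataFrame.append was
                # removed in pandas 2.0, so pd.concat stands in for it here.
                df2 = pd.concat([df2, new_r.to_frame().T])
        return df2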
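To make the fastest variant easier to follow, here is `using_repeat` applied to a two-row frame. This is only an illustrative sketch: the `small` frame is not part of the benchmark, and its lists stand in for the random arrays used there.

```python
import numpy as np
import pandas as pd

def using_repeat(df):
    # Number of elements in each row's list
    lens = [len(item) for item in df["days"]]
    # Repeat each name len(list) times and flatten all lists into one array
    return pd.DataFrame({"name": np.repeat(df["name"].values, lens),
                         "days": np.concatenate(df["days"].values)})

# Hypothetical small input, used only for illustration
small = pd.DataFrame({"name": ["John", "Eric"],
                      "days": [[1, 3, 5, 7], [2, 4]]})
out = using_repeat(small)
print(out)
```

Here `lens` is `[4, 2]`, so "John" is repeated four times and "Eric" twice, lining up with the six concatenated day values. Both steps are single vectorized NumPy calls, which is the likely reason this variant avoids the per-row overhead of the others.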
unutbu answered Sep 23 '22 15:09



Since pandas 0.25, you can use the DataFrame.explode() method:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html

    import pandas as pd

    df = pd.DataFrame({"name": "John",
                       "days": [[1, 3, 5, 7]]})

    print(df.explode('days'))

prints

       name days
    0  John    1
    0  John    3
    0  John    5
    0  John    7
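Note that every exploded row keeps the original index label 0, and the exploded column has object dtype. A small sketch of one way to get the 0 to 3 index asked for in the question and an integer column, assuming all list elements are integers:

```python
import pandas as pd

df = pd.DataFrame({"name": "John",
                   "days": [[1, 3, 5, 7]]})

exploded = (df.explode("days")
              .reset_index(drop=True)      # replace the repeated 0 labels with 0..n-1
              .astype({"days": int}))      # explode leaves the column as object dtype
print(exploded)
```

The `astype` step would fail if any list contained non-integer values or if a row had an empty list (which explodes to NaN), so treat it as optional.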
philshem answered Sep 20 '22 15:09
