
Pandas expand rows from list data available in column

I have a data frame like this in pandas:

    column1      column2
    [a,b,c]      1
    [d,e,f]      2
    [g,h,i]      3

Expected output:

    column1      column2
    a            1
    b            1
    c            1
    d            2
    e            2
    f            2
    g            3
    h            3
    i            3

How can I process this data?

Sanjay Yadav asked Aug 18 '16 06:08



2 Answers

DataFrame.explode

As of pandas 0.25.0 there is the explode method for this. It expands each element of a list into its own row and repeats the values of the other columns:

df.explode('column1').reset_index(drop=True) 

Output

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3

As of pandas 1.1.0 explode also accepts the ignore_index argument, so there is no need to chain reset_index:

df.explode('column1', ignore_index=True) 

Output

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
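For reference, here is a minimal self-contained sketch of the approach above. The sample frame is an assumption reconstructed from the question (lists of single-character strings), and the names raw and flat are only illustrative. It shows what the index looks like before it is reset, which is why the reset_index (or ignore_index=True) step is there:

```python
import pandas as pd

# Sample frame assumed from the question: one list per row in column1.
df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})

# explode keeps the index label of the source row for every new row,
# so the index contains repeated labels at this point.
raw = df.explode("column1")
print(raw.index.tolist())

# Renumber the rows from 0; on pandas >= 1.1.0 passing ignore_index=True
# to explode should give the same result in one call.
flat = raw.reset_index(drop=True)
print(flat)
```

If later code selects rows by label, the repeated index in raw can return several rows per label, so resetting it is usually the safer default.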
Erfan answered Sep 23 '22 03:09


You can create a new DataFrame with the constructor and then use stack:

    # Parentheses let the method chain span several lines.
    df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
             .stack()
             .reset_index(level=1, drop=True)
             .reset_index(name='column1')[['column1','column2']])
    print (df2)
      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3

The subset [['column1','column2']] is there to change the column order. Because it also drops every column that is not listed, you can omit the first reset_index:

    # Parentheses let the method chain span several lines.
    df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
             .stack()
             .reset_index(name='column1')[['column1','column2']])
    print (df2)
      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
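To see why the first reset_index is optional, it may help to look at the intermediate result before the column subset is applied. This is only a sketch: the sample frame is assumed from the question, and the names stacked and wide are illustrative.

```python
import pandas as pd

# Sample frame assumed from the question.
df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})

# One row per list, indexed by column2; stack turns it into a long Series
# with a two-level index: (column2, position inside the list).
stacked = pd.DataFrame(df.column1.tolist(), index=df.column2).stack()

# Resetting both levels at once leaves the list position behind as an
# extra column, which pandas names level_1 because that level is unnamed.
wide = stacked.reset_index(name="column1")
print(wide.columns.tolist())

# Selecting only the two wanted columns discards level_1.
df2 = wide[["column1", "column2"]]
print(df2)
```

Keeping level_1 instead of dropping it can be handy when the position of each element within its original list matters.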

Another solution uses DataFrame.from_records to build a DataFrame from the first column, then creates a Series with stack and joins it to the original DataFrame:

    import pandas as pd

    df = pd.DataFrame({'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
                       'column2':[1,2,3]})

    # Long Series of list elements, indexed by the original row label.
    a = (pd.DataFrame.from_records(df.column1.tolist())
           .stack()
           .reset_index(level=1, drop=True)
           .rename('column1'))

    print (a)
    0    a
    0    b
    0    c
    1    d
    1    e
    1    f
    2    g
    2    h
    2    i
    Name: column1, dtype: object

    # Joining on the index repeats each remaining row once per element.
    print (df.drop('column1', axis=1)
             .join(a)
             .reset_index(drop=True)[['column1','column2']])
      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
jezrael answered Sep 21 '22 03:09