Pandas expand rows from list data available in column

Tags: python, list, pandas, dataframe, expand

I have a data frame like this in pandas:

column1   column2
[a,b,c]   1
[d,e,f]   2
[g,h,i]   3

Expected output:

column1   column2
a         1
b         1
c         1
d         2
e         2
f         2
g         3
h         3
i         3

How can I process this data?
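To reproduce the input above, here is a minimal sketch. It assumes that each cell of column1 holds a real Python list of strings and that column2 holds integers. The question only shows the printed frame, so the exact types are an assumption.

```python
import pandas as pd

# Assumption: column1 holds Python lists of strings, column2 holds integers
df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# Each cell of column1 is a single list object, so the frame has only 3 rows
print(df)
print(type(df.loc[0, 'column1']))  # <class 'list'>
```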


asked Aug 18 '16 06:08

Sanjay Yadav

2 Answers

`DataFrame.explode`

Since pandas 0.25.0 there is the explode method for this. It expands each list into one row per element and repeats the values of the other columns:

df.explode('column1').reset_index(drop=True)

Output

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
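The snippet below is a self-contained sketch of the explode call on slightly different, hypothetical data. It shows an edge case that the example above does not cover: according to the pandas documentation, an empty list becomes a single row with NaN rather than disappearing.

```python
import pandas as pd

# Hypothetical data: the second list is empty to show how explode treats it
df = pd.DataFrame({
    'column1': [['a', 'b'], [], ['c']],
    'column2': [1, 2, 3],
})

# One row per list element, with column2 repeated for each element;
# the empty list yields one row whose column1 value is NaN
out = df.explode('column1').reset_index(drop=True)
print(out)
```

If those NaN rows are unwanted, they can be dropped afterwards, for example with out.dropna(subset=['column1']).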

Since pandas >= 1.1.0 we have the ignore_index argument, so we don't have to chain with reset_index:

df.explode('column1', ignore_index=True)

Output

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
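Without a reset, explode keeps the original index label on every row produced from the same list. The sketch below shows that, and checks that ignore_index=True (pandas 1.1.0 or later is assumed) gives the same frame as chaining reset_index(drop=True).

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# Without a reset, each original label is repeated once per list element
print(df.explode('column1').index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]

# ignore_index=True relabels the rows 0..n-1 directly
with_flag = df.explode('column1', ignore_index=True)
with_reset = df.explode('column1').reset_index(drop=True)
print(with_flag.equals(with_reset))  # True
```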


answered Sep 23 '22 03:09

Erfan

You can build a DataFrame from the lists with its constructor and then reshape it with stack:

df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='column1')[['column1','column2']])
print(df2)
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
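The sketch below breaks the chain into its first two steps to show what the constructor and stack produce. It assumes all lists have the same length, as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# Step 1: spread each list across positional columns 0, 1, 2, indexed by column2
wide = pd.DataFrame(df.column1.tolist(), index=df.column2)
print(wide)

# Step 2: stack moves those positional columns into the index, giving a Series
# whose MultiIndex is (column2 value, position within the list)
stacked = wide.stack()
print(stacked.index.tolist()[:3])  # [(1, 0), (1, 1), (1, 2)]
```

The first reset_index(level=1, drop=True) in the answer then discards the list-position level, and the second one turns column2 back into a column.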

Because the subset [['column1','column2']] selects and reorders the columns anyway, you can also omit the first reset_index:

df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(name='column1')[['column1','column2']])
print(df2)
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
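This works because reset_index turns both index levels into columns. The unnamed list-position level gets pandas' default name level_1, and the column subset then drops it. A small sketch showing the intermediate columns:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# Both index levels become columns; the unnamed level is called 'level_1'
tmp = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(name='column1'))
print(tmp.columns.tolist())  # ['column2', 'level_1', 'column1']

# Selecting the two wanted columns drops 'level_1' and fixes the order
df2 = tmp[['column1', 'column2']]
```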

Another solution is to use DataFrame.from_records to create a DataFrame from the first column, then create a Series with stack and join it to the original DataFrame:

import pandas as pd

df = pd.DataFrame({'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
                   'column2': [1,2,3]})

a = (pd.DataFrame.from_records(df.column1.tolist())
       .stack()
       .reset_index(level=1, drop=True)
       .rename('column1'))

print(a)
0    a
0    b
0    c
1    d
1    e
1    f
2    g
2    h
2    i
Name: column1, dtype: object

print(df.drop('column1', axis=1)
        .join(a)
        .reset_index(drop=True)[['column1','column2']])
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
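A different technique from the answers above also handles lists of unequal length. It flattens column1 with numpy.concatenate and repeats each column2 value with numpy.repeat. This is a sketch that assumes no list is empty, because an empty list contributes no rows here and could upset dtype promotion in np.concatenate.

```python
import numpy as np
import pandas as pd

# Hypothetical data with lists of different lengths
df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e'], ['f']],
    'column2': [1, 2, 3],
})

# Number of elements in each list
lengths = df.column1.map(len).to_numpy()

df2 = pd.DataFrame({
    # Flatten all lists into one sequence, preserving order
    'column1': np.concatenate(df.column1.to_numpy()),
    # Repeat each column2 value once per element of its list
    'column2': np.repeat(df.column2.to_numpy(), lengths),
})
print(df2)
```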

answered Sep 21 '22 03:09

jezrael

