I have a data frame like this in pandas:
    column1    column2
    [a,b,c]    1
    [d,e,f]    2
    [g,h,i]    3

I want to convert it into this, with one row per list element:

    column1    column2
    a          1
    b          1
    c          1
    d          2
    e          2
    f          2
    g          3
    h          3
    i          3

How can I process the data this way?
To split a pandas column of lists into multiple columns, build a new DataFrame from the result of calling tolist() on that column. You can also pass the names of the new columns as a list through the constructor's columns argument.
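As a rough illustration of the tolist() approach, the sketch below rebuilds the question's data and splits column1 into three columns. The column names "first", "second" and "third" are invented for this example, and the sketch assumes every list has the same length.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})

# Each list position becomes its own column.
# "first", "second", "third" are hypothetical names chosen for this sketch.
split = pd.DataFrame(
    df["column1"].tolist(),
    index=df.index,
    columns=["first", "second", "third"],
)

# Put the new columns next to the untouched ones.
result = df.drop(columns="column1").join(split)
print(result)
```

Note that this widens the frame (one column per list position) rather than lengthening it, so it answers a different need than the row-per-element output asked for in the question.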
To expand the output display and see more columns of a pandas DataFrame, call set_option to set display.max_rows, display.max_columns and display.width (the character width of each printed row).
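A minimal sketch of those display settings follows. The numbers 100, 50 and 200 are arbitrary values picked for illustration, not recommendations.

```python
import pandas as pd

# Arbitrary limits chosen for this sketch; adjust to your terminal.
pd.set_option("display.max_rows", 100)     # rows printed before truncation
pd.set_option("display.max_columns", 50)   # columns printed before truncation
pd.set_option("display.width", 200)        # characters per printed line

print(pd.get_option("display.max_columns"))
```

These settings are global for the session; pd.reset_option("display.max_rows") and its siblings restore the defaults.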
A pandas Series can be converted to a list with tolist() or by passing it to list(). This is useful when you want to operate on a plain Python list instead of a pandas object: store the DataFrame column in a list and perform the required operations on it.
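For instance, assuming a small frame with the question's column2, both conversions could look like this (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"column2": [1, 2, 3]})

values = df["column2"].tolist()     # Series.tolist()
also_values = list(df["column2"])   # built-in list() on the Series

# Ordinary list operations now apply.
doubled = [v * 2 for v in values]
print(values, also_values, doubled)
```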
DataFrame.explode

Since pandas 0.25.0 we have the explode method for this, which expands a list to a row for each element and repeats the rest of the columns:

    df.explode('column1').reset_index(drop=True)

Output

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
Since pandas 1.1.0 we have the ignore_index argument, so we don't have to chain with reset_index:

    df.explode('column1', ignore_index=True)

Output

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
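To see why the index has to be reset at all, here is a self-contained sketch that rebuilds the question's frame and compares plain explode with the ignore_index variant. It assumes pandas 1.1.0 or later; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})

# Plain explode keeps the original row label for every element,
# so each label appears once per list item.
exploded = df.explode("column1")
print(exploded.index.tolist())

# ignore_index=True (pandas >= 1.1.0) renumbers the rows from 0.
flat = df.explode("column1", ignore_index=True)
print(flat)
```

Keeping the repeated labels can be handy when you later need to group the rows back by their source row; otherwise the renumbered form is usually easier to work with.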
You can create a DataFrame with its constructor and then stack:

    # Parentheses let the method chain span several lines.
    df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
             .stack()
             .reset_index(level=1, drop=True)
             .reset_index(name='column1')[['column1','column2']])
    print(df2)

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
If you reorder the columns with the subset [['column1','column2']] anyway, you can also omit the first reset_index, because the subset discards the leftover list-position column:

    df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
             .stack()
             .reset_index(name='column1')[['column1','column2']])
    print(df2)

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3
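The constructor-and-stack chain is easier to follow when each intermediate result is named. The sketch below splits it into steps; the names wide, stacked and long are made up for this walkthrough, and the data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]],
    "column2": [1, 2, 3],
})

# Step 1: one column per list position (0, 1, 2), indexed by column2
# so its values travel along with the letters.
wide = pd.DataFrame(df["column1"].tolist(), index=df["column2"])

# Step 2: stack() moves the columns into an inner index level, giving a
# Series with a two-level index: (column2, list position).
stacked = wide.stack()

# Step 3: drop the list-position level, then turn the remaining index
# (column2) back into a regular column next to the values.
long = stacked.reset_index(level=1, drop=True).reset_index(name="column1")

print(wide)
print(stacked)
print(long[["column1", "column2"]])
```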
Another solution uses DataFrame.from_records to create a DataFrame from the first column, then creates a Series with stack and joins it to the original DataFrame:

    import pandas as pd

    df = pd.DataFrame({'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
                       'column2': [1,2,3]})

    # One row per list element, labelled with the original row index.
    a = (pd.DataFrame.from_records(df.column1.tolist())
           .stack()
           .reset_index(level=1, drop=True)
           .rename('column1'))
    print(a)

    0    a
    0    b
    0    c
    1    d
    1    e
    1    f
    2    g
    2    h
    2    i
    Name: column1, dtype: object

    # Join on the shared index, then renumber the rows.
    print(df.drop('column1', axis=1)
            .join(a)
            .reset_index(drop=True)[['column1','column2']])

      column1  column2
    0       a        1
    1       b        1
    2       c        1
    3       d        2
    4       e        2
    5       f        2
    6       g        3
    7       h        3
    8       i        3