 

Pandas: split column of lists of unequal length into multiple columns

Tags:

python

pandas

I have a Pandas dataframe that looks like the below:

                   codes
1                  [71020]
2                  [77085]
3                  [36415]
4                  [99213, 99287]
5                  [99233, 99233, 99233]
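A minimal sketch for reproducing the frame above. It assumes the codes are stored as Python lists of integers (the float output in the answers suggests so) and that the index runs from 1 to 5 as displayed:

```python
import pandas as pd

# Rebuild the sample frame; the 1-based index mirrors the display above
df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

# Each cell holds a Python list whose length varies from row to row
print(df["codes"].map(len).tolist())  # [1, 1, 1, 2, 3]
```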

I'm trying to split the lists in df['codes'] into columns, like the below:

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

where columns that don't have a value (because the list was not that long) are filled with blanks or NaNs or something.

I've seen answers like this one and others similar to it, and while they work on lists of equal length, they all throw errors when I try to use them on lists of unequal length. Is there a good way to do this?

asked Jun 20 '17 at 22:06 by user139014



People also ask

How do I split a column into multiple columns in Pandas?

In Pandas, the apply() method can also be used to split the values of one column into multiple columns. Series.apply() runs a function on each value of a column (DataFrame.apply() does the same per row or per column); if that function splits the string and returns a pd.Series, the pieces become separate columns.
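A minimal sketch of the apply() approach. The "people" frame and its "name" column are hypothetical; the function returns one pd.Series per row, and apply() assembles them into columns, padding shorter rows with NaN:

```python
import pandas as pd

# Hypothetical frame with a space-separated "name" column
people = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing", "Grace Brewster Hopper"]})

# Returning a Series from the function makes apply() produce a DataFrame;
# rows with fewer pieces get NaN in the trailing columns
parts = people["name"].apply(lambda s: pd.Series(s.split(" ")))
print(parts)
```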

How do I separate columns in Pandas?

Pandas provides Series.str.split() to split strings around a given separator/delimiter. The result can be kept as a Series of lists, or it can be turned into a DataFrame with one column per piece.
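A short sketch of both outcomes, using a hypothetical comma-separated Series. Without expand, each element becomes a Python list; passing those lists to the DataFrame constructor (the same idea as the first answer below) widens them into columns:

```python
import pandas as pd

s = pd.Series(["a,b", "c,d,e"])

# Without expand, each element becomes a Python list
lists = s.str.split(",")
print(lists.tolist())  # [['a', 'b'], ['c', 'd', 'e']]

# Lists of unequal length can then be widened; short rows are padded with missing values
wide = pd.DataFrame(lists.tolist())
print(wide)
```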

How do I split data in one column into two Pandas?

We can use Pandas' str.split function to split the column of interest. Here we want to split the column "Name": select the column, chain .str.split() onto it, and pass expand=True so the pieces are returned as separate columns.
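A minimal sketch with a hypothetical "Name" column. expand=True returns a DataFrame with one column per piece, and n=1 caps the split at one so the result always has exactly two columns:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ada Lovelace", "Alan Turing"]})

# expand=True returns a DataFrame; n=1 splits only at the first space
df[["First", "Last"]] = df["Name"].str.split(" ", n=1, expand=True)
print(df)
```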


2 Answers

Try:

pd.DataFrame(df.codes.values.tolist()).add_prefix('code_')

   code_0   code_1   code_2
0   71020      NaN      NaN
1   77085      NaN      NaN
2   36415      NaN      NaN
3   99213  99287.0      NaN
4   99233  99233.0  99233.0

To keep the original index, pass df.index as the second argument:

pd.DataFrame(df.codes.values.tolist(), df.index).add_prefix('code_')

   code_0   code_1   code_2
1   71020      NaN      NaN
2   77085      NaN      NaN
3   36415      NaN      NaN
4   99213  99287.0      NaN
5   99233  99233.0  99233.0

We can nail down all the formatting (1-based column names, blanks instead of NaN) with this:

f = lambda x: 'code_{}'.format(x + 1)
pd.DataFrame(
    df.codes.values.tolist(),
    df.index, dtype=object
).fillna('').rename(columns=f)

  code_1 code_2 code_3
1  71020
2  77085
3  36415
4  99213  99287
5  99233  99233  99233
answered Oct 14 '22 at 10:10 by piRSquared



Another solution:

In [95]: df.codes.apply(pd.Series).add_prefix('code_')
Out[95]:
    code_0   code_1   code_2
1  71020.0      NaN      NaN
2  77085.0      NaN      NaN
3  36415.0      NaN      NaN
4  99213.0  99287.0      NaN
5  99233.0  99233.0  99233.0
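As the output above shows, apply(pd.Series) returns the codes as floats (71020.0), because the padded NaN values force a float dtype. A minimal sketch, assuming pandas 0.24 or later for the nullable "Int64" dtype, that converts the result back to whole numbers while keeping the gaps as <NA>:

```python
import pandas as pd

df = pd.DataFrame(
    {"codes": [[71020], [77085], [36415], [99213, 99287], [99233, 99233, 99233]]},
    index=[1, 2, 3, 4, 5],
)

# Widen the list column; missing slots come back as NaN
wide = df.codes.apply(pd.Series).add_prefix("code_")

# The nullable integer dtype stores whole numbers alongside missing values (<NA>)
wide_int = wide.astype("Int64")
print(wide_int)
```

The same astype("Int64") call also works on the output of the DataFrame-constructor approach, where only the padded columns are float.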
answered Oct 14 '22 at 09:10 by MaxU - stop WAR against UA
