
Pandas: split list in column into multiple rows [duplicate]

Tags: python, pandas

I have a question regarding splitting a list in a dataframe column into multiple rows.

Let's say I have this dataframe:

  Job position   Job type  id
0          [6]        [1]   3
1       [2, 6]  [3, 6, 5]   4
2          [1]        [9]  43
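
To reproduce this, the frame above can be built roughly like so. This is a sketch that assumes the cells hold plain Python lists of ints, which the printed output suggests but does not confirm:

```python
import pandas as pd

# Rebuild the example frame; values are copied from the printout above.
# Assumption: each cell in the first two columns is a Python list of ints.
df = pd.DataFrame({
    "Job position": [[6], [2, 6], [1]],
    "Job type": [[1], [3, 6, 5], [9]],
    "id": [3, 4, 43],
})

print(df)
```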

For each id, I would like one row for every combination of Job position and Job type, so the final result would be:

   id    Job position  Job type
0   3         6.0       1.0
1   4         2.0       3.0
2   4         2.0       6.0
3   4         2.0       5.0
4   4         6.0       3.0
5   4         6.0       6.0
6   4         6.0       5.0
7  43         1.0       9.0

Right now I get this result instead, where the values are paired by position rather than combined:

   id    Job position  Job type
0   3         6.0       1.0
1   4         2.0       3.0
2   4         6.0       6.0
3   4         NaN       5.0
4  43         1.0       9.0

This is the code that produces the (wrong) result above:

df = df.set_index(['id'])
(df.apply(lambda x: pd.DataFrame(x.tolist(),index=x.index)
                        .stack()
                        .rename(x.name)).reset_index())
Mathias Lund asked May 07 '18 15:05



1 Answer

Use a nested comprehension that loops over both lists in each row

pd.DataFrame([
    [p, t, i] for P, T, i in df.values
    for p in P for t in T
], columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
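
The two inner loops amount to a per-row Cartesian product, so the same idea can be written with itertools.product. This is only a sketch, not part of the original answer; the frame is rebuilt here under the assumption that the cells are plain lists of ints:

```python
from itertools import product

import pandas as pd

# Example frame, rebuilt from the question (assumed to hold lists of ints).
df = pd.DataFrame({
    "Job position": [[6], [2, 6], [1]],
    "Job type": [[1], [3, 6, 5], [9]],
    "id": [3, 4, 43],
})

# product(P, T) yields every (position, type) pair for one row,
# with the first list varying slowest, same as the nested loops.
rows = [
    (p, t, i)
    for P, T, i in zip(df["Job position"], df["Job type"], df["id"])
    for p, t in product(P, T)
]
result = pd.DataFrame(rows, columns=df.columns)

print(result)
```

The row order matches the nested loops, and product extends more easily if a third list column shows up later.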

Alternatives to iterating over df.values: use itertuples, or zip the columns

pd.DataFrame([
    [p, t, i] for P, T, i in df.itertuples(index=False)
    for p in P for t in T
], columns=df.columns)

z = zip(df['Job position'], df['Job type'], df['id'])
pd.DataFrame([
    [p, t, i] for P, T, i in z
    for p in P for t in T
], columns=df.columns)

To generalize this solution to any number of extra columns (the two list columns must come first; *a collects the rest)

pd.DataFrame([
    [p, t] + a for P, T, *a in df.values
    for p in P for t in T
], columns=df.columns)

   Job position  Job type  id
0             6         1   3
1             2         3   4
2             2         6   4
3             2         5   4
4             6         3   4
5             6         6   4
6             6         5   4
7             1         9  43
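
On pandas 0.25 or later, chaining DataFrame.explode over the two list columns should give the same combinations. This is a sketch added for comparison, not the approach from the answer above, and the frame is again rebuilt on the assumption that the cells are lists of ints:

```python
import pandas as pd

# Example frame, rebuilt from the question (assumed to hold lists of ints).
df = pd.DataFrame({
    "Job position": [[6], [2, 6], [1]],
    "Job type": [[1], [3, 6, 5], [9]],
    "id": [3, 4, 43],
})

# Exploding the first column repeats each row (including its Job type list)
# once per position; exploding the second column then expands each of those
# rows once per type, which produces every pair.
exploded = (
    df.explode("Job position")
      .explode("Job type")
      .reset_index(drop=True)
      # explode leaves object dtype behind, so cast back to integers
      .astype({"Job position": int, "Job type": int})
)

print(exploded)
```

One difference to keep in mind: explode turns an empty list into a NaN row (which would also make the integer cast fail), while the comprehension simply produces no rows for it.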
piRSquared answered Oct 21 '22 17:10