
Pandas, how to combine multiple columns into an array column

I need to add a combined column that holds all the values of each row as a list.

Source:

pd.DataFrame(data={
    'a' : [1,2,3],
    'b' : [2,3,4]
})

Target:

pd.DataFrame(data={
    'a' : [1,2,3],
    'b' : [2,3,4],
    'combine' : [[1,2],[2,3],[3,4]]
})

Current solution:

test['combine'] = test[['a','b']].apply(lambda x: pd.Series([x.values]), axis=1)

Issue: I actually have many columns, and this seems to take too long to run. Is there a better way?

Gnefihz Deng asked Dec 28 '17 17:12


1 Answer

df

   a  b
0  1  2
1  2  3
2  3  4

If you want to add a single column of lists, you'll need to take the .values attribute, convert it to a nested list with tolist(), and assign it back:

df['combine'] = df.values.tolist()
# or,
df['combine'] = df[['a', 'b']].values.tolist()
df
   a  b combine
0  1  2  [1, 2]
1  2  3  [2, 3]
2  3  4  [3, 4]
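
As a self-contained sketch of the same idea, the snippet below wraps the assignment in a small helper. The function name add_combined_column and its parameters are made up for illustration and are not part of pandas; to_numpy() is used here as the method equivalent of the .values attribute.

```python
import pandas as pd


def add_combined_column(df, columns, name="combine"):
    """Return a copy of df with one extra column holding a list per row.

    Hypothetical helper: the name and signature are assumptions for
    illustration only.
    """
    out = df.copy()
    # to_numpy() gives a 2D array; tolist() turns it into one list per row.
    out[name] = out[columns].to_numpy().tolist()
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
result = add_combined_column(df, ["a", "b"])
print(result)
```

Keep in mind that the 2D array has a single dtype, so combining, say, an integer column with a float column produces lists of floats.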

Note that assigning the .values result directly does not work, because pandas special-cases NumPy arrays and treats the 2D array as data for two columns rather than as one column of lists:

df['combine'] = df[['a', 'b']].values

ValueError: Wrong number of items passed 2, placement implies 1
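
The failure can be reproduced in isolation. The sketch below catches the exception by type instead of matching its text, on the assumption that the wording of the message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
rows = df[["a", "b"]].values  # 2D NumPy array of shape (3, 2)

try:
    # A two-column 2D array cannot be placed into a single column.
    df["combine"] = rows
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Converting to a nested list first yields one Python list per row.
df["combine"] = rows.tolist()
print(df)
```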

A couple of notes -

  • avoid apply/transform wherever possible. They are only convenience functions meant to hide the application of a loop, and they are slow, offering no performance/vectorization benefits whatsoever

  • keeping columns of objects offers no performance gains as far as pandas is concerned, so unless the goal is to display data, try to avoid them.
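
To get a rough feel for the first note, the sketch below builds the same lists with apply and with tolist() and times both. The 2,000-row frame is made-up data, the timings depend on the machine and pandas version, and the code assumes a pandas version in which apply with axis=1 returns a Series when the function returns a list.

```python
import time

import pandas as pd

# Hypothetical data, sized only to make a difference observable.
df = pd.DataFrame({"a": range(2000), "b": range(1, 2001)})

start = time.perf_counter()
# Row-wise apply: calls the Python lambda once per row.
via_apply = df[["a", "b"]].apply(lambda row: row.tolist(), axis=1)
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
# Single conversion of the underlying 2D array to nested lists.
via_tolist = df[["a", "b"]].to_numpy().tolist()
tolist_seconds = time.perf_counter() - start

# Both approaches build the same per-row lists.
same_result = via_apply.tolist() == via_tolist

print(f"apply:  {apply_seconds:.4f} s")
print(f"tolist: {tolist_seconds:.4f} s")
print(f"same result: {same_result}")
```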

cs95 answered Oct 22 '22 03:10
