I need to add a combined column that holds all the values of each row as a list.
Source:
pd.DataFrame(data={
'a' : [1,2,3],
'b' : [2,3,4]
})
Target:
pd.DataFrame(data={
'a' : [1,2,3],
'b' : [2,3,4],
'combine' : [[1,2],[2,3],[3,4]]
})
Current solution:
test['combine'] = test[['a','b']].apply(lambda x: pd.Series([x.values]), axis=1)
Issue: I actually have many columns, and this takes too long to run. Is there a better way?
You can use DataFrame.apply() to concatenate multiple column values into a single column; it takes slightly less typing and extends easily when you want to join many columns.
df
a b
0 1 2
1 2 3
2 3 4
If you want to add a column of lists as a single column, you'll need to access the .values attribute, convert it to a nested list, and assign it back:
df['combine'] = df.values.tolist()
# or,
df['combine'] = df[['a', 'b']].values.tolist()
df
a b combine
0 1 2 [1, 2]
1 2 3 [2, 3]
2 3 4 [3, 4]
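When the frame has many columns and only some of them belong in the list, the same idea can be sketched with an explicit column selection. This is a minimal sketch, not a definitive recipe: the extra `label` column and the `cols` list are assumptions made up for illustration, and `to_numpy()` is used here as the method form of `.values`.

```python
import pandas as pd

# Hypothetical frame: 'label' is an extra column that should stay out of the list.
df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4], "label": ["x", "y", "z"]})

# Assumed selection of the columns to combine.
cols = ["a", "b"]

# to_numpy() returns the 2-D array that .values would; tolist() then turns
# each row of that array into a plain Python list.
df["combine"] = df[cols].to_numpy().tolist()

print(df)
```

Selecting columns of different dtypes (say, ints and floats) makes the intermediate array use a common dtype, so it may be worth checking the resulting values in that case.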
Note that assigning the .values result directly does not work, because pandas special-cases NumPy arrays, which leads to an error:
df['combine'] = df[['a', 'b']].values
ValueError: Wrong number of items passed 2, placement implies 1
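The failure and the fix can be reproduced side by side. The sketch below assumes only the small example frame from the question; the exact wording of the error message differs between pandas versions, so it is printed rather than matched.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
arr = df[["a", "b"]].values  # 2-D NumPy array of shape (3, 2)

error = None
try:
    # pandas treats a 2-D array as two columns, which cannot fit under one name.
    df["combine"] = arr
except ValueError as exc:
    error = exc
    print(f"ValueError: {exc}")  # message text varies by pandas version

# Converting to a nested list first gives one Python list per row.
df["combine"] = arr.tolist()
print(df)
```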
A couple of notes:
Try to avoid apply/transform as much as possible. They are convenience functions that hide a loop, and they are slow, offering no performance or vectorization benefits whatsoever.
Keeping columns of objects (such as lists) offers no performance gains as far as pandas is concerned, so unless the goal is to display data, try to avoid it.
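To see the difference on your own machine, a rough timing comparison could look like the following. This is only a sketch: the frame size, the random data, and the column names `c0`..`c19` are arbitrary assumptions, and the measured times will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical test frame: 5,000 rows and 20 integer columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 100, size=(5_000, 20)),
    columns=[f"c{i}" for i in range(20)],
)

# Row-wise apply: calls the Python lambda once per row.
start = time.perf_counter()
via_apply = df.apply(lambda row: row.tolist(), axis=1)
apply_seconds = time.perf_counter() - start

# Array conversion: one call on the underlying 2-D array.
start = time.perf_counter()
via_values = df.values.tolist()
values_seconds = time.perf_counter() - start

print(f"apply:           {apply_seconds:.4f} s")
print(f".values.tolist(): {values_seconds:.4f} s")

# Both approaches should produce the same list for every row.
print(via_apply.tolist() == via_values)
```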