I've seen a few variations on the theme of exploding a column/series into multiple columns of a Pandas dataframe, but I've been trying to do something and not really succeeding with the existing approaches.
Given a DataFrame like so:
        key      val
    id
    2   foo  oranges
    2   bar  bananas
    2   baz   apples
    3   foo   grapes
    3   bar    kiwis
I want to convert the items in the key series into columns, with the val values serving as the values, like so:
            foo      bar     baz
    id
    2   oranges  bananas  apples
    3    grapes    kiwis     NaN
I feel like this is something that should be relatively straightforward, but I've been bashing my head against this for a few hours now with increasing levels of convolution, and no success.
We can use str.split() to split one column into multiple columns by specifying the expand=True option. We can use str.extract() to extract multiple columns using a regular expression in which multiple capturing groups are defined.
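As a rough illustration of those two string methods, here is a minimal sketch. The sample Series and the column names id and key are made up for the example and are not taken from the question.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
s = pd.Series(["2-foo", "3-bar"])

# expand=True returns a DataFrame with one column per split piece.
parts = s.str.split("-", expand=True)
parts.columns = ["id", "key"]

# Each capturing group in the pattern becomes one output column.
extracted = s.str.extract(r"(\d+)-(\w+)")
extracted.columns = ["id", "key"]

print(parts)
print(extracted)
```

Both results hold strings, so the id column would still need a conversion such as astype(int) if numbers are wanted.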
You can use the loc and iloc indexers to access columns in a Pandas DataFrame. To access a certain column by name, for example the Grades column, use loc and specify the name of the column; iloc selects by integer position instead.
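A small sketch of both indexers, assuming a made-up DataFrame in which Grades happens to be the second column:

```python
import pandas as pd

# Hypothetical data; only the Grades column name comes from the text above.
students = pd.DataFrame({"Name": ["Ann", "Bob"], "Grades": [90, 85]})

# loc selects by label: all rows, the column named "Grades".
by_label = students.loc[:, "Grades"]

# iloc selects by integer position: all rows, the second column.
by_position = students.iloc[:, 1]

print(by_label)
print(by_position)
```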
Pandas DataFrame: assign() function. The assign() function is used to assign new columns to a DataFrame. It returns a new object with all original columns in addition to the new ones. Existing columns that are re-assigned will be overwritten. The column names are passed as keyword arguments.
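To make the behaviour concrete, a minimal sketch with an invented one-column frame; the names val and double are placeholders:

```python
import pandas as pd

# Hypothetical frame used only to illustrate assign().
df = pd.DataFrame({"val": [1, 2, 3]})

# Keyword names become column names; "val" already exists, so it is
# overwritten in the returned copy. A callable receives the DataFrame.
out = df.assign(double=df["val"] * 2, val=lambda d: d["val"] + 10)

print(out)
print(df)  # the original frame is left unchanged
```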
There are a few ways:
using .pivot_table:

    >>> df.pivot_table(values='val', index=df.index, columns='key', aggfunc='first')
    key      bar     baz      foo
    id
    2    bananas  apples  oranges
    3      kiwis     NaN   grapes
using .pivot:

    >>> df.pivot(index=df.index, columns='key')['val']
    key      bar     baz      foo
    id
    2    bananas  apples  oranges
    3      kiwis     NaN   grapes
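Depending on the pandas version, pivot may expect column labels for index= rather than the index object itself. A minimal sketch that sidesteps the issue by moving id into a regular column first; the DataFrame construction here is a reconstruction of the sample data from the question, with id assumed to be the index name.

```python
import pandas as pd

# Reconstruction of the question's sample data (id is the index).
df = pd.DataFrame(
    {
        "key": ["foo", "bar", "baz", "foo", "bar"],
        "val": ["oranges", "bananas", "apples", "grapes", "kiwis"],
    },
    index=pd.Index([2, 2, 2, 3, 3], name="id"),
)

# Turn the index into a column, then pivot using plain column labels.
wide = df.reset_index().pivot(index="id", columns="key", values="val")

print(wide)
```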
using .groupby followed by .unstack:

    >>> df.reset_index().groupby(['id', 'key'])['val'].aggregate('first').unstack()
    key      bar     baz      foo
    id
    2    bananas  apples  oranges
    3      kiwis     NaN   grapes
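One practical difference between these options shows up when an (id, key) pair occurs more than once: pivot raises a ValueError, while the 'first' aggregation simply keeps the first value. A small sketch, using a made-up variant of the data with a duplicated pair:

```python
import pandas as pd

# Hypothetical variant of the sample data with a repeated (id, key) pair.
dup = pd.DataFrame(
    {
        "id": [2, 2, 3],
        "key": ["foo", "foo", "bar"],
        "val": ["oranges", "pears", "kiwis"],
    }
)

try:
    dup.pivot(index="id", columns="key", values="val")
    pivot_failed = False
except ValueError:
    # pivot refuses to reshape when index/column pairs are duplicated.
    pivot_failed = True

# groupby + first keeps the first value seen for each (id, key) pair.
wide = dup.groupby(["id", "key"])["val"].first().unstack()

print(pivot_failed)
print(wide)
```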
You could use set_index and unstack:

    In [1923]: df.set_index([df.index, 'key'])['val'].unstack()
    Out[1923]:
    key      bar     baz      foo
    id
    2    bananas  apples  oranges
    3      kiwis    None   grapes
Or, a simplified groupby:

    In [1926]: df.groupby([df.index, 'key'])['val'].first().unstack()
    Out[1926]:
    key      bar     baz      foo
    id
    2    bananas  apples  oranges
    3      kiwis    None   grapes
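For anyone who wants to run these end to end, here is a self-contained sketch of the same two ideas. The DataFrame construction is a reconstruction of the question's data, and it uses set_index(..., append=True) and the index name 'id' instead of passing df.index directly, which is a slight variation on the calls above.

```python
import pandas as pd

# Reconstruction of the question's sample data (id is the index).
df = pd.DataFrame(
    {
        "key": ["foo", "bar", "baz", "foo", "bar"],
        "val": ["oranges", "bananas", "apples", "grapes", "kiwis"],
    },
    index=pd.Index([2, 2, 2, 3, 3], name="id"),
)

# Add key as a second index level, then pivot that level into columns.
via_unstack = df.set_index("key", append=True)["val"].unstack()

# Group on the index level name and the key column, take the first value.
via_groupby = df.groupby(["id", "key"])["val"].first().unstack()

print(via_unstack)
print(via_groupby)
```

The missing (3, baz) cell may display as None or NaN depending on the pandas version; pd.isna treats both as missing.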