I have a Dataframe that looks like this:
| Col 1 | Col 2 |
0| A | 2 |
1| A | 3 |
2| B | 1 |
3| B | 2 |
and I need to transform it into a Dataframe that shows for each combination, of the values in Col 1 and Col 2 if that combination is contained in the original DataFrame:
| 1 | 2 | 3 |
A |False|True |True |
B |True |True |False|
Is there a native way in pandas to get this transformation? I was creating the transformed Dataframe manually, but this is way to slow.
Thank you in advance!
Following are the examples of pandas transform are given below: To add 5 to a particular row in the Dataframe. In the above program, we first import the pandas function as pd and later create the dataframe.
Then we use the transform () function in pandas and perform the mathematical operation on the third row and the index recognizes this and the dataframe is returned. Hence, the output is generated successfully.
First, you need to find all the combinations using itertools and then use the output of combinations as index to your dataframe. You will get all the possible dataframes of the given number of rows. Thanks for contributing an answer to Stack Overflow!
Arguments and keyword arguments help to return the function and produce the output. Change is an activity utilized related to groupby (which is one of the most helpful tasks in pandas). I presume most pandas clients likely have utilized total, channel, or apply with groupby, to sum up information.
You could use:
df.groupby(['Col 1','Col 2']).size().unstack(fill_value=0).astype(bool)
Col2 1 2 3
Col1
A False True True
B True True False
Here's a pivot solution:
(df.pivot('Col 1', 'Col 2', 'Col 1').fillna(0) != 0).rename_axis(index=None, columns=None)
1 2 3
A False True True
B True True False
Use get_dummies
with max
:
df = pd.get_dummies(df.set_index('Col 1')['Col 2'], dtype=bool).rename_axis(None).max(level=0)
print (df)
1 2 3
A False True True
B True True False
Or if possible not missing values in column Col2
then use DataFrame.pivot
with DataFrame.notna
, for remove index and columns name use DataFrame.rename_axis
:
df = df.pivot('Col 1', 'Col 2', 'Col 1').notna().rename_axis(index=None, columns=None)
print (df)
1 2 3
A False True True
B True True False
Alternative is possible duplicates and pivot
failed:
df = (df.pivot_table(index='Col 1', columns='Col 2', values='Col 1', aggfunc='size')
.notna()
.rename_axis(index=None, columns=None))
print (df)
1 2 3
A False True True
B True True False
Or solution from comments:
df = (pd.crosstab(df['Col 1'], df['Col 2'])
.gt(0)
.rename_axis(index=None, columns=None))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With