Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Transform dataframe to show if a combination of values exists in the orignal Dataframe

Tags:

python

pandas

I have a Dataframe that looks like this:

 | Col 1 | Col 2 | 
0|   A   |   2   |
1|   A   |   3   |
2|   B   |   1   |
3|   B   |   2   |

and I need to transform it into a Dataframe that shows for each combination, of the values in Col 1 and Col 2 if that combination is contained in the original DataFrame:

  |  1  |  2  |  3  |
A |False|True |True |
B |True |True |False|

Is there a native way in pandas to get this transformation? I was creating the transformed Dataframe manually, but this is way to slow.

Thank you in advance!

like image 561
Cedd0 Avatar asked Dec 11 '19 07:12

Cedd0


People also ask

What are the examples of pandas transform?

Following are the examples of pandas transform are given below: To add 5 to a particular row in the Dataframe. In the above program, we first import the pandas function as pd and later create the dataframe.

How to generate Dataframe from 3rd row in pandas?

Then we use the transform () function in pandas and perform the mathematical operation on the third row and the index recognizes this and the dataframe is returned. Hence, the output is generated successfully.

How to get all possible DataFrames of a given number of rows?

First, you need to find all the combinations using itertools and then use the output of combinations as index to your dataframe. You will get all the possible dataframes of the given number of rows. Thanks for contributing an answer to Stack Overflow!

What is the difference between change and argument in pandas?

Arguments and keyword arguments help to return the function and produce the output. Change is an activity utilized related to groupby (which is one of the most helpful tasks in pandas). I presume most pandas clients likely have utilized total, channel, or apply with groupby, to sum up information.


3 Answers

You could use:

df.groupby(['Col 1','Col 2']).size().unstack(fill_value=0).astype(bool)

Col2      1     2      3
Col1                    
A     False  True   True
B      True  True  False
like image 75
luigigi Avatar answered Oct 21 '22 04:10

luigigi


Here's a pivot solution:

(df.pivot('Col 1', 'Col 2', 'Col 1').fillna(0) != 0).rename_axis(index=None, columns=None)
         1     2      3
A      False  True   True
B       True  True  False
like image 36
oppressionslayer Avatar answered Oct 21 '22 04:10

oppressionslayer


Use get_dummies with max:

df = pd.get_dummies(df.set_index('Col 1')['Col 2'], dtype=bool).rename_axis(None).max(level=0)
print (df)
       1     2      3
A  False  True   True
B   True  True  False

Or if possible not missing values in column Col2 then use DataFrame.pivot with DataFrame.notna, for remove index and columns name use DataFrame.rename_axis:

df = df.pivot('Col 1', 'Col 2', 'Col 1').notna().rename_axis(index=None, columns=None)
print (df)
       1     2      3
A  False  True   True
B   True  True  False

Alternative is possible duplicates and pivot failed:

df = (df.pivot_table(index='Col 1', columns='Col 2', values='Col 1', aggfunc='size')
        .notna()
        .rename_axis(index=None, columns=None))
print (df)
       1     2      3
A  False  True   True
B   True  True  False

Or solution from comments:

df = (pd.crosstab(df['Col 1'], df['Col 2'])
        .gt(0)
        .rename_axis(index=None, columns=None))
like image 24
jezrael Avatar answered Oct 21 '22 04:10

jezrael