Pandas: Transform dataframe to show if a combination of values exists in the original Dataframe

Tags: python, pandas

I have a DataFrame that looks like this:

 | Col 1 | Col 2 | 
0|   A   |   2   |
1|   A   |   3   |
2|   B   |   1   |
3|   B   |   2   |

and I need to transform it into a DataFrame that shows, for each combination of the values in Col 1 and Col 2, whether that combination occurs in the original DataFrame:

  |  1  |  2  |  3  |
A |False|True |True |
B |True |True |False|
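
For anyone who wants to try the answers, the example can be reproduced with a small sketch like the one below. The names `df`, `pairs` and `reference` are illustrative, and the pure-Python loop is only a slow reference to check results against on small inputs, not a recommended approach.

```python
import pandas as pd

# Input frame from the question.
df = pd.DataFrame({"Col 1": ["A", "A", "B", "B"], "Col 2": [2, 3, 1, 2]})

# Every (Col 1, Col 2) pair that occurs in the input.
pairs = set(zip(df["Col 1"].tolist(), df["Col 2"].tolist()))

# Distinct row and column labels, sorted for a stable layout.
rows = sorted(df["Col 1"].unique().tolist())
cols = sorted(df["Col 2"].unique().tolist())

# Slow reference: test each combination for membership one by one.
reference = pd.DataFrame(
    [[(r, c) in pairs for c in cols] for r in rows],
    index=rows,
    columns=cols,
)
print(reference)
```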

Is there a native way in pandas to do this transformation? I was building the transformed DataFrame manually, but that is way too slow.

Thank you in advance!

asked Dec 11 '19 07:12

Cedd0

3 Answers

You could use:

df.groupby(['Col 1','Col 2']).size().unstack(fill_value=0).astype(bool)

Col 2      1     2      3
Col 1                    
A      False  True   True
B       True  True  False
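
To see why this works, the one-liner can be split into its three steps. This is a minimal sketch on the question's data; the intermediate names `counts`, `wide` and `result` are mine.

```python
import pandas as pd

df = pd.DataFrame({"Col 1": ["A", "A", "B", "B"], "Col 2": [2, 3, 1, 2]})

# Step 1: count rows per (Col 1, Col 2) pair; the result is a Series
# with a two-level index, containing only pairs that actually occur.
counts = df.groupby(["Col 1", "Col 2"]).size()

# Step 2: move the 'Col 2' level into the columns; pairs that never
# occur have no count, so fill them with 0.
wide = counts.unstack(fill_value=0)

# Step 3: a non-zero count means the pair exists.
result = wide.astype(bool)
print(result)
```

Because the counts are aggregated first, repeated pairs in the input are not a problem for this approach.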

answered Oct 21 '22 04:10

luigigi

Here's a pivot solution:

(df.pivot(index='Col 1', columns='Col 2', values='Col 1').fillna(0) != 0).rename_axis(index=None, columns=None)

         1     2      3
A      False  True   True
B       True  True  False

answered Oct 21 '22 04:10

oppressionslayer

Use get_dummies with max:

out = pd.get_dummies(df.set_index('Col 1')['Col 2'], dtype=bool).rename_axis(None).groupby(level=0).max()
print (out)
       1     2      3
A  False  True   True
B   True  True  False
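
The intermediate frame makes the idea easier to follow. The sketch below uses the question's data and my own names `dummies` and `result`; it collapses the rows with `groupby(level=0).max()`, which should also run on pandas versions where aggregations no longer accept a `level` argument.

```python
import pandas as pd

df = pd.DataFrame({"Col 1": ["A", "A", "B", "B"], "Col 2": [2, 3, 1, 2]})

# One boolean indicator column per distinct 'Col 2' value, and one row
# per original row, labelled by its 'Col 1' value.
dummies = pd.get_dummies(df.set_index("Col 1")["Col 2"], dtype=bool)

# Collapse rows that share a label; the max of booleans acts as a
# logical OR, so a cell is True if any row had that combination.
result = dummies.groupby(level=0).max().rename_axis(None)
print(result)
```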

Or, if there are no missing values in column Col 2, use DataFrame.pivot with DataFrame.notna. To remove the index and columns names, use DataFrame.rename_axis:

out = df.pivot(index='Col 1', columns='Col 2', values='Col 1').notna().rename_axis(index=None, columns=None)
print (out)
       1     2      3
A  False  True   True
B   True  True  False

Alternatively, if duplicates are possible and pivot fails, use DataFrame.pivot_table:

out = (df.pivot_table(index='Col 1', columns='Col 2', values='Col 1', aggfunc='size')
        .notna()
        .rename_axis(index=None, columns=None))
print (out)
       1     2      3
A  False  True   True
B   True  True  False
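
The difference only shows up when a pair is repeated. The sketch below adds a second ("A", 2) row to the question's data (the duplicate row and the names `dup`, `pivot_failed` and `result` are my additions). It also leaves out `values=`, on the assumption that `aggfunc='size'` counts rows and does not need a value column.

```python
import pandas as pd

# Question data plus a repeated ("A", 2) row.
dup = pd.DataFrame({"Col 1": ["A", "A", "A", "B", "B"],
                    "Col 2": [2, 2, 3, 1, 2]})

# pivot cannot reshape when an index/column pair occurs more than once.
try:
    dup.pivot(index="Col 1", columns="Col 2", values="Col 1")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates first, so it still works;
# combinations that never occur come out as NaN, hence notna().
result = (dup.pivot_table(index="Col 1", columns="Col 2", aggfunc="size")
             .notna()
             .rename_axis(index=None, columns=None))
print(pivot_failed)
print(result)
```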

Or the solution from the comments:

out = (pd.crosstab(df['Col 1'], df['Col 2'])
        .gt(0)
        .rename_axis(index=None, columns=None))
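
As a self-contained sketch of the crosstab variant on the question's data (the names `counts` and `result` are mine): `pd.crosstab` builds a table of counts with 0 for combinations that never occur, and `gt(0)` turns those counts into booleans.

```python
import pandas as pd

df = pd.DataFrame({"Col 1": ["A", "A", "B", "B"], "Col 2": [2, 3, 1, 2]})

# Frequency table: rows are 'Col 1' values, columns are 'Col 2' values.
counts = pd.crosstab(df["Col 1"], df["Col 2"])

# True where the combination occurs at least once; drop the axis names
# so the output matches the layout asked for in the question.
result = counts.gt(0).rename_axis(index=None, columns=None)
print(result)
```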

answered Oct 21 '22 04:10

jezrael
