Convert Two column data frame to occurrence matrix in pandas

Hi all, I have a CSV file that contains data in the format below:

A   a
A   b
B   f
B   g
B   e
B   h
C   d
C   e
C   f

The first column contains items, and the second column contains the features available for each item, drawn from the feature vector [a,b,c,d,e,f,g,h]. I want to convert this to an occurrence matrix that looks like this:

    a,b,c,d,e,f,g,h
A   1,1,0,0,0,0,0,0
B   0,0,0,0,1,1,1,1
C   0,0,0,1,1,1,0,0

Can anyone tell me how to do this using pandas?

asked Jul 20 '15 14:07

Isura Nirmal

2 Answers

Here is another way to do it using pd.get_dummies().

import pandas as pd

# your data
# =======================
df

  col1 col2
0    A    a
1    A    b
2    B    f
3    B    g
4    B    e
5    B    h
6    C    d
7    C    e
8    C    f

# processing
# ===================================
# one indicator column per feature, then the per-item maximum (1 if the feature occurs)
pd.get_dummies(df.col2, dtype=int).groupby(df.col1).max()

      a  b  d  e  f  g  h
col1                     
A     1  1  0  0  0  0  0
B     0  0  0  1  1  1  1
C     0  0  1  1  1  0  0
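Note that the result above has no column for c, because c never occurs in the data, while the expected output in the question does include it. A minimal sketch of one way to get the full feature vector, assuming the column names col1 and col2 used in this answer and a features list typed in by hand from the question:

```python
import pandas as pd

# Sample data from the question; the column names col1/col2 follow this answer.
df = pd.DataFrame({
    "col1": list("AABBBBCCC"),
    "col2": list("abfgehdef"),
})

# Full feature vector from the question, including "c", which never occurs.
# (The variable name `features` is only for illustration.)
features = list("abcdefgh")

occurrence = (
    pd.get_dummies(df["col2"], dtype=int)      # one indicator column per observed feature
    .groupby(df["col1"]).max()                 # 1 if the item has the feature at least once
    .reindex(columns=features, fill_value=0)   # add unseen features as all-zero columns
)
print(occurrence)
```

reindex with fill_value=0 also fixes the column order, so the columns always follow the feature vector regardless of which features happen to appear in the file.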

answered Sep 23 '22 16:09

Jianxun Li

It's unclear whether your data has a typo, but you can use crosstab for this:

In [95]:
pd.crosstab(index=df['A'], columns=df['a'])

Out[95]:
a  b  d  e  f  g  h
A                  
A  1  0  0  0  0  0
B  0  0  1  1  1  1
C  0  1  1  1  0  0

In your sample data, the second column has the value a as its column name, but in your expected output a appears as a value in that column.

EDIT

OK, I fixed your input data so it generates the correct result:

In [98]:
import pandas as pd
import io
t="""A   a
A   b
B   f
B   g
B   e
B   h
C   d
C   e
C   f"""
# raw string avoids an invalid-escape warning for the regex separator
df = pd.read_csv(io.StringIO(t), sep=r'\s+', header=None, names=['A','a'])
df

Out[98]:
   A  a
0  A  a
1  A  b
2  B  f
3  B  g
4  B  e
5  B  h
6  C  d
7  C  e
8  C  f

In [99]:
ct = pd.crosstab(index=df['A'], columns=df['a'])
ct

Out[99]:
a  a  b  d  e  f  g  h
A                     
A  1  1  0  0  0  0  0
B  0  0  0  1  1  1  1
C  0  0  1  1  1  0  0
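One thing to keep in mind is that crosstab counts pairs rather than just flagging them, so a repeated (item, feature) row would give a value greater than 1. A minimal sketch of capping the counts at 1, assuming the same column names 'A' and 'a' as above; the duplicated last row is added here only for illustration and is not in the question's data:

```python
import pandas as pd

# Question's data plus one repeated (A, a) pair, added only to show the effect.
df = pd.DataFrame({
    "A": list("AABBBBCCC") + ["A"],
    "a": list("abfgehdef") + ["a"],
})

counts = pd.crosstab(index=df["A"], columns=df["a"])  # frequency of each pair
occurrence = counts.clip(upper=1)                     # cap at 1: present / absent

print(counts.loc["A", "a"], occurrence.loc["A", "a"])
```

If every (item, feature) pair is unique, as in the sample data, the clip step changes nothing and the plain crosstab is already the occurrence matrix.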

answered Sep 20 '22 16:09

EdChum

Tags: python, pandas, sparse-matrix
