Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count the number of likert scale results from multiple column questions in pandas

I have the following dataframe:

       Question1        Question2         Question3          Question4
User1  Agree            Agree          Disagree         Strongly Disagree
User2  Disagree         Agree          Agree            Disagree
User3  Agree            Agree          Agree            Agree

Is there a way to convert the dataframe listed above to the following?

              Agree         Disagree         Strongly Disagree
 Question1    2               1                  0

 Question2    2               1                  0

 Question3    2               1                  0
 Question4    1               1                  1

This is similar to my previous question: Make a dataframe with grouped questions from three columns

I tried looking at previous questions with stack/pivot but could not figure it out. The actual dataframe has 20+ questions and a likert scale from strongly agree, agree, neutral, disagree, strongly disagree.

like image 932
yangd01234 Avatar asked Dec 23 '22 17:12

yangd01234


2 Answers

With pd.get_dummies

pd.get_dummies(df.stack()).groupby(level=1).sum()

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1

Taking it to another level
We can use numpy.bincount to speed things. But we have to pay attention to dimensions

v = df.values
f, u = pd.factorize(v.ravel())
n, m = u.size, v.shape[1]
r = np.tile(np.arange(m), n)
b0 = np.bincount(r * n + f)
pad = np.zeros(n * m - b0.size, dtype=int)
b = np.append(b0, pad)

pd.DataFrame(b.reshape(m, n), df.columns, u)

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1

Another numpy option

v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())

pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)

           Agree  Disagree  Strongly Disagree
Question1      2         1                  0
Question2      3         0                  0
Question3      2         1                  0
Question4      1         1                  1

Speed Difference

%%timeit
v = df.values
f, u = pd.factorize(v.ravel())
n, m = u.size, v.shape[1]
r = np.tile(np.arange(m), n)
b0 = np.bincount(r * n + f)
pad = np.zeros(n * m - b0.size, dtype=int)
b = np.append(b0, pad)
​
pd.DataFrame(b.reshape(m, n), df.columns, u)
1000 loops, best of 3: 194 µs per loop

%%timeit
v = df.values
n, m = v.shape
f, u = pd.factorize(v.ravel())

pd.DataFrame(
    np.eye(u.size, dtype=int)[f].reshape(n, m, -1).sum(0),
    df.columns, u
)
1000 loops, best of 3: 195 µs per loop

%timeit pd.get_dummies(df.stack()).groupby(level=1).sum()
1000 loops, best of 3: 1.2 ms per loop
like image 75
piRSquared Avatar answered Dec 26 '22 05:12

piRSquared


You can iterate over columns with pd.Series.value_counts. If you do this with apply, indices will be aligned automatically:

df.apply(pd.Series.value_counts)
Out: 
                   Question1  Question2  Question3  Question4
Agree                    2.0        3.0        2.0          1
Disagree                 1.0        NaN        1.0          1
Strongly Disagree        NaN        NaN        NaN          1

A little postprocessing:

df.apply(pd.Series.value_counts).fillna(0).astype('int')
Out: 
                   Question1  Question2  Question3  Question4
Agree                      2          3          2          1
Disagree                   1          0          1          1
Strongly Disagree          0          0          0          1
like image 38
ayhan Avatar answered Dec 26 '22 05:12

ayhan