
Make table show percentages instead of frequencies in R

Tags: function, r

I have some words in a data frame, each belonging to category A or B. Within each category, the words may be of type 1, 2 or 3. I used the table() function to show how the words are distributed across the categories and types. The output looks like:

         category
type     A    B
1        30  79
2        12  94
3        29  6 

As you can see, the table counts frequencies, but I want it to show percentages instead. I have tried prop.table, but I get the following error:

Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

I couldn't find a solution anywhere else; please help. Thank you.

Here's my sample data:

head(items)

       item   type category
[1]    PA100   1    A
[2]    PB101   2    A
[3]    UR360   2    A
[4]    PX977   3    B
[5]    GA008   3    B
[6]    GR446   3    A
Tavi asked Aug 27 '14 00:08


2 Answers

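As mentioned in the comments, you can call prop.table() on a table object (the error in the question most likely comes from passing the data frame itself rather than the result of table()). In your case, use margin = 1, which divides each count by its row total, so the proportions in every row sum to 1.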

> tab <- with(items, table(type, category))
> prop.table(tab, margin = 1)
#     category
# type         A         B
#    1 1.0000000 0.0000000
#    2 1.0000000 0.0000000
#    3 0.3333333 0.6666667
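To make the effect of margin concrete, here is a minimal sketch that compares the three options on the six sample rows. The data frame is rebuilt by hand as a stand-in for the asker's data, and the names by_row, by_col and overall are only illustrative.

```r
# Stand-in for the six sample rows shown in the question.
items <- data.frame(
  item     = c("PA100", "PB101", "UR360", "PX977", "GA008", "GR446"),
  type     = c(1L, 2L, 2L, 3L, 3L, 3L),
  category = c("A", "A", "A", "B", "B", "A")
)

tab <- with(items, table(type, category))

by_row  <- prop.table(tab, margin = 1)  # divide by row totals: each type sums to 1
by_col  <- prop.table(tab, margin = 2)  # divide by column totals: each category sums to 1
overall <- prop.table(tab)              # divide by the grand total: all cells sum to 1

rowSums(by_row)
colSums(by_col)
sum(overall)
```

Which margin is appropriate depends on the question being asked: margin = 2 answers "how are the types distributed within each category", while margin = 1 answers "how is each type split between categories".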

For actual percentages, you can multiply the table by 100:

> prop.table(tab, 1)*100
#     category
# type         A         B
#    1 100.00000   0.00000
#    2 100.00000   0.00000
#    3  33.33333  66.66667

where

items <- 
structure(list(item = structure(c(3L, 4L, 6L, 5L, 1L, 2L), .Label = c("GA008", 
"GR446", "PA100", "PB101", "PX977", "UR360"), class = "factor"), 
    type = c(1L, 2L, 2L, 3L, 3L, 3L), category = structure(c(1L, 
    1L, 1L, 2L, 2L, 1L), .Label = c("A", "B"), class = "factor")), .Names = c("item", 
"type", "category"), class = "data.frame", row.names = c(NA, 
-6L))
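For anyone who wants to see where the error in the question comes from, the sketch below assumes it was triggered by passing the data frame straight to prop.table(). It catches the error instead of stopping, then shows the working call on a table object. The data frame is a hand-built stand-in for the sample rows, and the exact wording of the error message may differ between R versions.

```r
# Stand-in for the six sample rows shown in the question.
items <- data.frame(
  item     = c("PA100", "PB101", "UR360", "PX977", "GA008", "GR446"),
  type     = c(1L, 2L, 2L, 3L, 3L, 3L),
  category = c("A", "A", "A", "B", "B", "A")
)

# prop.table() needs numeric counts; a data frame with non-numeric
# columns makes it fail, so capture the message rather than abort.
res <- tryCatch(
  prop.table(items),
  error = function(e) conditionMessage(e)
)
print(res)

# Building the contingency table first gives prop.table() counts to work with.
fixed <- prop.table(table(items$type, items$category), margin = 1)
print(fixed)
```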
Rich Scriven answered Nov 06 '22 07:11


This might be quite late, but I'm sharing it in case someone else faces a similar problem. You can still get the output you want with table() and prop.table(). You just have to do it in two steps for factor variables.

df = table(items$type, items$category)
prop.table(df)

Read below for further explanation.

For the following dataframe items:

 item type category
PA100  1        A
PB101  2        A
UR360  2        A
PX977  3        B
GA008  3        B
GR446  3        A

First, run the table() command and store the result in df:

df = table(items$type, items$category)

df

    A B
  1 1 0
  2 2 0
  3 1 2

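Then, run your prop.table() command on df as below. With no margin argument, each count is divided by the grand total, so all the cells together sum to 1: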

prop.table(df)

            A         B
  1 0.1666667 0.0000000
  2 0.3333333 0.0000000
  3 0.1666667 0.3333333
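If it helps to check that these proportions are shares of the grand total, one option is addmargins(), which appends row and column sums. This is only a sketch on a hand-built copy of the sample rows. The names tab, overall and with_totals are illustrative, and the arguments to table() are named here so that the dimensions are labelled.

```r
# Stand-in for the six sample rows shown above.
items <- data.frame(
  item     = c("PA100", "PB101", "UR360", "PX977", "GA008", "GR446"),
  type     = c(1L, 2L, 2L, 3L, 3L, 3L),
  category = c("A", "A", "A", "B", "B", "A")
)

# Naming the arguments labels the two dimensions in the printed output.
tab <- table(type = items$type, category = items$category)

overall     <- prop.table(tab)      # shares of the grand total
with_totals <- addmargins(overall)  # adds a "Sum" row and a "Sum" column

round(with_totals, 2)
```

The bottom-right "Sum" cell should come out as 1, which confirms that every cell was divided by the total number of rows.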

With the round() command you can also specify the number of decimal places you want to keep:

round(prop.table(df),digits = 2)

       A    B
  1 0.17 0.00
  2 0.33 0.00
  3 0.17 0.33

And if you want whole-number percentages instead, you can do the following:

round(100*prop.table(df),digits = 0)

     A  B
  1 17  0
  2 33  0
  3 17 33
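If the percentages are meant for display, a possible next step is to attach a "%" sign to each cell. The sketch below does that with paste0() and rebuilds a character matrix with the same row and column labels. It uses a hand-built copy of the sample rows, and pct and labelled are illustrative names.

```r
# Stand-in for the six sample rows shown above.
items <- data.frame(
  item     = c("PA100", "PB101", "UR360", "PX977", "GA008", "GR446"),
  type     = c(1L, 2L, 2L, 3L, 3L, 3L),
  category = c("A", "A", "A", "B", "B", "A")
)

tab <- table(items$type, items$category)
pct <- round(100 * prop.table(tab), digits = 0)

# paste0() flattens the table column by column, so rebuild the shape
# and carry over the original row and column labels.
labelled <- matrix(
  paste0(pct, "%"),
  nrow = nrow(pct),
  dimnames = dimnames(pct)
)

print(labelled, quote = FALSE)
```

Note that the result holds strings, so it is suitable for printing or reporting but not for further arithmetic.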
Sandy answered Nov 06 '22 07:11