
How to aggregate categorical data in R?

Tags: r, aggregate

I have a data frame with two columns of categorical variables (Better, Similar, Worse). I would like to build a table that counts how many times each category appears in each column. The data frame is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to come up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

Daniel asked Apr 02 '19 16:04



2 Answers

As mentioned in the comments, table() is the standard tool for this, for example:

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0
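
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the four rows from the question and runs the first approach. The name DT follows the code above; `counts` and `wide` are illustrative names only, and the columns are assumed to be character vectors rather than factors.

```r
# Rebuild the example data from the question (character columns assumed)
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() yields one row per cell: `values` (the category) and `ind` (the source column)
counts <- table(stack(DT))

# Optional: convert the contingency table to a plain data frame,
# with categories as row names and the original columns as columns
wide <- as.data.frame.matrix(counts)
print(wide)
```

The conversion with as.data.frame.matrix() is only needed if a data frame is required downstream; the table object itself already has the layout asked for in the question.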
Frank answered Nov 08 '22 08:11


# For each column x, count how often each distinct value y (pooled over all columns) occurs
sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0
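
A related variant, sketched here as an assumption rather than as part of the original answer, fixes a shared set of levels first and then tabulates each column against it. This keeps zero counts (such as Worse in Category.y) in the result. The data frame is rebuilt from the question, and `lv` and `res` are illustrative names.

```r
# Rebuild the example data from the question (character columns assumed)
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Every category that appears anywhere in the data, in sorted order
lv <- sort(unique(unlist(df1, use.names = FALSE)))

# Tabulate each column against the shared levels so absent categories count as 0
res <- sapply(df1, function(x) table(factor(x, levels = lv)))
print(res)
```

The result is an integer matrix with one row per category and one column per original column.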
d.b answered Nov 08 '22 08:11