Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better way to produce data frame using table()

Tags:

r

crosstab

Recently, I have found that I am using the following pattern over and over again. The process is:

  • cross-tabulate numeric variable by factor using table
  • create data frame from created table
  • add original numeric values to data frame (from row names (!))
  • remove row names
  • reorder columns of aggregated data frame

In R, it looks like this:

# Sample data
df <- data.frame(x = round(runif(100), 1), 
                 y = factor(ifelse(runif(100) > .5, 1, 0), 
                            labels = c('failure', 'success')) 
                )

# Get frequencies
dfSummary <- as.data.frame.matrix(table(df$x, df$y))
# Add column of original values from rownames
dfSummary$x <- as.numeric(rownames(dfSummary))
# Remove rownames
rownames(dfSummary) <- NULL
# Reorder columns
dfSummary <- dfSummary[, c(3, 1, 2)]

Is there anything more elegant in R, preferably using base functions? I know I can use sql to do this in single command - I think that it has to be possible to achieve similar behavior in R.

sqldf solution:

library(sqldf)
dfSummary <- sqldf("select 
                     x, 
                     sum(y = 'failure') as failure,
                     sum(y = 'success') as success
                    from df group by x")
like image 558
Tomas Greif Avatar asked Jan 18 '15 18:01

Tomas Greif


People also ask

How do I create a data frame from a table in R?

How to Create a Data Frame. We can create a dataframe in R by passing the variable a,b,c,d into the data. frame() function. We can R create dataframe and name the columns with name() and simply specify the name of the variables.

Is data table better than data frame?

frame in R is similar to the data table which is used to create tabular data but data table provides a lot more features than the data frame so, generally, all prefer the data. table instead of the data.


1 Answers

An alternative with base R could be:

aggregate(. ~ x, transform(df, success = y == "sucess", 
                               failure = y == "failure", y = NULL), sum)
#     x success failure
#1  0.0       2       4
#2  0.1       6       8
#3  0.2       1       7
#4  0.3       5       4
#5  0.4       6       6
#6  0.5       3       3
#7  0.6       4       6
#8  0.7       6       6
#9  0.8       4       5
#10 0.9       6       7
#11 1.0       1       0
like image 166
talat Avatar answered Sep 18 '22 21:09

talat