data.table of table is very different from data.frame of table

Tags: r, data.table

I know that table is not the preferred way to make a frequency table as a data.table. But suppose I have a table, for whatever reason, that I want to convert to a data.table. The data.table conversion does not work the same way the data.frame conversion does:

library(data.table)
tab <- table(1:101)
DF.tab <- data.frame(tab)
DT.tab <- data.table(tab)

data.frame converts the table data into a data.frame, while data.table attempts to store the original table object as a column. (I've tested this with tab <- table(1:n) for multiple values of n, among other examples.)

> str(DF.tab)
'data.frame':   101 obs. of  2 variables:
 $ Var1: Factor w/ 101 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Freq: int  1 1 1 1 1 1 1 1 1 1 ...
> str(DT.tab)
Classes ‘data.table’ and 'data.frame':  101 obs. of  1 variable:
 $ tab: 'table' int [1:101(1d)] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr  "1" "2" "3" "4" ...
 - attr(*, ".internal.selfref")=<externalptr> 

Note also that while as.data.frame works the same way as data.frame, as.data.table fails entirely:

> as.data.table(tab)
Error in UseMethod("as.data.table") : 
  no applicable method for 'as.data.table' applied to an object of class "table"

In what seems to be a very closely related problem, if the table is sufficiently large (informal testing suggests .Dim > 100), I get very strange errors when trying to print:

> print(data.table(table(1:101)))
Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L,  : 
  dims [product 5] do not match the length of object [10]

Note that print(data.table(table(1:100))) does not raise an error, but displays only one column, V1, whereas print(data.frame(table(1:100))) shows both the Var1 and Freq columns.

Is there any better workaround than data.table(data.frame(...))? Am I better off always trying to avoid table entirely? And is the print error directly caused by this, or is it something deeper?

Asked by Frank, Aug 22 '13 at 21:08


1 Answer

data.frame(tbl-object) calls the as.data.frame.table method, which converts the matrix-like table object to a long-format data object. There appears to be no as.data.table.table method as yet. Arguably there should be, and I agree that it should behave like the as.data.frame method rather than inheriting from matrix (which is how a table would usually be treated):

> data.table(matrix(1:10, 2))
   V1 V2 V3 V4 V5
1:  1  3  5  7  9
2:  2  4  6  8 10
> data.table(as.table(matrix(1:10, 2)))
Error in UseMethod("as.data.table") : 
  no applicable method for 'as.data.table' applied to an object of class "table"
> data.table(as.data.frame(as.table(matrix(1:10, 2))))
    Var1 Var2 Freq
 1:    A    A    1
 2:    B    A    2
 3:    A    B    3
 4:    B    B    4
 5:    A    C    5
 6:    B    C    6
 7:    A    D    7
 8:    B    D    8
 9:    A    E    9
10:    B    E   10
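To see why data.frame() reshapes a table while other code may treat it as a plain array, it can help to inspect the S3 method directly. The following is a small base-R sketch (no data.table needed); the three-element input is made up purely for illustration.

```r
# Base R only: look at how a table reaches the long-format conversion.
tab <- table(c("a", "b", "b"))

# An S3 method for class "table" is registered for as.data.frame().
m <- getS3method("as.data.frame", "table")
is.function(m)

# A table is an integer array with dimnames, which is why code that does
# not special-case the "table" class may handle it like a matrix or vector.
class(tab)
is.array(tab)

# The method produces one row per cell: a label column plus a count column.
long <- as.data.frame(tab)
names(long)
```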

I think this should be a feature request and I don't think it is related to the second problem.
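Until such a method is available, the workaround from the question can be wrapped in a small helper. This is only a sketch: table_to_dt is a hypothetical name, not a data.table function, and it simply routes through the existing as.data.frame method before converting by reference with setDT().

```r
library(data.table)

# Hypothetical helper (not part of data.table): convert a table to a
# long-format data.table via the existing as.data.frame method.
table_to_dt <- function(tab, ...) {
  stopifnot(is.table(tab))
  df <- as.data.frame(tab, ...)  # dispatches to as.data.frame.table
  setDT(df)                      # turns df into a data.table by reference
  df[]
}

tab <- table(1:101)
# stringsAsFactors = FALSE keeps the labels as character rather than factor.
DT.tab <- table_to_dt(tab, stringsAsFactors = FALSE)
str(DT.tab)
```

Extra arguments are passed on to as.data.frame, so something like responseName = "N" could be used to rename the count column.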

Your second question seems like a bug. The data.table authors, most prominently @MatthewDowle, are generally quite responsive, and you should consider submitting a report.

Answered by IRTFM, Nov 07 '22 at 06:11