R data table - create a new column where each element is a list of values

I've started working with R data.table and I'm trying to do the following: For simplicity, let's say that I have a list of ArticleReadings as follows:

UserID   Time    ArticleID   Category     NumOfReading
'aaa'    7:50    'x'         'sports'     1
'bbb'    5:05    'x'         'sports'     1
'aaa'    8:40    'y'         'politics'   2
'aaa'    10:00   'z'         'sports'     3

Eventually I want a new column that contains the list of all categories read by a specific user. In this example, the value for user 'aaa' would be the vector 'politics','sports', and for user 'bbb' it would be a vector with one element: 'sports'. I want this type of column because later on I want to perform some manipulations on it (e.g. compute the mode/dominant category, or display the popular combinations of categories), so I thought I would first get a unique vector for each user, then sort it. All my attempts to use such a function as the new value of the column resulted in the vector's values being assigned separately to each row, rather than the whole vector becoming the column value. For example, one of my attempts:

CategoriesList <- function(x){sort(unique(x))}
DT[,':='(UniqueCats=CategoriesList(Category)),by=UserID]

As I'm new to data.table and to user defined functions in R, I guess that I'm missing some critical point regarding transferring the result to a vector... Any help would be appreciated!

user3017075 asked Sep 05 '16 11:09



1 Answer

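If we need a list column in the dataset, wrap the result in list() twice. On the right-hand side of :=, the outer list() holds the column being assigned, and the inner list() turns each group's vector into a single list element, which is then recycled across that group's rows. With only one list() (or none), data.table tries to assign the vector's values to the rows one by one.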

DT[, UniqueCats := list(list(sort(unique(Category)))) , by = UserID]
str(DT)
#Classes ‘data.table’ and 'data.frame':  4 obs. of  6 variables:
# $ UserID      : chr  "aaa" "bbb" "aaa" "aaa"
# $ Time        : chr  "7:50" "5:05" "8:40" "10:00"
# $ ArticleID   : chr  "x" "x" "y" "z"
# $ Category    : chr  "sports" "sports" "politics" "sports"
# $ NumOfReading: int  1 1 2 3
# $ UniqueCats  :List of 4
#  ..$ : chr  "politics" "sports"
#  ..$ : chr "sports"
#  ..$ : chr  "politics" "sports"
#  ..$ : chr  "politics" "sports"
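
If one row per user is enough, the same idea can be written as a summary instead of an update by reference. The following is only a minimal sketch: it rebuilds the sample data from the question, and the names PerUser and NumCats are made up for illustration. In a summarising j, a single list() inside .() is enough to keep each user's vector in one cell.

```r
library(data.table)

# Rebuild the sample data from the question
DT <- data.table(
  UserID       = c("aaa", "bbb", "aaa", "aaa"),
  Time         = c("7:50", "5:05", "8:40", "10:00"),
  ArticleID    = c("x", "x", "y", "z"),
  Category     = c("sports", "sports", "politics", "sports"),
  NumOfReading = c(1L, 1L, 2L, 3L)
)

# One row per user; list() keeps the sorted unique categories
# of each group together as a single list element
# (PerUser is a hypothetical name, not from the original post)
PerUser <- DT[, .(UniqueCats = list(sort(unique(Category)))), by = UserID]

# lengths() returns the length of each list element,
# i.e. the number of distinct categories per user
PerUser[, NumCats := lengths(UniqueCats)]

print(PerUser)
```

Individual vectors can then be pulled out with PerUser$UniqueCats[[1]] and passed to ordinary vector functions.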

We can also create a string column by concatenating the elements with toString, which is a wrapper around paste(x, collapse = ", ")

DT[, uniqueCats := toString(sort(unique(Category))), by = UserID]
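
The string form is convenient for the follow-up tasks mentioned in the question. As a rough sketch only (Dominant, Combos and Cats are illustrative names, and the tie-breaking shown is just one possible choice), the dominant category per user and the most common category combinations could be computed along these lines:

```r
library(data.table)

# Only the two columns needed here, taken from the question's sample data
DT <- data.table(
  UserID   = c("aaa", "bbb", "aaa", "aaa"),
  Category = c("sports", "sports", "politics", "sports")
)

# Dominant (most frequent) category per user.
# table() counts the readings per category; which.max() picks the first
# maximum, so ties go to the alphabetically first category.
Dominant <- DT[, .(Dominant = names(which.max(table(Category)))), by = UserID]

# Popular combinations: build one string per user,
# then count how many users share each string
Combos <- DT[, .(Cats = toString(sort(unique(Category)))), by = UserID][
  , .N, by = Cats][order(-N)]

print(Dominant)
print(Combos)
```

Because each combination is a plain string, it can be used directly as a grouping key, which is not possible with a list column.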
akrun answered Oct 12 '22 19:10
