
meaning of ddply error: 'names' attribute [9] must be the same length as the vector [1]

Tags: r, plyr

I'm going through Machine Learning for Hackers, and I am stuck at this line:

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

Which generates the following error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

This is a traceback():

> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   }(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

The priority.train object is a data frame, and here is more info:

> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date"       "From.EMail" "Subject"    "Message"    "Path"      
> sapply(priority.train, mode)
       Date  From.EMail     Subject     Message        Path 
     "list" "character" "character" "character" "character" 
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt" 

$From.EMail
[1] "character"

$Subject
[1] "character"

$Message
[1] "character"

$Path
[1] "character"

> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame':   1250 obs. of  5 variables:
 $ Date      : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
 $ From.EMail: chr  "[email protected]" "[email protected]" "[email protected]" "[email protected]" ...
 $ Subject   : chr  "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
 $ Message   : chr  "    \n Hello,\n   \n         I just installed redhat 7.2 and I think I have everything \nworking properly.  Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file.  Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file.  Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n>  I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
 $ Path      : chr  "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform

I would post a sample, but the content is a bit long and I don't think the content is relevant here.

The same error also happens here:

> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

Does anyone have a clue on what's going on here? The error seems to be generated by a different object than priority.train, because its names attribute apparently has 9 elements.

I'd appreciate any help. Thanks!

Problem solved

I found the problem thanks to @user1317221_G's tip to use the dput function. The problem is the Date field, which at this point is a POSIXlt object, that is, a list of 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To work around it, I converted the dates to a character vector, ran ddply, and then restored the original Date column:

> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)
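
For anyone wondering why the Date column causes this, here is a minimal base-R sketch of the difference between the two date-time classes. It is only an illustration: the timestamp is copied from the str() output above, tz = "UTC" is an assumption made so the example is reproducible, and the variable names (lt, ct, fields) are made up. Newer R versions may list additional fields beyond the nine named above.

```r
# One timestamp taken from the str() output above; tz = "UTC" is an assumption
lt <- as.POSIXlt("2002-01-31 22:44:14", tz = "UTC")

# A POSIXlt value is a list of broken-down time fields, not a plain vector
fields <- names(unclass(lt))
print(is.list(unclass(lt)))
print(fields)  # includes sec, min, hour, mday, mon, year, wday, yday, isdst

# POSIXct stores the same instant as a single number (seconds since the epoch)
ct <- as.POSIXct(lt)
print(is.numeric(unclass(ct)))
print(length(unclass(ct)))
```

Because a POSIXlt column is a list of fields rather than one value per row, code that copies attributes column by column can end up with a names attribute that does not match the length of the piece it extracted, which is consistent with the error message above.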
asked Jan 04 '13 07:01 by mota


2 Answers

I fixed the same problem by converting the column from POSIXlt to POSIXct, as Hadley suggests above. It takes one line of code:

    # strptime() returns POSIXlt: class(mydata$datetime) is "POSIXlt" "POSIXt"
    mydata$datetime <- strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S")
    # convert to POSIXct so the column works in data frames / ddply
    mydata$datetime <- as.POSIXct(mydata$datetime)
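
If you are not sure which columns are affected, a small loop can find and convert them. This is only a sketch under stated assumptions: the data frame, the sender column, and the timestamps are hypothetical, tz = "UTC" is chosen for reproducibility, and the grouping step uses base R's table() as a stand-in for the ddply call.

```r
# Hypothetical data: the sender values and timestamps are made up for illustration
mydata <- data.frame(sender = c("a", "b", "a"), stringsAsFactors = FALSE)

# Assigning the result of strptime() with $<- leaves the column as POSIXlt
mydata$datetime <- strptime(
  c("2002-01-31 22:44:14", "2002-02-01 00:53:41", "2002-02-01 02:01:44"),
  "%Y-%m-%d %H:%M:%S", tz = "UTC"
)
class_before <- class(mydata$datetime)
print(class_before)

# Convert every POSIXlt column to POSIXct before grouping
for (col in names(mydata)) {
  if (inherits(mydata[[col]], "POSIXlt")) {
    mydata[[col]] <- as.POSIXct(mydata[[col]])
  }
}
print(class(mydata$datetime))

# Count rows per sender (base-R stand-in for the ddply summarise call)
counts <- table(mydata$sender)
print(counts)
```

The loop leaves other columns untouched, so it should be safe to run on a data frame that mixes character, numeric, and date-time columns.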
answered Oct 20 '22 22:10 by c.gutierrez


You have probably already seen this and it has not helped. I guess we probably do not have an answer yet because people cannot reproduce your error.

Posting the output of dput(priority.train), or of the smaller dput(head(priority.train)), would help with this. But here is an alternative using base:

x <- data.frame(A=c("a","b","c","a"),B=c("e","d","d","d"))

ddply(x,.(A),summarise, Freq = length(B))
  A Freq
1 a    2
2 b    1
3 c    1

 tapply(x$B,x$A,length)
a b c 
2 1 1 

Does this tapply work for you?

x2 <- data.frame(A=c("[email protected]", "[email protected]"),
                 B=c("please help a newbie compile mplayer :-)", 
                     "re: please help a newbie compile mplayer :-)"))

tapply(x2$B,x2$A,length)
[email protected] [email protected] 
              1                   1 

ddply(x2,.(A),summarise, Freq = length(B))
                    A Freq
1  [email protected]    1
2 [email protected]    1

You could also try, more simply:

table(x2$A)

 [email protected] [email protected] 
              1                   1 
answered Oct 20 '22 22:10 by user1317221_G