I'm going through Machine Learning for Hackers, and I am stuck at this line:
from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
Which generates the following error:
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [1]
Here is the traceback():
> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
}(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
The priority.train object is a data frame, and here is more info:
> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date" "From.EMail" "Subject" "Message" "Path"
> sapply(priority.train, mode)
Date From.EMail Subject Message Path
"list" "character" "character" "character" "character"
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt"
$From.EMail
[1] "character"
$Subject
[1] "character"
$Message
[1] "character"
$Path
[1] "character"
> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame': 1250 obs. of 5 variables:
$ Date : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
$ From.EMail: chr "[email protected]" "[email protected]" "[email protected]" "[email protected]" ...
$ Subject : chr "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
$ Message : chr " \n Hello,\n \n I just installed redhat 7.2 and I think I have everything \nworking properly. Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file. Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file. Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n> I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
$ Path : chr "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = "\"", na.encode = FALSE) :
it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = "\"", na.encode = FALSE) :
it is not known that wchar_t is Unicode on this platform
I would post a sample, but the content is a bit long and I don't think the content is relevant here.
The same error also happens here:
> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [1]
Does anyone have a clue about what's going on here? The error seems to be generated by an object other than priority.train, because its names attribute apparently has 9 elements.
I'd appreciate any help. Thanks!
Problem solved
I found the problem thanks to @user1317221_G's tip to use the dput function. The problem is the Date field, which at this point is a POSIXlt object, i.e. a list that contains 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To work around it, I converted the dates to a character vector, ran ddply, and then restored the original Date column:
> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)
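To see why the Date column behaves differently from the other columns, it can help to look at how the two date-time classes are stored. This is a minimal sketch using two made-up timestamps (the `lt` and `ct` names are hypothetical, not from the book's code); the exact set of list components may vary with the R version.

```r
# Hypothetical timestamps for illustration; not taken from priority.train.
lt <- as.POSIXlt(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"), tz = "UTC")

class(lt)               # "POSIXlt" "POSIXt"
is.list(unclass(lt))    # TRUE: one list element per time component
names(unclass(lt))      # includes "sec", "min", "hour", ...; exact set depends on the R version

# POSIXct stores the same instants as a plain numeric vector.
ct <- as.POSIXct(lt)
is.numeric(unclass(ct)) # TRUE: seconds since the epoch
length(unclass(ct))     # 2, one value per timestamp
```

Because a POSIXct column is an ordinary atomic vector with one element per row, it can be split row-wise like any other column, which is presumably why ddply handles it without complaint.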
I fixed this problem I was having by converting format from POSIXlt to POSIXct as Hadley suggests above - one line of code:
mydata$datetime <- strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S") # original conversion from a datetime string; class(mydata$datetime) is now "POSIXlt" "POSIXt"
mydata$datetime <- as.POSIXct(mydata$datetime) # convert to POSIXct for use in data frames / ddply
You have probably already seen this and it has not helped. I guess we probably do not have an answer yet because people cannot reproduce your error.
Posting the output of dput(priority.train), or of the smaller dput(head(priority.train)), might help with this. But here is an alternative using base R:
x <- data.frame(A=c("a","b","c","a"),B=c("e","d","d","d"))
ddply(x,.(A),summarise, Freq = length(B))
A Freq
1 a 2
2 b 1
3 c 1
tapply(x$B,x$A,length)
a b c
2 1 1
Does this tapply work for you?
x2 <- data.frame(A=c("[email protected]", "[email protected]"),
B=c("please help a newbie compile mplayer :-)",
"re: please help a newbie compile mplayer :-)"))
tapply(x2$B,x2$A,length)
[email protected] [email protected]
1 1
ddply(x2,.(A),summarise, Freq = length(B))
A Freq
1 [email protected] 1
2 [email protected] 1
You could also try, more simply:
table(x2$A)
[email protected] [email protected]
1 1
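If a data frame shaped like from.weight is needed rather than a bare table, the counts can be converted with as.data.frame. The sketch below is only an illustration under assumptions: `df` is a hypothetical stand-in for priority.train, and the addresses, subjects, and dates are invented. It leaves a POSIXlt Date column in place to show that counting by sender this way never touches it.

```r
# Hypothetical stand-in for priority.train; all values are made up.
df <- data.frame(
  From.EMail = c("alice@example.com", "bob@example.com", "alice@example.com"),
  Subject    = c("hello", "re: hello", "re: hello"),
  stringsAsFactors = FALSE
)
# Assigning with $<- keeps the column as POSIXlt, as in the question.
df$Date <- as.POSIXlt(c("2002-01-31 22:44:14",
                        "2002-02-01 00:53:41",
                        "2002-02-01 02:01:44"), tz = "UTC")

# Count messages per sender using base R only; the Date column is not involved.
from.weight <- as.data.frame(table(From.EMail = df$From.EMail),
                             responseName = "Freq",
                             stringsAsFactors = FALSE)
print(from.weight)
```

The result has one row per sender with columns From.EMail and Freq, the same layout the ddply call in the question was meant to produce.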