I'd like to write an R function that accepts a formula as its first argument, similar to lm() or glm() and friends. In this case, it's a function that takes a data frame and writes out a file in SVMLight format, which has this general form:
<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float>
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>
For example, the following data frame:
result qid f1 f2 f3 f4 f5 f6 f7 f8
1 -1 1 0.0000 0.1253 0.0000 0.1017 0.00 0.0000 0.0000 0.9999
2 -1 1 0.0098 0.0000 0.0000 0.0000 0.00 0.0316 0.0000 0.3661
3 1 1 0.0000 0.0000 0.1941 0.0000 0.00 0.0000 0.0509 0.0000
4 -1 2 0.0000 0.2863 0.0948 0.0000 0.34 0.0000 0.7428 0.0608
5 1 2 0.0000 0.0000 0.0000 0.4347 0.00 0.0000 0.9539 0.0000
6 1 2 0.0000 0.7282 0.9087 0.0000 0.00 0.0000 0.0000 0.0355
would be represented as follows:
-1 qid:1 2:0.1253 4:0.1017 8:0.9999
-1 qid:1 1:0.0098 6:0.0316 8:0.3661
1 qid:1 3:0.1941 7:0.0509
-1 qid:2 2:0.2863 3:0.0948 5:0.3400 7:0.7428 8:0.0608
1 qid:2 4:0.4347 7:0.9539
1 qid:2 2:0.7282 3:0.9087 8:0.0355
The function I'd like to write would be called something like this:
write.svmlight(result ~ f1+f2+f3+f4+f5+f6+f7+f8 | qid, data=mydata, file="out.txt")
Or even
write.svmlight(result ~ . | qid, data=mydata, file="out.txt")
But I can't figure out how to use model.matrix() and/or model.frame() to know what columns it's supposed to write. Are these the right things to be looking at?
Any help much appreciated!
Partial answer. You can subscript a formula object to get a parse tree of the formula:
> f<-a~b+c|d
> f[[1]]
`~`
> f[[2]]
a
> f[[3]]
b + c | d
> f[[3]][[1]]
`|`
> f[[3]][[2]]
b + c
> f[[3]][[3]]
d
Now all you need is code to walk this tree.
UPDATE: Here is an example of a function that walks the tree.
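walker <- function(formu) {
  if (!inherits(formu, "formula"))
    stop("Want formula")
  if (length(formu) != 3)
    stop("Want two-sided formula")
  lhs <- formu[[2]]   # response, left of ~
  rhs <- formu[[3]]   # everything right of ~
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("Want conditional part")
  condi <- rhs[[3]]   # term after |
  # Recursively flatten a + b + c into a list of symbols
  flattener <- function(f) {
    if (length(f) < 3) return(list(f))
    c(flattener(f[[2]]), flattener(f[[3]]))
  }
  vars <- flattener(rhs[[2]])
  list(lhs = lhs, condi = condi, vars = vars)
}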
walker(y~a+b|c)
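Building on the same idea, here is a rough sketch of how the pieces of the formula could drive the actual output. It is not a finished write.svmlight(): the name write_svmlight_sketch is made up, the treatment of "." (every column that is neither the target nor the qid) is my assumption, features are numbered by their position in the formula, zero values are skipped, and values are printed with four decimals only to match the example in the question.

```r
# Hypothetical helper: write a data frame in SVMLight format from a
# formula of the form  response ~ features | qid.
write_svmlight_sketch <- function(formula, data, file = stdout()) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  rhs <- formula[[3]]
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("expected a formula like  y ~ x1 + x2 | qid")

  target <- all.vars(formula[[2]])   # response column name
  qid    <- all.vars(rhs[[3]])       # grouping column name
  feats  <- all.vars(rhs[[2]])       # feature column names
  # Assumption: "." means every column not used as target or qid
  if (identical(feats, "."))
    feats <- setdiff(names(data), c(target, qid))

  lines <- vapply(seq_len(nrow(data)), function(i) {
    vals <- unlist(data[i, feats, drop = FALSE])
    keep <- which(vals != 0)         # sparse format: skip zero values
    paste(c(data[[target]][i],
            paste0("qid:", data[[qid]][i]),
            sprintf("%d:%.4f", keep, vals[keep])),
          collapse = " ")
  }, character(1))

  writeLines(lines, con = file)      # file may be a path or a connection
  invisible(lines)
}
```

With the data frame from the question, write_svmlight_sketch(result ~ . | qid, data = mydata, file = "out.txt") should produce lines in the format shown there. Factor or character features are not handled in this sketch.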
Also look at the documentation for terms.formula and terms.object. Looking at the code of functions that take conditional formulas can also help, for example the lmer function in the lme4 package.
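As a small illustration of what terms() can do here, the sketch below strips the conditional part off the formula and lets terms() expand the "." on what is left. The toy data frame and the variable names (plain, cols, tt, mf) are made up for the example, and hiding the qid column before expansion is just one way to keep it out of the feature list.

```r
# Toy data frame mirroring the question's column layout (values made up)
mydata <- data.frame(result = c(-1, 1), qid = c(1, 1),
                     f1 = c(0, 0.2), f2 = c(0.1, 0))

f   <- result ~ . | qid
rhs <- f[[3]]                      # the call  `|`(., qid)

# Rebuild an ordinary formula without the conditional part: result ~ .
plain <- f
plain[[3]] <- rhs[[2]]

# Hide the qid column so that "." does not expand to include it
cols <- setdiff(names(mydata), all.vars(rhs[[3]]))
tt   <- terms(plain, data = mydata[cols])

attr(tt, "term.labels")            # feature names after "." expansion

# The expanded terms object can then be handed to model.frame()
mf <- model.frame(tt, data = mydata)
```

The term labels tell you which columns to write, and model.frame() gives you the response plus those columns in one data frame.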