
Extract information from conditional formula

Tags: r, formula

I'd like to write an R function that accepts a formula as its first argument, like lm(), glm() and friends. In this case, the function takes a data frame and writes it out as a file in SVMLight format, which has this general form:

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

For example, the following data frame:

  result qid     f1     f2     f3     f4   f5     f6     f7     f8
1     -1   1 0.0000 0.1253 0.0000 0.1017 0.00 0.0000 0.0000 0.9999
2     -1   1 0.0098 0.0000 0.0000 0.0000 0.00 0.0316 0.0000 0.3661
3      1   1 0.0000 0.0000 0.1941 0.0000 0.00 0.0000 0.0509 0.0000
4     -1   2 0.0000 0.2863 0.0948 0.0000 0.34 0.0000 0.7428 0.0608
5      1   2 0.0000 0.0000 0.0000 0.4347 0.00 0.0000 0.9539 0.0000
6      1   2 0.0000 0.7282 0.9087 0.0000 0.00 0.0000 0.0000 0.0355

would be represented as follows:

-1 qid:1 2:0.1253 4:0.1017 8:0.9999
-1 qid:1 1:0.0098 6:0.0316 8:0.3661
1  qid:1 3:0.1941 7:0.0509
-1 qid:2 2:0.2863 3:0.0948 5:0.3400 7:0.7428 8:0.0608
1  qid:2 4:0.4347 7:0.9539
1  qid:2 2:0.7282 3:0.9087 8:0.0355

The function I'd like to write would be called something like this:

write.svmlight(result ~ f1+f2+f3+f4+f5+f6+f7+f8 | qid, data=mydata, file="out.txt")

Or even

write.svmlight(result ~ . | qid, data=mydata, file="out.txt")

But I can't figure out how to use model.matrix() and/or model.frame() to know what columns it's supposed to write. Are these the right things to be looking at?

Any help much appreciated!

Ken Williams asked Mar 11 '10 17:03


1 Answer

Partial answer. You can subscript a formula object to get a parse tree of the formula:

> f<-a~b+c|d
> f[[1]]
`~`
> f[[2]]
a
> f[[3]]
b + c | d
> f[[3]][[1]]
`|`
> f[[3]][[2]]
b + c
> f[[3]][[3]]
d

Now all you need is code to walk this tree.
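
If only the variable names in each part are needed, base R's all.vars() can do the walking for simple formulas. The sketch below assumes the formula always has the shape lhs ~ features | condition; the names target, feats and cond are only illustrative.

```r
# Sketch: pull variable names out of each part of a conditional formula.
# Assumes the shape  lhs ~ features | condition.
f <- y ~ a + b | c

rhs <- f[[3]]                  # the call  a + b | c

target <- all.vars(f[[2]])     # variables on the left of ~
feats  <- all.vars(rhs[[2]])   # variables between ~ and |
cond   <- all.vars(rhs[[3]])   # variables after |

print(list(target = target, feats = feats, cond = cond))
```

all.vars() returns character vectors, which can index the columns of a data frame directly. It does not expand a "." on the right-hand side, so that case needs separate handling.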

UPDATE: Here is an example of a function that walks the tree.

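walker <- function(formu) {
  # inherits() works without the methods package, which is() needs
  if (!inherits(formu, "formula"))
    stop("Want formula")
  if (length(formu) != 3)
    stop("Want two-sided formula")
  lhs <- formu[[2]]    # response, e.g. y
  formu <- formu[[3]]  # right-hand side, e.g. a + b | c

  # A bare symbol on the right-hand side cannot be subscripted, so check first
  if (!is.call(formu) || !identical(formu[[1]], as.name("|")))
    stop("Want conditional part")

  condi <- formu[[3]]  # what follows the bar, e.g. c

  # Recursively split binary calls such as a + b into their operands;
  # the operator itself is ignored
  flattener <- function(f) {
    if (length(f) < 3) return(f)
    c(Recall(f[[2]]), Recall(f[[3]]))
  }
  vars <- flattener(formu[[2]])

  list(lhs = lhs, condi = condi, vars = vars)
}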

walker(y~a+b|c)

Also look at the documentation for terms.formula and terms.object. It can also help to read the code of functions that take conditional formulas, e.g. the lmer function in the lme4 package.
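
Putting the pieces together, here is a rough sketch of how write.svmlight() might look. It is not a tested or complete implementation. It assumes a single target column and a single qid column, numbers features by their position in the formula (or in the data frame when "." is used), and writes four decimal places to match the sample output in the question. The toy data frame at the end is made up for illustration.

```r
# Hypothetical sketch of write.svmlight(); the argument names follow the
# calls shown in the question, everything else is an assumption.
write.svmlight <- function(formula, data, file = stdout()) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  rhs <- formula[[3]]
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("formula must look like: target ~ features | qid")

  target <- all.vars(formula[[2]])
  qid    <- all.vars(rhs[[3]])

  # Drop the "| qid" part, then let terms() expand "." against 'data'
  plain <- formula
  plain[[3]] <- rhs[[2]]
  feats <- setdiff(attr(terms(plain, data = data), "term.labels"), qid)

  x <- data.matrix(data[, feats, drop = FALSE])
  lines <- vapply(seq_len(nrow(data)), function(i) {
    nz <- unname(which(x[i, ] != 0))   # sparse format: skip zero values
    pairs <- sprintf("%d:%.4f", nz, as.numeric(x[i, nz]))
    paste(c(data[[target]][i], paste0("qid:", data[[qid]][i]), pairs),
          collapse = " ")
  }, character(1))

  writeLines(lines, file)
  invisible(lines)
}

# Made-up toy data, smaller than the data frame in the question
mydata <- data.frame(result = c(-1, 1), qid = c(1, 2),
                     f1 = c(0, 0.5), f2 = c(0.1253, 0), f3 = c(0, 0.25))
out <- write.svmlight(result ~ . | qid, data = mydata)
```

Because feature numbers come from position, result ~ f2 + f3 | qid would write f2 as feature 1. If the numbering has to follow the data frame's columns instead, match(feats, names(data)) could supply the indices.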

Jyotirmoy Bhattacharya answered Oct 13 '22 16:10