
How to use apply to generate a data frame row by row?

Tags: dataframe, r, apply

I want to generate a dataframe row by row, by using some flavor of apply on a list of values and a function that returns a single-row data frame for each value. As a toy example, suppose that my values are i = 1:3 and that I have:

f <- function(i) {
    # FALSE is spelled out because F is an ordinary variable that can be reassigned
    return(data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE))
}

I've been experimenting with sapply, lapply, and various transposes, with no success. For example, d = sapply(1:3, f) looks promising but appears to be the transpose of what I want. So I tried d = t(sapply(1:3, f)), but that is a matrix. Next I tried d = as.data.frame(t(sapply(1:3, f))), which appears right (it prints just like what I want) but is still wrong: if you subset it, e.g. d[,1], you get a list rather than a vector.
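To make the problem concrete, here is a small sketch of that last attempt, assuming the toy f defined above. sapply() simplifies the three one-row frames into a matrix whose cells are list elements, so every column of the resulting data frame is a list rather than an atomic vector:

```r
f <- function(i) {
  data.frame(img = letters[i], cached = FALSE, i = i, stringsAsFactors = FALSE)
}

# sapply() collapses the three one-row frames into a 3 x 3 matrix of mode "list"
m <- sapply(1:3, f)
is.matrix(m)   # TRUE
is.list(m)     # TRUE: each cell holds a length-1 list element

# Transposing and converting keeps the list cells, so the columns are lists
d <- as.data.frame(t(m))
class(d[, 1])  # "list" rather than "character"
```

The printed frame looks fine because print() formats each list cell like a scalar; str(d) shows the difference.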

Finally I got this, which works:

d = apply(data.frame(i=1:3), 2, f)$i

That gives me the frame I wanted:

  img cached i
1   a  FALSE 1
2   b  FALSE 2
3   c  FALSE 3
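As a side note on why this works, here is a sketch assuming the toy f from above: apply() with MARGIN = 2 iterates over columns, and the frame has a single column, so f is called exactly once with the whole vector 1:3. The calls counter is only there for illustration.

```r
calls <- 0
f <- function(i) {
  calls <<- calls + 1   # count how many times apply() invokes f
  data.frame(img = letters[i], cached = FALSE, i = i, stringsAsFactors = FALSE)
}

# One column means one call; the result is a list with one element named "i"
d <- apply(data.frame(i = 1:3), 2, f)$i
calls    # 1
nrow(d)  # 3
```

In other words, the frame is built by f's own vectorization, not row by row.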

Is there a better/cleaner way to express the above? It all feels pretty kludgy and overly complicated to me.


Edit: as several readers mentioned, this "toy example" is admittedly too simple; indeed, just f(1:3) would do what I appear to be asking for. The actual function is part of a web-based metrics dashboard, draws data from various DB tables, and makes moderately complex plots which I intend to cache (most of the time they change relatively slowly). The relevant part, I guess, is that the function typically takes several arguments, and those arguments aren't a simple sequence 1:n. So, let me rewrite the example to be a tad more realistic:

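library(digest)
# Hash all arguments into a single cache key.
gkey <- function(...) {
  args <- list(...)
  # sep is dropped: with a single vector argument, only collapse has an effect
  return(digest(paste(args, collapse=".")))
}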

f <- function(conn, table, checknew.query, plot.query, plot.fun, params) {
  latest.data = queryExec(conn, table, checknew.query, params)
  key = gkey(table, latest.data, plot.query, plot.fun, params)
  out = getFromCacheOrPlot(key, conn, table, plot.query, plot.fun, params)
  return(out)
}

where queryExec() builds a query, executes it and retrieves the results; gkey() computes a hash key based on its parameters; and getFromCacheOrPlot() uses the key to build a file name (a .png image), then retrieves the image from cache if it exists or generates it otherwise. getFromCacheOrPlot() also returns a data.frame with one row giving the file name, an html <img=...> blurb to display it, whether the plot was in or out of cache, and which parameters were used for the plot.

All this is used in a plugin for a wiki system, and certain pages have a dozen plots or more.
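For the multi-argument case, one possible sketch is to keep one set of arguments per plot in parallel vectors, call the function once per set with Map(), and bind the resulting rows. Everything here is hypothetical: f2 is a stand-in for the real f (no database or cache involved), and the table names, queries and parameters are made up for illustration.

```r
# Hypothetical stand-in for the real f(): several arguments in, one row out
f2 <- function(table, plot.query, params) {
  data.frame(table = table, query = plot.query, params = params,
             cached = FALSE, stringsAsFactors = FALSE)
}

# One entry per plot request (made-up values)
tables  <- c("visits", "signups")
queries <- c("SELECT count(*) FROM visits", "SELECT count(*) FROM signups")
params  <- c("days=7", "days=30")

# Map() calls f2 once per argument set and returns a list of one-row frames
rows <- Map(f2, tables, queries, params)

# unname() drops the names Map() takes from its first argument,
# so the bound frame gets plain 1..n row names
d <- do.call(rbind, unname(rows))
```

Arguments that are the same for every call, such as a connection, could be passed through Map()'s MoreArgs parameter.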

Pierre D asked Jan 15 '13 19:01



2 Answers

do.call(rbind, lapply(i, f)) will do what you're asking... but so would:

data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE)

As would:

f(i)
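
As a minimal sketch of the first suggestion, assuming the toy f from the question: lapply() returns a list of one-row data frames, and do.call(rbind, ...) stacks them into a single frame whose columns are ordinary atomic vectors.

```r
f <- function(i) {
  data.frame(img = letters[i], cached = FALSE, i = i, stringsAsFactors = FALSE)
}

i <- 1:3
rows <- lapply(i, f)       # list of three one-row data frames
d <- do.call(rbind, rows)  # same as rbind(rows[[1]], rows[[2]], rows[[3]])

class(d$img)  # "character"
```

Binding many frames this way copies data repeatedly, so it is best suited to a modest number of rows.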
Justin answered Nov 11 '22 16:11


What about this? There is no need to use any flavor of the apply functions:

foo <- function(x){
  i <- seq_len(x)
  data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE)
}


foo(5)
  img cached i
1   a  FALSE 1
2   b  FALSE 2
3   c  FALSE 3
4   d  FALSE 4
5   e  FALSE 5
Jilber Urbina answered Nov 11 '22 14:11