I want to generate a data frame row by row, using some flavor of apply on a list of values and a function that returns a single-row data frame for each value. As a toy example, suppose that my values are i = 1:3 and that I have:
f <- function(i) {
  # Return a one-row data frame for the value i
  return(data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE))
}
I've been messing around with sapply, lapply, a bunch of transposes, etc., with no success. For example, d = sapply(1:3, f) looks promising, but appears to be the transpose of what I want, so I tried d = t(sapply(1:3, f)), except that it is a matrix. I therefore tried d = as.data.frame(t(sapply(1:3, f))), which appears right (it prints out just like what I want), but is still wrong, as you'd find out if you try to subset it, e.g. d[,1], which is in fact a list.
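To make the failure concrete, here is a minimal sketch using the toy f from above (rewritten with FALSE spelled out). It only checks the types involved: sapply simplifies the single-row data frames into a matrix whose cells are list elements, and that list structure survives the transpose and the conversion to a data frame.

```r
f <- function(i) {
  data.frame(img = letters[i], cached = FALSE, i = i, stringsAsFactors = FALSE)
}

m <- sapply(1:3, f)                 # a 3x3 matrix of mode "list", one column per call
m_is_list <- is.list(m)             # TRUE: the cells are list elements

d_bad <- as.data.frame(t(m))        # prints like the desired frame
col_is_list <- is.list(d_bad[, 1])  # TRUE: the column is a list, not a character vector
```

So the printed output looks fine, but every column is a list, which is why ordinary subsetting and vectorized operations on d_bad behave unexpectedly.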
Finally I got this, which works:
d = apply(data.frame(i=1:3), 2, f)$i
That gives me the frame I wanted:
  img cached i
1   a  FALSE 1
2   b  FALSE 2
3   c  FALSE 3
Is there a better/cleaner way to express the above? It all feels pretty kludgy and overly complicated to me.
Edit: as mentioned by several readers, this "toy example" is admittedly too simple, and indeed just f(1:3) would do what it looks like I am requesting. The actual function is part of a web-based metrics dashboard, draws data from various DB tables, and makes moderately complex plots which I intend to cache (most of the time they change relatively slowly). The relevant part, I guess, is that the function typically takes several arguments, and those arguments aren't a simple sequence 1:n. So, let me rewrite the example to be a tad more realistic:
library(digest)
gkey <- function(...) {
  args <- list(...)
  return(digest(paste(args, sep=".", collapse=".")))
}
f <- function(conn, table, checknew.query, plot.query, plot.fun, params) {
  latest.data = queryExec(conn, table, checknew.query, params)
  key = gkey(table, latest.data, plot.query, plot.fun, params)
  out = getFromCacheOrPlot(key, conn, table, plot.query, plot.fun, params)
  return(out)
}
where queryExec builds a query, executes it and retrieves the results, gkey() computes a hash key based on its parameters, and getFromCacheOrPlot() uses the key to build a file name (a .png image), retrieves it from cache if it exists, or generates it otherwise. It also returns a data.frame with one row giving us the file name, an html <img=...> blurb to display it, whether the plot was in or out of cache, and which parameters were used for the plot.
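For readers who want something runnable, here is a rough sketch of the cache-or-generate step described above. It is not the actual getFromCacheOrPlot(): the name get_from_cache_or_plot, its reduced argument list, the cache.dir parameter, and the exact form of the html column are all assumptions made for illustration, and plot.fun is assumed to be a function that writes an image to the file path it is given.

```r
# Hypothetical stand-in for getFromCacheOrPlot(); the real function also
# takes conn, table, plot.query and params, which are omitted here.
get_from_cache_or_plot <- function(key, plot.fun, cache.dir = tempdir()) {
  file <- file.path(cache.dir, paste0(key, ".png"))
  cached <- file.exists(file)        # was the image already in the cache?
  if (!cached) plot.fun(file)        # otherwise generate it now
  data.frame(file = file,
             html = sprintf('<img src="%s">', file),
             cached = cached,
             stringsAsFactors = FALSE)
}

# Usage with a stand-in writer instead of real plotting code
cache.dir <- tempfile("plotcache")
dir.create(cache.dir)
writer <- function(file) file.create(file)

first  <- get_from_cache_or_plot("demo-key", writer, cache.dir)  # generated
second <- get_from_cache_or_plot("demo-key", writer, cache.dir)  # served from cache
```

Each call returns a single-row data frame, which is the shape the question is trying to stack into one frame.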
All this is used in a plugin for a wiki system, and certain pages have a dozen plots or more.
do.call(rbind, lapply(i, f))
will do what you're asking... but so would:
data.frame(img=letters[i], cached=F, i=i, stringsAsFactors=F)
As would:
f(i)
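If the real function takes several arguments that are not a simple sequence, the same do.call(rbind, ...) idea should still apply with Map in place of lapply. The sketch below is only an illustration: g, tables and ns are made-up stand-ins for the multi-argument function and its parameters, not anything from the question.

```r
# Toy stand-in for a multi-argument function returning a one-row data frame
g <- function(table, n) {
  data.frame(table = table, n = n, cached = FALSE, stringsAsFactors = FALSE)
}

tables <- c("users", "orders", "events")
ns     <- c(10, 20, 30)

rows <- Map(g, tables, ns)   # list of one-row data frames, one per argument pair
d    <- do.call(rbind, rows) # stack them into a single data frame
rownames(d) <- NULL          # Map names the results after `tables`; drop those names
```

Unlike the sapply attempts, the columns of d are ordinary atomic vectors, so d[,1] is a character vector rather than a list.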
What about this? No need to use any flavor of apply functions:
foo <- function(x) {
  i <- seq_len(x)
  data.frame(img=letters[i], cached=FALSE, i=i, stringsAsFactors=FALSE)
}
foo(5)
  img cached i
1   a  FALSE 1
2   b  FALSE 2
3   c  FALSE 3
4   d  FALSE 4
5   e  FALSE 5