How do I select the first row in an R data frame that meets certain criteria?

Here is the context:

I have a data frame with five columns:

"pixel", "year", "propvar", "component", "cumsum".

There are 1,225 combinations of pixel and year, because the data was computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of a given pixel-year. I then computed cumsum, which is the cumulative sum of propvar for each frequency component within a pixel-year. The component column just gives you an index for the Fourier series component (plus 1) from which propvar was calculated.
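
For illustration only, columns like these could be built for a single pixel-year roughly as follows. The daily series `x`, its length of 365, and the name `pixel_year` are assumptions made up for this sketch, and reading "index (plus 1)" as the position in the `fft()` output is one possible interpretation, not something the question confirms.

```r
# Hypothetical daily series for one pixel-year (365 made-up values).
set.seed(42)
n <- 365
x <- sin(2 * pi * (1:n) / n) + rnorm(n, sd = 0.2)

# Power of each non-redundant Fourier component, skipping the mean term
# (element 1 of the fft output). With odd n these are elements 2..(n + 1) / 2.
power <- Mod(fft(x))[2:((n + 1) / 2)]^2

pixel_year <- data.frame(
  component = seq_along(power) + 1,  # frequency index plus 1 (assumed meaning)
  propvar   = power / sum(power)     # share of the total variance
)

# Running total of propvar across components.
pixel_year$cumsum <- cumsum(pixel_year$propvar)
head(pixel_year)
```

Using an odd series length keeps the one-sided spectrum free of a Nyquist term, so each retained component carries the same weight in the variance total.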

I want to determine the number of components required to explain greater than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain greater than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?

Brash Equilibrium asked Nov 16 '11 01:11



1 Answer

Sure. Something like this should do the trick:

# CREATE A REPRODUCIBLE EXAMPLE!
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"), 
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps=1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2 
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5

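# EXTRACT THE SUBSET YOU'D LIKE.
# First keep only the rows that reach the threshold (99 in this toy data;
# use cumsum > 0.99 if cumsum is stored as a proportion).
res <- subset(df, cumsum >= 99)
# Then keep the first remaining row for each year-pixel combination.
# This relies on the rows already being in component order within each group.
res <- subset(res,
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
res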
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5
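
The same idea can be sketched for data shaped more like the question's, with cumsum stored as a proportion and a strict `> 0.99` cutoff. The values in `df2` are made up, and the helper column `pos` (the row's position within its pixel-year) is an assumption introduced here so that the count does not depend on how `component` is numbered.

```r
# Made-up data: two pixel-years, cumsum on the 0-1 scale.
df2 <- data.frame(
  pixel     = c("a", "a", "a", "b", "b", "b"),
  year      = c(2001, 2001, 2001, 2001, 2001, 2001),
  component = c(2, 3, 4, 2, 3, 4),
  cumsum    = c(0.90, 0.995, 1.00, 0.60, 0.99, 1.00)
)

# Sort so that "first row" really means "lowest component" within each group.
df2 <- df2[order(df2$pixel, df2$year, df2$component), ]

# Position of each row within its pixel-year (1, 2, 3, ...).
df2$pos <- ave(seq_along(df2$component), df2$pixel, df2$year, FUN = seq_along)

# Rows strictly above the threshold, then the first such row per pixel-year.
hits <- df2[df2$cumsum > 0.99, ]
res2 <- hits[!duplicated(hits[c("pixel", "year")]), c("pixel", "year", "pos")]
names(res2)[3] <- "numbercomps"
res2
```

Pixel-years that never exceed the threshold simply drop out of `res2`, so it may be worth checking its row count against the number of pixel-year combinations.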

EDIT: Also, for those interested in data.table, there is this:

library(data.table)
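# Key columns are passed as a character vector; a single comma-separated
# string such as "pixel, year" is not accepted by current data.table versions.
dt <- data.table(df, key = c("pixel", "year"))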
dt[cumsum>=99, .SD[1], by=key(dt)]
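
If adding a package is not an option, a base R variant is possible with `aggregate()`. This is only a sketch: it takes the smallest `numbercomps` among the qualifying rows instead of the first row, which gives the same answer under the assumption that a smaller `numbercomps` always means fewer components. The name `res_min` is made up for this example.

```r
# Same toy data as in the answer above.
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"),
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps = 1:5)

# Smallest numbercomps among the rows that reach the threshold,
# computed separately for each pixel-year combination.
res_min <- aggregate(numbercomps ~ pixel + year,
                     data = df[df$cumsum >= 99, ],
                     FUN = min)
res_min
```

Because it takes a minimum, this version does not depend on the row order of `df`.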
Josh O'Brien answered Sep 23 '22 18:09