How do I select the first row in an R data frame that meets certain criteria?

Tags:

select

dataframe

r

How do I select the first row of an R data frame that meets certain criteria?

Here is the context:

I have a data frame with five columns:

"pixel", "year", "propvar", "component", "cumsum"

There are 1,225 combinations of pixel and year, because the data was computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of a given pixel-year. I then computed cumsum, which is the cumulative sum of propvar for each frequency component within a pixel-year. The component column just gives you an index for the Fourier series component (plus 1) from which propvar was calculated.

I want to determine the number of components required to explain greater than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain greater than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?

asked Nov 16 '11 01:11

Brash Equilibrium

1 Answer

Sure. Something like this should do the trick:

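# CREATE A REPRODUCIBLE EXAMPLE!
# Note: this toy data puts cumsum on a 0-100 scale. With proportions,
# as in the question, the condition would be cumsum > 0.99 instead.
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"),
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps = 1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5

# EXTRACT THE SUBSET YOU'D LIKE.
# Step 1: keep only the rows that meet the threshold.
res <- subset(df, cumsum >= 99)
# Step 2: keep the first remaining row for each year-pixel combination
# (duplicated() flags every later repeat), and pick the output columns.
res <- subset(res,
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
res
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5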

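EDIT: Also, for those interested in data.table, there is this:

library(data.table)
# Key on both grouping columns, given as a character vector of names.
dt <- data.table(df, key = c("pixel", "year"))
# Filter to rows meeting the threshold, then take the first row per group.
dt[cumsum >= 99, .SD[1], by = key(dt)]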
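
For the column layout described in the question (pixel, year, component, propvar, cumsum, with cumsum as a proportion), the same base R idea might look roughly like the sketch below. The data values and the names fft_df, hits, first_hits, and result are made up for illustration. It also assumes that the component value of the first qualifying row is the count you want as numbercomps; adjust if your component index is offset.

```r
# Hypothetical toy data in the question's layout (values are invented).
fft_df <- data.frame(
  pixel     = rep(c("p1", "p2"), each = 4),
  year      = rep(2001, 8),
  component = rep(1:4, times = 2),
  propvar   = c(0.900, 0.095, 0.004, 0.001,
                0.600, 0.300, 0.095, 0.005)
)

# Make sure rows are ordered by component within each pixel-year,
# so that "first row" really means "lowest component".
fft_df <- fft_df[order(fft_df$pixel, fft_df$year, fft_df$component), ]

# Cumulative proportion of variance within each pixel-year.
fft_df$cumsum <- ave(fft_df$propvar, fft_df$pixel, fft_df$year, FUN = cumsum)

# Keep rows past the 99% threshold, then the first such row per pixel-year.
hits <- fft_df[fft_df$cumsum > 0.99, ]
first_hits <- hits[!duplicated(hits[c("pixel", "year")]), ]

# Assumption: the component value of that row is the number of components.
result <- data.frame(pixel       = first_hits$pixel,
                     year        = first_hits$year,
                     numbercomps = first_hits$component)
result
```

With this toy data, p1 crosses the threshold at component 2 and p2 at component 3. If some pixel-year never exceeds 0.99, it simply drops out of the result, so you may want to check that the result has one row per pixel-year.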

answered Sep 23 '22 18:09

Josh O'Brien
