Calculating an average in a data frame based on locations from separate columns

I have a data frame set up like so:

N1 <- c(1,2,4,3,2,3,4,5,4,3,4,5,4,5,6,8,9)
Start <- c("","Start","","","","","","","Start","","","","Start","","","","")
Stop <- c("","","","","Stop","","","","","","Stop","","","","Stop","","")

N1 is my data of interest. I would like to calculate the mean of each run of numbers delimited by the "Start" and "Stop" markers in the other two columns.

The runs defined by "Start" and "Stop" (both endpoints included) would look like so:

2,4,3,2 
4,3,4
4,5,6

So my final result should be 3 means:

    2.75, 3.67, 5

asked Apr 21 '15 09:04

Vinterwoo

3 Answers

You can try pairing the positions of the "Start" and "Stop" markers with mapply:

mapply(function(start, stop){
          mean(N1[start:stop])
       }, 
       start=which(Start!=""), 
       stop=which(Stop!=""))

#[1] 2.750000 3.666667 5.000000
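
To make the pairing explicit, here is a minimal sketch of the same idea with the index vectors pulled out first. The names `starts`, `stops` and `means` are only illustrative, and the check assumes every "Start" has a matching "Stop" at or after it, which holds for the sample data but may not for yours.

```r
N1 <- c(1,2,4,3,2,3,4,5,4,3,4,5,4,5,6,8,9)
Start <- c("","Start","","","","","","","Start","","","","Start","","","","")
Stop <- c("","","","","Stop","","","","","","Stop","","","","Stop","","")

starts <- which(Start != "")  # row positions of the "Start" markers
stops  <- which(Stop != "")   # row positions of the "Stop" markers

# Assumption: markers come in matched pairs, each Start at or before its Stop
stopifnot(length(starts) == length(stops), all(starts <= stops))

# Mean of N1 over each inclusive start:stop range
means <- mapply(function(s, e) mean(N1[s:e]), starts, stops)
print(means)
```

If the markers can be unbalanced, the `stopifnot()` line fails early instead of silently recycling indices.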

answered Oct 20 '22 23:10

Cath

library(data.table) # needs version 1.9.5+ for rleid()

# build an indicator column: 1 for rows inside a Start-to-Stop run, 0 otherwise
d = data.table(N1, event = cumsum((Start != "") - c(0, head(Stop != "", -1))))

d[, mean(N1), by = .(event, rleid(event))][event == 1, V1]
#[1] 2.750000 3.666667 5.000000

# or equivalently
d[, .(event[1], mean(N1)), by = rleid(event)][V1 == 1, V2]
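
For anyone who wants to see how the indicator works without data.table, here is a rough base R sketch of the same trick. `run_id` is a hand-rolled stand-in for `rleid()`, and `inside` and `means` are illustrative names, not part of the answer above.

```r
N1 <- c(1,2,4,3,2,3,4,5,4,3,4,5,4,5,6,8,9)
Start <- c("","Start","","","","","","","Start","","","","Start","","","","")
Stop <- c("","","","","Stop","","","","","","Stop","","","","Stop","","")

# +1 on each Start row, -1 on the row after each Stop, then accumulate:
# the running total is 1 inside a run and 0 outside it
event <- cumsum((Start != "") - c(0, head(Stop != "", -1)))

# Number consecutive blocks of equal values (base R stand-in for rleid)
run_id <- cumsum(c(1, diff(event) != 0))

# Keep only rows inside a run and average N1 per block
inside <- event == 1
means <- tapply(N1[inside], run_id[inside], mean)
print(unname(means))
```

The shift by one row in the Stop term is what keeps the "Stop" row itself inside the run.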

answered Oct 20 '22 23:10

eddi

You can also try rollapply from the zoo package, stepping over the sorted marker positions two at a time:

library(zoo)
x <- sort(c(which(Stop != ""), which(Start != ""))) # indices of Start and Stop
rollapply(x, 2, FUN = function(y) mean(N1[y[1]:y[2]]), by=2)
#[1] 2.750000 3.666667 5.000000
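
If you would rather not load zoo, the same windowing can be sketched in base R by folding the sorted positions into a two-row matrix. This assumes the markers strictly alternate Start, Stop, Start, Stop, so that each column ends up as one pair; `pairs` and `means` are illustrative names.

```r
N1 <- c(1,2,4,3,2,3,4,5,4,3,4,5,4,5,6,8,9)
Start <- c("","Start","","","","","","","Start","","","","Start","","","","")
Stop <- c("","","","","Stop","","","","","","Stop","","","","Stop","","")

# Sorted positions of all markers; with strict alternation this is
# start1, stop1, start2, stop2, ...
x <- sort(c(which(Start != ""), which(Stop != "")))

# Fill column-wise: row 1 holds the starts, row 2 the stops
pairs <- matrix(x, nrow = 2)

# Mean of N1 over each inclusive start:stop range
means <- apply(pairs, 2, function(p) mean(N1[p[1]:p[2]]))
print(means)
```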

answered Oct 21 '22 00:10

Mamoun Benghezal

Tags:

dataframe

r

mean