 

Calculating an average in a data frame based on locations from separate columns

Tags: dataframe, r, mean

I have a data frame set up like so:

N1 <- c(1,2,4,3,2,3,4,5,4,3,4,5,4,5,6,8,9)
Start <- c("","Start","","","","","","","Start","","","","Start","","","","")
Stop <- c("","","","","Stop","","","","","","Stop","","","","Stop","","")

N1 is my data of interest. I would like to calculate the mean of each run of N1 values delimited by the "Start" and "Stop" markers in the other two columns.

The runs defined by "Start" and "Stop" (both ends included) are:

2,4,3,2 
4,3,4
4,5,6

So my final result should be 3 means:

    2.75, 3.67, 5
Vinterwoo asked Apr 21 '15 at 09:04
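A quick arithmetic check of the expected result: the three runs sum to 11, 11 and 15 over 4, 3 and 3 values, so the second mean is 11/3, about 3.67. The run values below are copied by hand from the lists in the question; the variable names are illustrative only.

```r
# Run values copied by hand from the question (illustrative)
runs <- list(c(2, 4, 3, 2), c(4, 3, 4), c(4, 5, 6))

# Mean of each run: 11/4, 11/3 and 15/3
means <- sapply(runs, mean)
print(round(means, 2))
# [1] 2.75 3.67 5.00
```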



3 Answers

You can try:

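mapply(function(start, stop){
          mean(N1[start:stop])  # mean of N1 from a Start row to its Stop row
       }, 
       start=which(Start!=""),  # row indices of the "Start" markers
       stop=which(Stop!=""))    # row indices of the "Stop" markers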

#[1] 2.750000 3.666667 5.000000
Cath answered Oct 20 '22 at 23:10
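The mapply call pairs the k-th Start row with the k-th Stop row: on this data which(Start != "") gives 2, 9, 13 and which(Stop != "") gives 5, 11, 15. That pairing is only correct when every Start has a matching Stop at or after it. Since the question mentions a data frame, here is a minimal sketch that reads the three columns from one and checks the pairing first; the data frame df and the helper run_means() are hypothetical names, not part of the answer.

```r
# Example data from the question, stored as data frame columns (df is hypothetical)
df <- data.frame(
  N1    = c(1, 2, 4, 3, 2, 3, 4, 5, 4, 3, 4, 5, 4, 5, 6, 8, 9),
  Start = c("", "Start", "", "", "", "", "", "", "Start", "", "", "", "Start", "", "", "", ""),
  Stop  = c("", "", "", "", "Stop", "", "", "", "", "", "Stop", "", "", "", "Stop", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean of N1 over each Start..Stop run, both ends included
run_means <- function(df) {
  starts <- which(df$Start != "")
  stops  <- which(df$Stop != "")
  # The k-th Start must pair with a k-th Stop at or after it
  stopifnot(length(starts) == length(stops), all(starts <= stops))
  mapply(function(start, stop) mean(df$N1[start:stop]), starts, stops)
}

res <- run_means(df)
print(res)
# [1] 2.750000 3.666667 5.000000
```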



library(data.table) # rleid() needs data.table 1.9.5+

# event is 1 for rows inside a Start..Stop run (both ends included) and 0 otherwise
d = data.table(N1, event = cumsum((Start != "") - c(0, head(Stop != "", -1))))

d[, mean(N1), by = .(event, rleid(event))][event == 1, V1]
#[1] 2.750000 3.666667 5.000000

# or equivalently
d[, .(event[1], mean(N1)), by = rleid(event)][V1 == 1, V2]
eddi answered Oct 20 '22 at 23:10
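In this answer, event is a running count that goes up by 1 on each Start row and down by 1 on the row after each Stop, so it is 1 inside a run and 0 outside; rleid(event) then gives each consecutive block of equal event values its own id. A base R sketch of the same idea, assuming data.table is unavailable. Here cumsum(c(1, diff(event) != 0)) is swapped in as a stand-in for rleid(); it is not part of the original answer.

```r
N1    <- c(1, 2, 4, 3, 2, 3, 4, 5, 4, 3, 4, 5, 4, 5, 6, 8, 9)
Start <- c("", "Start", "", "", "", "", "", "", "Start", "", "", "", "Start", "", "", "", "")
Stop  <- c("", "", "", "", "Stop", "", "", "", "", "", "Stop", "", "", "", "Stop", "", "")

# 1 inside a Start..Stop run (both ends included), 0 elsewhere
event <- cumsum((Start != "") - c(0, head(Stop != "", -1)))

# Base R stand-in for data.table::rleid(): a new id whenever event changes
block <- cumsum(c(1, diff(event) != 0))

# Average N1 within each block that lies inside a run
inside <- event == 1
means <- as.vector(tapply(N1[inside], block[inside], mean))
print(means)
# [1] 2.750000 3.666667 5.000000
```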



You can also try rollapply from the zoo package:

library(zoo)
x <- sort(c(which(Stop != ""), which(Start != ""))) # indices of Start and Stop
rollapply(x, 2, FUN = function(y) mean(N1[y[1]:y[2]]), by=2)
#[1] 2.750000 3.666667 5.000000
Mamoun Benghezal answered Oct 21 '22 at 00:10
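With width 2 and by = 2, rollapply takes non-overlapping pairs of the sorted indices, here (2, 5), (9, 11) and (13, 15). The same pairing can be sketched in base R by filling a two-row matrix column by column, assuming each Start is followed by its own Stop before the next Start appears; pairs is an illustrative name.

```r
N1    <- c(1, 2, 4, 3, 2, 3, 4, 5, 4, 3, 4, 5, 4, 5, 6, 8, 9)
Start <- c("", "Start", "", "", "", "", "", "", "Start", "", "", "", "Start", "", "", "", "")
Stop  <- c("", "", "", "", "Stop", "", "", "", "", "", "Stop", "", "", "", "Stop", "", "")

# Sorted Start/Stop row indices: 2 5 9 11 13 15
x <- sort(c(which(Stop != ""), which(Start != "")))

# matrix() fills by column, so each column is one (start, stop) pair,
# matching the windows rollapply(x, 2, ..., by = 2) would visit
pairs <- matrix(x, nrow = 2)
means <- apply(pairs, 2, function(y) mean(N1[y[1]:y[2]]))
print(means)
# [1] 2.750000 3.666667 5.000000
```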
