Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to group_by variable and cut time into 10s bins starting at 13:24:00 exactly and average for group_by variable

Tags:

time

r

dplyr

I have CO2 measurement data by 30 sensors that don't all measure at the same time, nor do they all start at exactly the same time. I would like to align them as best as possible, so I thought that taking 10s averages might be a good solution.

In a previous question: Group by multiple variables and summarise dplyr I cut the time in 10s chunks for each sensor and averaged each sensors reading over those 10s. Sounds OK but, what I've realised is that the following code cuts the time from whatever time each sensor starts at and therefore they are still not aligned. How can I align them?

require(tidyverse)
require(lubridate)  
  df %>%
        group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="10 sec")) %>%
            mutate(Concentration = mean(calCO2)) %>%
        ungroup()

    head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]

  Sensor Date       Time   calCO2 DeviceTime          cuts   BinnedTime         
  <fctr> <date>     <time>  <dbl> <dttm>              <fctr> <chr>              
1 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3     2019-02-12 13:24     400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04

Edit

I've tried:

dt<-seq(
  from=as.POSIXct("2019-02-12 13:24:00", tz="GMT"),
  to=as.POSIXct("2019-02-12 14:00:00", tz="GMT"),
  by="10 sec"
) 

cut(df$BinnedTime,dt)

but it gives an error saying x must be numeric, so I converted both df$BinnedTime and dt$dt to numeric and this produces only NAs.

cut(as.numeric(as.POSIXct(df$BinnedTime)), as.numeric(dt))

What am I missing?

Edit 2

I have the following:

df$DeviceTime <- as.POSIXct(paste(d$Date, d$Time), format="%Y-%m-%d %H:%M:%S")

df<-df%>%
  mutate(BinnedTime=floor_date(ymd_hms(DeviceTime),unit="10 sec"))%>%
  group_by(Sensor)%>%
  group_by(BinnedTime,add=TRUE)%>%
  summarize(calCO2 = mean(na.omit(calCO2)))

Which I think is now what I'm after but it's not elegant.

Here is the data file in onedrive: df.txt until 30th March 19

like image 419
HCAI Avatar asked Feb 26 '19 11:02

HCAI


People also ask

What is grouping data by time intervals?

Grouping data by time intervals is very obvious when you come across Time-Series Analysis. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time.

How do I bin data in group data into bins?

The Group Data into Bins component supports multiple options for binning data. You can customize how the bin edges are set and how values are apportioned into the bins. For example, you can: Manually type a series of values to serve as the bin boundaries. Assign values to bins by using quantiles, or percentile ranks.

How are the values placed in the bins?

The values from the data column are placed in the bins such that each bin has the same interval between starting and ending values. As a result, some bins might have more values if data is clumped around a certain point. Custom Edges: You can specify the values that begin each bin.

What happens to the values in the bins after normalization?

Then, depending on the normalization method you choose, the values in the bins are either transformed to percentiles or mapped to a bin number. The following diagram shows the distribution of numeric values before and after binning with the quantiles method.


1 Answers

library(tidyverse)
library(lubridate)  
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df <- read_delim("https://gist.githubusercontent.com/ramiromagno/4347eefec2aa36ec94423b75b145fccb/raw/5c1b445686bd014ea3a1f0336433e3b364779766/df.txt", delim = " ", col_types = cols())

df$DeviceTime <- as.POSIXct(paste(df$Date, df$Time), format="%Y-%m-%d %H:%M:%S")

dt <- seq(
  from = as.POSIXct("2019-02-12 13:24:00", tz = "GMT"),
  to = as.POSIXct("2019-02-12 14:00:00", tz = "GMT"),
  by = "10 sec"
)

df %>% 
  mutate(BinnedTime = cut(DeviceTime, breaks=dt)) %>%
  group_by(Sensor)%>%
  group_by(BinnedTime,add=TRUE)%>%
  summarize(calCO2 = mean(na.omit(calCO2))) -> df2

df2
#> # A tibble: 7,557 x 3
#> # Groups:   Sensor [?]
#>    Sensor BinnedTime          calCO2
#>    <chr>  <fct>                <dbl>
#>  1 A1     2019-02-12 13:24:00    400
#>  2 A1     2019-02-12 13:24:10    401
#>  3 A1     2019-02-12 13:24:20    401
#>  4 A1     2019-02-12 13:24:30    401
#>  5 A1     2019-02-12 13:24:40    401
#>  6 A1     2019-02-12 13:24:50    400
#>  7 A1     2019-02-12 13:25:00    400
#>  8 A1     2019-02-12 13:25:10    398
#>  9 A1     2019-02-12 13:25:20    397
#> 10 A1     2019-02-12 13:25:30    394
#> # ... with 7,547 more rows
like image 58
plant Avatar answered Nov 15 '22 11:11

plant