I have CO2 measurement data by 30 sensors that don't all measure at the same time, nor do they all start at exactly the same time. I would like to align them as best as possible, so I thought that taking 10s averages might be a good solution.
In a previous question: Group by multiple variables and summarise dplyr I cut the time in 10s chunks for each sensor and averaged each sensors reading over those 10s. Sounds OK but, what I've realised is that the following code cuts the time from whatever time each sensor starts at and therefore they are still not aligned. How can I align them?
require(tidyverse)
require(lubridate)
df %>%
group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="10 sec")) %>%
mutate(Concentration = mean(calCO2)) %>%
ungroup()
head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]
Sensor Date Time calCO2 DeviceTime cuts BinnedTime
<fctr> <date> <time> <dbl> <dttm> <fctr> <chr>
1 N1 2019-02-12 13:24 400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1 2019-02-12 13:24 400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1 2019-02-12 13:24 400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2 2019-02-12 13:24 400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3 2019-02-12 13:24 400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3 2019-02-12 13:24 400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04
I've tried:
dt<-seq(
from=as.POSIXct("2019-02-12 13:24:00", tz="GMT"),
to=as.POSIXct("2019-02-12 14:00:00", tz="GMT"),
by="10 sec"
)
cut(df$BinnedTime,dt)
but it gives an error saying x must be numeric, so I converted both df$BinnedTime
and dt$dt
to numeric and this produces only NAs.
cut(as.numeric(as.POSIXct(df$BinnedTime)), as.numeric(dt))
What am I missing?
I have the following:
df$DeviceTime <- as.POSIXct(paste(d$Date, d$Time), format="%Y-%m-%d %H:%M:%S")
df<-df%>%
mutate(BinnedTime=floor_date(ymd_hms(DeviceTime),unit="10 sec"))%>%
group_by(Sensor)%>%
group_by(BinnedTime,add=TRUE)%>%
summarize(calCO2 = mean(na.omit(calCO2)))
Which I think is now what I'm after but it's not elegant.
Here is the data file in onedrive: df.txt until 30th March 19
Grouping data by time intervals is very obvious when you come across Time-Series Analysis. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time.
The Group Data into Bins component supports multiple options for binning data. You can customize how the bin edges are set and how values are apportioned into the bins. For example, you can: Manually type a series of values to serve as the bin boundaries. Assign values to bins by using quantiles, or percentile ranks.
The values from the data column are placed in the bins such that each bin has the same interval between starting and ending values. As a result, some bins might have more values if data is clumped around a certain point. Custom Edges: You can specify the values that begin each bin.
Then, depending on the normalization method you choose, the values in the bins are either transformed to percentiles or mapped to a bin number. The following diagram shows the distribution of numeric values before and after binning with the quantiles method.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
df <- read_delim("https://gist.githubusercontent.com/ramiromagno/4347eefec2aa36ec94423b75b145fccb/raw/5c1b445686bd014ea3a1f0336433e3b364779766/df.txt", delim = " ", col_types = cols())
df$DeviceTime <- as.POSIXct(paste(df$Date, df$Time), format="%Y-%m-%d %H:%M:%S")
dt <- seq(
from = as.POSIXct("2019-02-12 13:24:00", tz = "GMT"),
to = as.POSIXct("2019-02-12 14:00:00", tz = "GMT"),
by = "10 sec"
)
df %>%
mutate(BinnedTime = cut(DeviceTime, breaks=dt)) %>%
group_by(Sensor)%>%
group_by(BinnedTime,add=TRUE)%>%
summarize(calCO2 = mean(na.omit(calCO2))) -> df2
df2
#> # A tibble: 7,557 x 3
#> # Groups: Sensor [?]
#> Sensor BinnedTime calCO2
#> <chr> <fct> <dbl>
#> 1 A1 2019-02-12 13:24:00 400
#> 2 A1 2019-02-12 13:24:10 401
#> 3 A1 2019-02-12 13:24:20 401
#> 4 A1 2019-02-12 13:24:30 401
#> 5 A1 2019-02-12 13:24:40 401
#> 6 A1 2019-02-12 13:24:50 400
#> 7 A1 2019-02-12 13:25:00 400
#> 8 A1 2019-02-12 13:25:10 398
#> 9 A1 2019-02-12 13:25:20 397
#> 10 A1 2019-02-12 13:25:30 394
#> # ... with 7,547 more rows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With