I have a dataset that looks like this toy example. The data describes the location a person has moved to and the time since this relocation happened. For example, person 1 started out in a rural area, but moved to a city 463 days ago (2nd row), and 415 days ago he moved from this city to a town (3rd row), etc.
set.seed(123)
df <- as.data.frame(sample.int(1000, 10))
colnames(df) <- "time"
df$destination <- as.factor(sample(c("city", "town", "rural"), size = 10, replace = TRUE, prob = c(.50, .25, .25)))
df$user <- sample.int(3, 10, replace = TRUE)
df[order(df[,"user"], -df[,"time"]), ]
The data:
time destination user
526 rural 1
463 city 1
415 town 1
299 city 1
179 rural 1
938 town 2
229 town 2
118 city 2
818 city 3
195 city 3
I wish to aggregate this data to the format below. That is, to count the types of relocations for each user, and sum it up to one matrix. How do I achieve this (preferably without writing loops)?
from to count
city city 1
city town 1
city rural 1
town city 2
town town 1
town rural 0
rural city 1
rural town 0
rural rural 0
The process involves two stages. First, collate the individual cases of raw data by a grouping variable. Second, perform whichever calculation you want on each group of cases.
One possible way, based on the data.table package:
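library(data.table)
cases <- unique(df$destination)
setDT(df)
# sort each user's moves from oldest to most recent, so that adjacent rows are consecutive moves
setorder(df, user, -time)
# pair each destination with the next one per user, then count every from/to combination (0 if never observed)
df[, .(from=destination, to=shift(destination, type="lead")), by=user
  ][CJ(from=cases, to=cases), .(count=.N), by=.EACHI, on=c("from", "to")]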
# from to count
# <char> <char> <int>
# 1: city city 1
# 2: city rural 1
# 3: city town 1
# 4: rural city 1
# 5: rural rural 0
# 6: rural town 0
# 7: town city 2
# 8: town rural 0
# 9: town town 1
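If you would rather avoid an extra package, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: the data is typed in by hand from the table in the question (so it does not depend on the random seed), and the names `places`, `moves` and `counts` are made up for the example. Tabulating factors with fixed levels is what makes the unobserved pairs show up with a count of 0.

```r
# Toy data copied by hand from the table in the question
df <- data.frame(
  time = c(526, 463, 415, 299, 179, 938, 229, 118, 818, 195),
  destination = c("rural", "city", "town", "city", "rural",
                  "town", "town", "city", "city", "city"),
  user = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)
)
places <- c("city", "town", "rural")  # assumed full set of locations

# Order each user's moves from oldest to most recent
df <- df[order(df$user, -df$time), ]

# Within each user, pair every destination with the one that follows it
moves <- do.call(rbind, lapply(split(df$destination, df$user), function(d) {
  data.frame(from = head(d, -1), to = tail(d, -1))
}))

# Tabulate with fixed factor levels so that unobserved pairs are kept as 0
counts <- as.data.frame(
  table(from = factor(moves$from, levels = places),
        to = factor(moves$to, levels = places)),
  responseName = "count"
)

# Sort rows to match the layout requested in the question
counts <- counts[order(counts$from, counts$to), ]
counts
```

On the toy data this should give the same counts as the table in the question. `lapply()` still iterates over the users internally, so it is "no loops" only in the sense of not writing an explicit `for`.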