Reshaping data in R with "login" "logout" times

Question

I'm new to R and am working on a side project for my own purposes. I have this data (a reproducible dput of it is at the end of the question):

     X            datetime  user  state
1    1 2016-02-19 19:13:26 User1 joined
2    2 2016-02-19 19:21:18 User2 joined
3    3 2016-02-19 19:21:33 User1 joined
4    4 2016-02-19 19:35:38 User1 joined
5    5 2016-02-19 19:44:15 User1 joined
6    6 2016-02-19 19:48:55 User1 joined
7    7 2016-02-19 19:52:40 User1 joined
8    8 2016-02-19 19:53:15 User3 joined
9    9 2016-02-19 20:02:34 User3 joined
10  10 2016-02-19 20:13:48 User3 joined
19 637 2016-02-19 19:13:32 User1   left
20 638 2016-02-19 19:25:26 User1   left
21 639 2016-02-19 19:30:30 User2   left
22 640 2016-02-19 19:42:16 User1   left
23 641 2016-02-19 19:47:59 User1   left
24 642 2016-02-19 19:51:06 User1   left
25 643 2016-02-19 20:02:26 User3   left

I want it to look like this:

    user  joined                left
1   User1 2016-02-19 19:13:26   2016-02-19 19:13:32
2   User2 2016-02-19 19:21:18   2016-02-19 19:30:30
3   User3 2016-02-19 19:53:15   2016-02-19 20:02:26 
4   User1 2016-02-19 19:21:33   2016-02-19 19:25:26
.
.
.

I'm looking at tidyr since some reshaping is obviously involved, but I can't wrap my head around what exactly needs to be done. Is this even possible (without looping or massive amounts of procedural code)? The problem I can't get around is that there's no way to know that a particular "left" record should be joined to a particular "joined" record. The examples I can find all involve a static month or day over which other values are gathered. I should add that not every record is guaranteed to have a "left" value (a user might still be "joined").

Here's the dput output of a sample of the data:

> dput(samp)
structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 637L, 638L, 639L, 640L, 
641L, 642L, 643L, 644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L
), datetime = structure(c(1L, 3L, 4L, 7L, 9L, 11L, 13L, 14L, 
16L, 18L, 21L, 22L, 23L, 26L, 27L, 30L, 32L, 33L, 2L, 5L, 6L, 
8L, 10L, 12L, 15L, 17L, 19L, 20L, 24L, 25L, 28L, 29L, 31L), .Label = c("2016-02-19 19:13:26", 
"2016-02-19 19:13:32", "2016-02-19 19:21:18", "2016-02-19 19:21:33", 
"2016-02-19 19:25:26", "2016-02-19 19:30:30", "2016-02-19 19:35:38", 
"2016-02-19 19:42:16", "2016-02-19 19:44:15", "2016-02-19 19:47:59", 
"2016-02-19 19:48:55", "2016-02-19 19:51:06", "2016-02-19 19:52:40", 
"2016-02-19 19:53:15", "2016-02-19 20:02:26", "2016-02-19 20:02:34", 
"2016-02-19 20:13:38", "2016-02-19 20:13:48", "2016-02-19 20:42:27", 
"2016-02-19 20:48:22", "2016-02-19 20:49:31", "2016-02-19 20:59:58", 
"2016-02-19 21:06:20", "2016-02-19 21:10:43", "2016-02-19 21:11:13", 
"2016-02-19 21:11:15", "2016-02-19 21:11:22", "2016-02-19 21:17:33", 
"2016-02-19 22:02:45", "2016-02-19 22:05:18", "2016-02-19 22:05:37", 
"2016-02-19 22:05:47", "2016-02-19 22:30:30"), class = "factor"), 
    user = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
    3L, 4L, 1L, 1L, 4L, 4L, 4L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 
    3L, 3L, 1L, 4L, 1L, 1L, 4L, 4L), .Label = c("User1", "User2", 
    "User3", "User4"), class = "factor"), state = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L), .Label = c("joined", "left"), class = "factor")), .Names = c("X", 
"datetime", "user", "state"), class = "data.frame", row.names = c(NA, 
-33L))

Arun · Accepted Answer

Using rowid() from the data.table package along with dcast():

library(data.table)
## dt is the question's data as a data.table, e.g. via setDT() or as.data.table()
dcast(dt, user + rowid(user, state) ~ state, value.var="datetime")

#      user user_1              joined                left
#  1: User1      1 2016-02-19 19:13:26 2016-02-19 19:13:32
#  2: User1      2 2016-02-19 19:21:33 2016-02-19 19:25:26
#  3: User1      3 2016-02-19 19:35:38 2016-02-19 19:42:16
#  4: User1      4 2016-02-19 19:44:15 2016-02-19 19:47:59
#  5: User1      5 2016-02-19 19:48:55 2016-02-19 19:51:06
#  6: User1      6 2016-02-19 19:52:40                <NA>
#  7: User2      1 2016-02-19 19:21:18 2016-02-19 19:30:30
#  8: User3      1 2016-02-19 19:53:15 2016-02-19 20:02:26
#  9: User3      2 2016-02-19 20:02:34                <NA>
# 10: User3      3 2016-02-19 20:13:48                <NA>
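
To see why this pairs the rows correctly, it may help to look at what rowid() returns on its own. The sketch below uses a small made-up table (not the question's data) and a hypothetical column name, visit. Note that rowid() numbers rows in order of appearance, so this assumes the rows are already in chronological order within each user and state.

```r
library(data.table)

# Toy data for illustration only; same columns as in the question
dt <- data.table(
  datetime = c("2016-02-19 19:13:26", "2016-02-19 19:21:18",
               "2016-02-19 19:21:33", "2016-02-19 19:13:32",
               "2016-02-19 19:30:30"),
  user  = c("User1", "User2", "User1", "User1", "User2"),
  state = c("joined", "joined", "joined", "left", "left")
)

# rowid() counts occurrences within each (user, state) combination,
# so the n-th "joined" and the n-th "left" of a user share the same number
dt[, visit := rowid(user, state)]

# One row per user and visit; a join without a matching leave gets NA
wide <- dcast(dt, user + visit ~ state, value.var = "datetime")
print(wide)
```

Here User1's second join has no matching leave yet, so its "left" value comes out as NA.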

SymbolixAU · Answer

We can make use of the order of "left" and "joined", and match when one follows the other for each user.

For this I'm going to use library(data.table)

library(data.table)
setDT(df)

## order the data by user and datetime
df <- df[order(user, datetime)]
## add an 'order' column, which is a sequence from 1 to .N (the number of rows)
## for each user
df[, order := seq_len(.N), by=user]

## split the left and joins
dt_left <- df[state == "left"]
dt_joined <- df[state == "joined"]

## assuming 'left' is after 'joined', shift the 'order' back for left
dt_left[, order := order - 1]

## join on user and order (subsetting the relevant columns),
## keeping rows where there's a 'joined' but no 'left'
dt <- dt_left[, .(user, order, datetime)][dt_joined[, .(user, order, datetime)], on=c("user", "order"), nomatch=NA]

## rename columns
setnames(dt, c("datetime", "i.datetime"), c("left", "joined"))

     user order                left              joined
 1: User1     1 2016-02-19 19:13:32 2016-02-19 19:13:26
 2: User1     3 2016-02-19 19:25:26 2016-02-19 19:21:33
 3: User1     5 2016-02-19 19:42:16 2016-02-19 19:35:38
 4: User1     7 2016-02-19 19:47:59 2016-02-19 19:44:15
 5: User1     9 2016-02-19 19:51:06 2016-02-19 19:48:55
 6: User1    11 2016-02-19 20:48:22 2016-02-19 19:52:40
 7: User1    13 2016-02-19 21:11:13 2016-02-19 21:06:20
 8: User1    15 2016-02-19 21:17:33 2016-02-19 21:11:15
 9: User2     1 2016-02-19 19:30:30 2016-02-19 19:21:18
10: User3     1 2016-02-19 20:02:26 2016-02-19 19:53:15
11: User3     3 2016-02-19 20:13:38 2016-02-19 20:02:34
12: User3     5 2016-02-19 20:42:27 2016-02-19 20:13:48
13: User3     7                  NA 2016-02-19 20:49:31
14: User3     8                  NA 2016-02-19 22:30:30
15: User4     1 2016-02-19 21:10:43 2016-02-19 20:59:58
16: User4     3 2016-02-19 22:02:45 2016-02-19 21:11:22
17: User4     5 2016-02-19 22:05:37 2016-02-19 22:05:18
18: User4     7                  NA 2016-02-19 22:05:47
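
This approach relies on each user's events strictly alternating between "joined" and "left". One way to check that assumption before pairing is sketched below in base R. The toy data and the helper name alternates() are made up for illustration; the second user joins twice in a row, much like User3 at the end of the output above.

```r
# Toy data for illustration only
events <- data.frame(
  datetime = as.POSIXct(c("2016-02-19 19:13:26", "2016-02-19 19:13:32",
                          "2016-02-19 20:49:31", "2016-02-19 22:30:30"),
                        tz = "UTC"),
  user  = c("User1", "User1", "User3", "User3"),
  state = c("joined", "left", "joined", "joined"),
  stringsAsFactors = FALSE
)

# Sort chronologically within each user
events <- events[order(events$user, events$datetime), ]

# Hypothetical helper: TRUE if the states read joined, left, joined, left, ...
alternates <- function(state) {
  expected <- rep_len(c("joined", "left"), length(state))
  all(state == expected)
}

# Named logical vector with one entry per user
ok <- vapply(split(events$state, events$user), alternates, logical(1))
print(ok)
```

Users flagged FALSE have two joins (or two leaves) in a row, so their rows would need a closer look before trusting the shifted join.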

thelatemail · Answer

Base version:

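## number each user's "joined" and "left" rows separately, in order of appearance
## (integer input keeps count numeric, so the final order() sorts 1, 2, ..., 10)
samp$count <- with(samp, ave(seq_along(user), state, user, FUN=seq_along))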

out <- merge(
  samp[samp$state=="joined",c("user","datetime","count")],
  samp[samp$state=="left",c("user","datetime","count")],
  by=c("user","count"), all.x=TRUE
)

out[order(out$count),]

ytk · Answer

Another way to do it:

library(tidyr)
df <- df %>% spread(state, datetime)

df_joined <- df[!is.na(df$joined), 2:3]
df_joined <- df_joined[with(df_joined, order(user, joined)), ]

df_left <- df[!is.na(df$left), c(2, 4)]
df_left <- df_left[with(df_left, order(user, left)), ]

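## merging on 'user' alone would pair every join with every leave for that user,
## so number the rows within each user and merge on that index as well
df_joined$n <- ave(seq_along(df_joined$user), df_joined$user, FUN = seq_along)
df_left$n <- ave(seq_along(df_left$user), df_left$user, FUN = seq_along)
merge(df_joined, df_left, all = TRUE, by = c('user', 'n'))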
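
The same pairing idea can also be sketched with tidyr's newer pivot_wider(): number each user's joins and leaves, then widen on that number. This is only a sketch. It assumes dplyr and tidyr 1.0.0 or later are installed, and the toy data and the visit column name are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data for illustration only; same columns as in the question
samp_small <- data.frame(
  datetime = c("2016-02-19 19:13:26", "2016-02-19 19:21:18",
               "2016-02-19 19:21:33", "2016-02-19 19:13:32",
               "2016-02-19 19:30:30"),
  user  = c("User1", "User2", "User1", "User1", "User2"),
  state = c("joined", "joined", "joined", "left", "left"),
  stringsAsFactors = FALSE
)

sessions <- samp_small %>%
  arrange(user, datetime) %>%         # chronological within each user
  group_by(user, state) %>%
  mutate(visit = row_number()) %>%    # n-th join / n-th leave per user
  ungroup() %>%
  pivot_wider(names_from = state, values_from = datetime)

print(sessions)
```

A join with no matching leave keeps NA in the left column, which covers the case of a user who is still "joined".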

Tags: r, tidyr

Tim Coker
