
Create index from group to select value from original data.frame to use in result

Tags: r, dplyr

I have a data.frame df.l. I want to create a new variable, using the output from summarise as an index to retrieve a value from a column in the original data.frame.

df.l has the following columns: trial, location, posi, date, and value.

I want to use the sum of "value==1" for each group (trial, location, date) as an index with which to select the value from posi, and store it as a new variable.

value in df.l can be 1 or 0. Once it becomes 0 it stays 0, as long as the rows are ordered correctly (i.e. posi running from 0 to 1). The grouped sum therefore marks the position where value changes from 1 to 0 within the group.

To determine the index location the following code works:

test <- df.l %>% 
  group_by(trial, location, date) %>%
  summarise(n= sum(value==1))

but of course, posi is missing.

I was hoping that something like the code below would work, but it doesn't. It starts out with correct results, but somewhere the indexing goes awry. I don't know if it makes sense to refer to a column the way I did.

test <- df.l %>% 
  group_by(trial, location, date) %>%
  summarise(n= sum(value==1)) %>%
  mutate(ANS = nth(df.l$posi,n))

Using dplyr, can I create an "index" from a group, use it to select a value from the original data.frame, and then add that value as a variable in the new data.frame? Or is there another dplyr approach that achieves the same result?

# truncated data.frame
df.l <- structure(list(trial = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    location = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
    3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), posi = c(0, 
    0.28, 0.65, 1, 0, 0.33, 0.67, 1, 0, 0.2, 0.5, 1, 0, 0.28, 
    0.65, 1, 0, 0.33, 0.67, 1, 0, 0.2, 0.5, 1), date = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(1L, 1L, 1L, 0L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 
    1L, 1L, 1L, 0L, 0L)), .Names = c("trial", "location", "posi", "date", "value"), row.names = c(NA, 24L), class = "data.frame")

# desired result
result <- structure(list(trial = c(1L, 1L, 1L, 2L, 2L, 2L),
    location = c(1L, 2L, 3L, 1L, 2L, 3L),
    date = c(1L, 1L, 1L, 1L, 1L, 1L),
    n = c(3L, 4L, 4L, 1L, 4L, 2L),
    posi = c(0.65, 1, 1, 0, 1, 0.2)),
    class = "data.frame",
    .Names = c("trial", "location", "date", "n", "posi"),
    row.names = c(NA, -6L))
asked Jul 30 '15 19:07 by nofunsally


2 Answers

You can do it inside the summarise:

df.l %>% 
    group_by(trial, location, date) %>%
    summarise(n= sum(value==1), ANS = nth(posi,n))
#Source: local data frame [6 x 5]
#Groups: trial, location
#
#  trial location date n  ANS
#1     1        1    1 3 0.65
#2     1        2    1 4 1.00
#3     1        3    1 4 1.00
#4     2        1    1 1 0.00
#5     2        2    1 4 1.00
#6     2        3    1 2 0.20
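
The difference from the mutate attempt in the question seems to be what gets indexed: df.l$posi is the whole column, so position n is counted from the top of the data.frame, whereas a bare posi inside summarise is only the current group's rows. A minimal base R sketch of the two behaviours, assuming the sample data from the question (rebuilt compactly here; global_pick and group_pick are illustrative names):

```r
# Same values as df.l in the question, rebuilt compactly
df.l <- data.frame(
  trial    = rep(1:2, each = 12),
  location = rep(rep(1:3, each = 4), times = 2),
  posi     = rep(c(0, 0.28, 0.65, 1, 0, 0.33, 0.67, 1, 0, 0.2, 0.5, 1), times = 2),
  date     = 1L,
  value    = c(1, 1, 1, 0,  1, 1, 1, 1,  1, 1, 1, 1,
               1, 0, 0, 0,  1, 1, 1, 1,  1, 1, 0, 0)
)

# One key per (trial, location, date) group
grp <- paste(df.l$trial, df.l$location, df.l$date)

# Number of 1s in each group
n <- tapply(df.l$value == 1, grp, sum)

# Global indexing: position n counted within the whole column
global_pick <- df.l$posi[n]

# Per-group indexing: position n counted within each group's own rows
group_pick <- mapply(function(p, k) p[k], split(df.l$posi, grp), n)

print(global_pick)
print(unname(group_pick))
```

On this data the two agree for the first five groups and differ only for the last one, which matches the "starts out correct, then goes awry" symptom.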

Or, if you don't actually need the n in the result, you could do

df.l %>% 
    group_by(trial, location, date) %>%
    summarise(ANS = nth(posi, sum(value == 1)))

Or

df.l %>% 
    group_by(trial, location, date) %>%
    summarise(ANS = posi[sum(value == 1)])
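
One case the sample data does not cover is a group with no 1s at all. There the sum is 0, and in base R posi[0] is a zero-length vector rather than a single value. If such groups can occur, it may be safer to return NA explicitly. A sketch, where pick_last_one is a hypothetical helper and not part of dplyr:

```r
# Hypothetical helper: value of x at the position of the last 1 in flag,
# or NA when flag contains no 1s (assumes all 1s come before any 0s)
pick_last_one <- function(x, flag) {
  k <- sum(flag == 1)
  if (k == 0) NA_real_ else x[k]
}

a <- pick_last_one(c(0, 0.28, 0.65, 1), c(1, 1, 1, 0))  # third element
b <- pick_last_one(c(0, 0.28, 0.65, 1), c(0, 0, 0, 0))  # no 1s in the group

# Possible use in the grouped pipeline (requires dplyr):
# df.l %>%
#     group_by(trial, location, date) %>%
#     summarise(ANS = pick_last_one(posi, value))
```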
answered Oct 01 '22 01:10 by talat


slice seems like the most natural option here:

df.l %>% 
    group_by(trial, location, date) %>%
    mutate(n = row_number()) %>%
    slice(sum(value))

This gives

  trial location posi date value n
1     1        1 0.65    1     1 3
2     1        2 1.00    1     1 4
3     1        3 1.00    1     1 4
4     2        1 0.00    1     1 1
5     2        2 1.00    1     1 4
6     2        3 0.20    1     1 2

The slice function selects one or more rows according to their indices (within a group if applicable), exactly as the OP describes.
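
If the goal is the exact column layout of the desired result in the question, a variant of the same idea could compute n up front and select columns at the end. This is a sketch, assuming a reasonably recent dplyr and the question's sample data (rebuilt compactly here); res is an illustrative name:

```r
library(dplyr)

# Same values as df.l in the question, rebuilt compactly
df.l <- data.frame(
  trial    = rep(1:2, each = 12),
  location = rep(rep(1:3, each = 4), times = 2),
  posi     = rep(c(0, 0.28, 0.65, 1, 0, 0.33, 0.67, 1, 0, 0.2, 0.5, 1), times = 2),
  date     = 1L,
  value    = c(1, 1, 1, 0,  1, 1, 1, 1,  1, 1, 1, 1,
               1, 0, 0, 0,  1, 1, 1, 1,  1, 1, 0, 0)
)

res <- df.l %>%
    group_by(trial, location, date) %>%
    mutate(n = sum(value == 1)) %>%   # count of 1s, repeated on every row of the group
    slice(n[1]) %>%                   # keep the row at that position within the group
    ungroup() %>%
    select(trial, location, date, n, posi)

print(res)
```

A group whose count is 0 would contribute no row here, since slice(0) selects nothing.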

answered Oct 01 '22 03:10 by Frank