Select first observed data and utilize mutate

Tags:

r

dplyr

I am running into an issue with my data: for each individual (id), I want to take the first observed score (by ob) and subtract the last observed score from it.

The problem with simply taking the first observation minus the last observation is that sometimes the first observation is missing.
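To make the problem concrete, here is a minimal base-R sketch using the scores for individual 5 from the data frame below. The variable names score_id5 and naive are just for illustration. Because ob 1 is NA, a naive first-minus-last calculation returns NA.

```r
# Scores for individual 5 in ob order (ob = 1, ..., 5); ob 1 is missing
score_id5 <- c(NA, 2, 3, 4, 3)

# Naive first-minus-last: any arithmetic involving NA returns NA
naive <- score_id5[1] - score_id5[length(score_id5)]
print(naive)
#> [1] NA
```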

Is there any way to get the first observed score for each individual, skipping any missing data?

I built the data frame below to illustrate my problem.

help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

   id ob score
1   5  1    NA
2   5  2     2
3   5  3     3
4   5  4     4
5   5  5     3
6  12  1     7
7  12  2     3
8  12  3     4
9  17  1     3
10 17  2     4
11 20  1    NA
12 20  2     1
13 20  3     4

What I am hoping for is code that will give me:

   id ob score  es
1   5  1    NA  -1
2   5  2     2  -1
3   5  3     3  -1
4   5  4     4  -1
5   5  5     3  -1
6  12  1     7   3
7  12  2     3   3
8  12  3     4   3
9  17  1     3  -1
10 17  2     4  -1
11 20  1    NA  -3
12 20  2     1  -3
13 20  3     4  -3

I am working in dplyr and I understand how to use group_by(); however, I am not sure how to select only the first observed score and then use mutate() to create es.

asked Jun 11 '15 17:06

b222

2 Answers

I would use first() and last() (both dplyr functions) and na.omit() (from the default stats package).

First, I would make sure your score column is a numeric column with proper NA values (real NA values, not the string "NA"):

help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
       ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
       score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))
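To illustrate why the column type matters, here is a small base-R sketch. The vector raw_score is hypothetical, standing in for a score column that was read in as text. na.omit() only drops real NA values, so the string "NA" survives; converting with as.numeric() turns it into a real NA that na.omit() can then remove.

```r
# Hypothetical score column that arrived as character strings
raw_score <- c("NA", "2", "3", "4", "3")

# na.omit() does not treat the string "NA" as missing, so nothing is dropped
length(na.omit(raw_score))
#> [1] 5

# Convert to numeric first: "NA" becomes a real NA and is then removed
score <- as.numeric(raw_score)
length(na.omit(score))
#> [1] 4
```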

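Then you can do:

library(dplyr)
help %>%
  group_by(id) %>%
  arrange(ob, .by_group = TRUE) %>%  # order by ob within each id
  # first non-missing score minus last non-missing score, per id
  mutate(es = first(na.omit(score)) - last(na.omit(score)))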
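For comparison, here is a minimal base-R sketch of the same idea without dplyr, using ave() to compute one value per id and recycle it across that id's rows. The helper first_minus_last() is a hypothetical name introduced here, not part of any package, and the sketch assumes rows should be ordered by ob within each id.

```r
help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

# Hypothetical helper: first observed value minus last observed value
first_minus_last <- function(x) {
  obs <- x[!is.na(x)]                      # skip missing scores
  if (length(obs) == 0) return(NA_real_)   # every score missing for this id
  obs[1] - obs[length(obs)]
}

# Sort by ob within id so "first" and "last" follow observation order
help <- help[order(help$id, help$ob), ]

# ave() applies the helper per id and recycles the result to every row
help$es <- ave(help$score, help$id, FUN = first_minus_last)
help$es
#> [1] -1 -1 -1 -1 -1  3  3  3 -1 -1 -3 -3 -3
```

In recent dplyr releases (1.1.0 and later), first() and last() also accept na_rm = TRUE, so first(score, na_rm = TRUE) - last(score, na_rm = TRUE) should give the same result; check which version you have installed.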

answered Oct 05 '22 18:10

MrFlick

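library(dplyr)

# One row per id: drop missing scores, then first minus last observed score
temp <- help %>%
  group_by(id) %>%
  arrange(ob) %>%
  filter(!is.na(score)) %>%
  mutate(es = first(score) - last(score)) %>%
  select(id, es) %>%
  distinct() %>%
  ungroup()

# Attach es back to every row of the original data, including the NA rows
help %>% left_join(temp, by = "id")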
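The same "one row per id, then join back" structure can be sketched in base R with split(), lapply() and merge(). This is an illustrative equivalent, not part of the original answer; the object names per_id and out are hypothetical.

```r
help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

# One row per id: first observed score minus last observed score
per_id <- do.call(rbind, lapply(split(help, help$id), function(d) {
  d <- d[order(d$ob), ]              # observation order within this id
  s <- d$score[!is.na(d$score)]      # drop missing scores
  es <- if (length(s) > 0) s[1] - s[length(s)] else NA_real_
  data.frame(id = d$id[1], es = es)
}))

# Join es back onto every row, like left_join(); all.x keeps unmatched rows
out <- merge(help, per_id, by = "id", all.x = TRUE)
out <- out[order(out$id, out$ob), ]
out
```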

answered Oct 05 '22 17:10

Yifei
