Selecting rows with specific conditions in R

Tags:

r

row

I currently have data that looks like this, for multiple ids (ranging up to around 1600):

id  year    name    status
1   1980    James   3
1   1981    James   3
1   1982    James   3
1   1983    James   4
1   1984    James   4
1   1985    James   1
1   1986    James   1
1   1987    James   1
2   1982    John    2
2   1983    John    2
2   1984    John    1
2   1985    John    1

I want to subset this data so that, for each id, it keeps only the first row with status = 1 and the row right before it. Any repeated 1s after the first one should be dropped. In the end I would want:

id  year    name    status
1   1984    James   4
1   1985    James   1
2   1983    John    2
2   1984    John    1

I'm doing this because I'm trying to work out how many people changed to status 1 in each year, and from which status. I only know the subset command, and I don't think I can get this data from subset(data, subset=(status==1)). How can I also keep the row right before the first 1?

One addition to this question: I did not get the same results from the first reply (which uses the plyr package) and the third reply (which uses the duplicated command). I found that the first reply preserved the information accurately while the third one did not.


asked Jan 17 '14 19:01

song0089

2 Answers

This does what you want.

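library(plyr)

# d is the data frame from the question; rows must be ordered by year within each name
ddply(d, .(name), function(x) {
  i <- match(1, x$status)  # position of the first status 1, NA if there is none
  if (is.na(i))
    NULL                   # this person never reaches status 1
  else
    x[c(i-1, i), ]         # the row before the first 1, and the first 1 itself
})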

  id year  name status
1  1 1984 James      4
2  1 1985 James      1
3  2 1983  John      2
4  2 1984  John      1
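For readers who do not have plyr installed, the same per-group idea can be sketched in base R. This is only an illustration under a few assumptions: the data frame is rebuilt here from the sample in the question, rows are assumed to be sorted by year within each id, the grouping is done on id rather than name (names may repeat across 1600 ids), and first_switch is a hypothetical helper name.

```r
# Sample data from the question (column names as in the post).
d <- data.frame(
  id     = c(rep(1, 8), rep(2, 4)),
  year   = c(1980:1987, 1982:1985),
  name   = c(rep("James", 8), rep("John", 4)),
  status = c(3, 3, 3, 4, 4, 1, 1, 1, 2, 2, 1, 1)
)

# Hypothetical helper: for one id's rows (assumed sorted by year), keep the
# first status-1 row and the row just before it, if there is one.
first_switch <- function(x) {
  i <- match(1, x$status)         # position of the first 1, or NA
  if (is.na(i)) return(x[0, ])    # id never reaches status 1: keep nothing
  x[seq(max(i - 1, 1), i), ]      # previous row (when present) and the first 1
}

# Split by id, apply the helper to each piece, and stack the results.
res <- do.call(rbind, lapply(split(d, d$id), first_switch))
rownames(res) <- NULL
res
```

On the sample data this keeps the 1984 and 1985 rows for James and the 1983 and 1984 rows for John. The max(i - 1, 1) guard only matters when an id already starts at status 1, in which case just that single row is returned.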


answered Sep 28 '22 02:09

Mark Heckmann

Here's a solution - for each run of consecutive identical status values (the cumsum bit), it looks at the first row of the run and, if its status is 1, takes that row and the row before it:

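library(data.table)
dt = data.table(your_df)  # your_df is the data frame from the question

# inner call: one group per run of equal status; .I holds row numbers in dt,
# so runs starting with 1 return the previous row and their own first row
dt[dt[, if(status[1] == 1) c(.I[1]-1, .I[1]),
        by = cumsum(c(0,diff(status)!=0))]$V1]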
#   id year  name status
#1:  1 1984 James      4
#2:  1 1985 James      1
#3:  2 1983  John      2
#4:  2 1984  John      1
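The cumsum expression used as the grouping key may be easier to follow on a single vector. The sketch below uses base R only and James's status values from the question. The names run_id, first_idx and starts_with_1 are illustrative and not part of the answer's code.

```r
status <- c(3, 3, 3, 4, 4, 1, 1, 1)         # James's rows from the question

# diff(status) != 0 is TRUE wherever the status changes from one row to the
# next; the cumulative sum therefore gives every run of equal values its own id.
run_id <- cumsum(c(0, diff(status) != 0))
run_id
# 0 0 0 1 1 2 2 2

# Row index of the first element of each run
first_idx <- which(!duplicated(run_id))

# Runs whose first element has status 1
starts_with_1 <- first_idx[status[first_idx] == 1]
starts_with_1
# 6
```

Row 6 is the 1985 row, so that row and row 5 (1984) are the ones selected. Note that the run ids are computed over the whole table rather than per id, so this approach relies on no id starting at status 1 right after another id's rows, and it would pick up every run of 1s if a person returned to status 1 later.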

answered Sep 28 '22 02:09

eddi
