I currently have data that looks like this for multiple ids (ranging up to around 1600):
id year name  status
1  1980 James 3
1  1981 James 3
1  1982 James 3
1  1983 James 4
1  1984 James 4
1  1985 James 1
1  1986 James 1
1  1987 James 1
2  1982 John  2
2  1983 John  2
2  1984 John  1
2  1985 John  1
I want to subset this data so that, for each id, it contains only the first row with status = 1 and the row right before it. Any later rows with status 1 should be dropped. In the end I would want:
id year name  status
1  1984 James 4
1  1985 James 1
2  1983 John  2
2  1984 John  1
I'm doing this because I'm trying to figure out, for each year, how many people changed from each status to status 1. I only know the subset command, and I don't think I can get this data with subset(data, subset=(status==1)). How can I also keep the row right before the first 1?
I want to add to this question one more time: I did not get the same results when I applied the first reply (which uses the plyr package) and the third reply (which uses the duplicated command). I found that the first reply preserved the information accurately, while the third one did not.
Select Rows Based on a List of Values. If you have a vector of values and want to select the rows whose column value appears in that vector, use the %in% operator. For instance, df[df$id %in% c(13, 14, 15), ] returns the rows of df whose id is 13, 14 or 15.
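A minimal sketch of the %in% selection described above. The data frame df, its columns, and the vector wanted are made up for illustration and do not come from the original question.

```r
# Hypothetical example data; ids and values are made up for illustration.
df <- data.frame(id = 11:16, value = c("a", "b", "c", "d", "e", "f"))

# Vector of ids we want to keep.
wanted <- c(13, 14, 15)

# %in% returns a logical vector: TRUE where df$id matches any element of wanted.
selected <- df[df$id %in% wanted, ]
print(selected)
```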
By using bracket notation on an R data frame we can select rows by column value, by index, by name, by condition, etc. You can also use the base R function subset() to get the same results. Besides these, the dplyr package provides filter() to select rows from a data frame.
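As a rough illustration of the first two options, here is a sketch that selects the same rows with bracket notation and with subset(). The data frame df and its values are invented for the example; dplyr::filter(df, status == 1) would be the dplyr counterpart, but it is left out so the block runs with base R only.

```r
# Made-up example data for illustration.
df <- data.frame(id = 1:5, status = c(3, 1, 4, 1, 2))

# Bracket notation: a logical condition before the comma selects rows.
by_bracket <- df[df$status == 1, ]

# subset() evaluates the condition using the data frame's own columns.
by_subset <- subset(df, status == 1)

print(by_bracket)
print(by_subset)
```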
Similar to vectors, you can use the square brackets [ ] to select one or multiple elements from a matrix. Whereas vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns.
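A short sketch of the row-comma-column indexing described above, using a small matrix m that is made up for illustration:

```r
# A 3 x 4 matrix; matrix() fills values column by column.
m <- matrix(1:12, nrow = 3, ncol = 4)

# Single element: row 2, column 3.
elem <- m[2, 3]

# Rows 1 and 2, all columns (nothing after the comma means "every column").
top_rows <- m[1:2, ]

# All rows, columns 1 and 4 (nothing before the comma means "every row").
outer_cols <- m[, c(1, 4)]

print(elem)
print(top_rows)
print(outer_cols)
```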
This does what you want.
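library(plyr)
# d is the data frame from the question
ddply(d, .(name), function(x) {
    # position of the first status == 1 within this person's rows
    i <- match(1, x$status)
    if (is.na(i))
        NULL                # this person never reaches status 1
    else
        x[c(i-1, i), ]      # the row before the first 1, and the first 1
})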
  id year  name status
1  1 1984 James      4
2  1 1985 James      1
3  2 1983  John      2
4  2 1984  John      1
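For readers without plyr, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the data frame is called d, as in the answer above; rows are already sorted by year within each person; and grouping is done by id rather than name in case two people share a name. The helper name first_one_with_prev is made up.

```r
# Rebuild the sample data from the question.
d <- data.frame(
  id     = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  year   = c(1980:1987, 1982:1985),
  name   = c(rep("James", 8), rep("John", 4)),
  status = c(3, 3, 3, 4, 4, 1, 1, 1, 2, 2, 1, 1)
)

# Hypothetical helper: for one person's rows, keep the first status == 1 row
# and the row just before it (if there is one).
first_one_with_prev <- function(x) {
  i <- match(1, x$status)        # position of the first 1, NA if none
  if (is.na(i)) return(NULL)     # this person never reaches status 1
  x[c(i - 1, i), ]               # an index of 0 is silently dropped
}

# Split by id, apply the helper to each piece, and stack the pieces.
pieces <- lapply(split(d, d$id), first_one_with_prev)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Splitting by id keeps the "previous row" lookup inside one person's records, so a person whose very first record already has status 1 contributes only that single row.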
Here's a solution: for each run of consecutive equal status values (the cumsum bit), it looks at the first row of the run and, if the status there is 1, takes that row and the previous one:
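library(data.table)
dt = data.table(your_df)   # your_df is the data frame from the question
# Group rows into runs of equal status. For each run that starts with 1,
# collect the index of its first row (.I[1]) and of the row just before it.
dt[dt[, if (status[1] == 1) c(.I[1] - 1, .I[1]),
      by = cumsum(c(0, diff(status) != 0))]$V1]
#    id year  name status
# 1:  1 1984 James      4
# 2:  1 1985 James      1
# 3:  2 1983  John      2
# 4:  2 1984  John      1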
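Once the subset is in hand, the goal stated in the question (how many people moved to status 1 in each year, and from which status) could be sketched as a cross-tabulation in base R. The names res, prev, now, trans, and counts are illustrative, and the sketch assumes at most one "before" row and one status 1 row per id, as in the output above.

```r
# The four-row result shown above, typed in by hand.
res <- data.frame(
  id     = c(1, 1, 2, 2),
  year   = c(1984, 1985, 1983, 1984),
  name   = c("James", "James", "John", "John"),
  status = c(4, 1, 2, 1)
)

# Separate the "before" rows from the first status == 1 rows.
prev <- res[res$status != 1, ]
now  <- res[res$status == 1, ]

# One row per person: the year of the change and the status they came from.
# merge() drops anyone who has no "before" row.
trans <- merge(now[, c("id", "year")], prev[, c("id", "status")], by = "id")

# Count transitions by year and originating status.
counts <- table(year = trans$year, from_status = trans$status)
print(counts)
```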