Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

selecting rows with specific conditions in R

Tags:

r

row

I currently have a data that looks like this for multiple ids (that range until around 1600)

id  year    name    status
1   1980    James   3
1   1981    James   3
1   1982    James   3
1   1983    James   4
1   1984    James   4
1   1985    James   1
1   1986    James   1
1   1987    James   1
2   1982    John    2
2   1983    John    2
2   1984    John    1
2   1985    John    1

I want to subset this data so that it only has the information for status=1 and the status right before that. I also want to eliminate multiple 1s and only save the first 1s. In conclusion I would want:

id  year    name    status
1   1984    James   4
1   1985    James   1
2   1983    John    2
2   1984    John    1

I'm doing this because I'm in the process of figuring out in what year how many people from certain status changed to status 1. I only know the subset command and I don't think I can get this data from doing subset(data, subset=(status==1)). How could I save the information right before that

I want to add to this question one more time - I did not get same results when I applied the first reply to this question (which uses plr packages) and the third reply which uses duplicated command. I found out that the first reply preserved information accurately while the third one did not.

like image 796
song0089 Avatar asked Jan 17 '14 19:01

song0089


People also ask

How do I select a row with a condition in R?

Select Rows Based on a List of Values. If you have a vector of values and you wanted to select rows based on a list of values (vector values) in R, use in operator %in% . The below example returns rows that have id values 13,14 and 15.

How do I select a row with a specific value in R?

By using bracket notation on R DataFrame (data.name) we can select rows by column value, by index, by name, by condition e.t.c. You can also use the R base function subset() to get the same results. Besides these, R also provides another function dplyr::filter() to get the rows from the DataFrame.

How do I select a specific row in a matrix in R?

Similar to vectors, you can use the square brackets [ ] to select one or multiple elements from a matrix. Whereas vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns.


2 Answers

This does what you want.

library(plyr)

ddply(d, .(name), function(x) {
  i <- match(1, x$status)
  if (is.na(i))
    NULL
  else
    x[c(i-1, i), ]
})

  id year  name status
1  1 1984 James      4
2  1 1985 James      1
3  2 1983  John      2
4  2 1984  John      1
like image 128
Mark Heckmann Avatar answered Sep 28 '22 02:09

Mark Heckmann


Here's a solution - for each grouping of numbers (the cumsum bit), it looks at the first one and takes that and the previous row if status is 1:

library(data.table)
dt = data.table(your_df)

dt[dt[, if(status[1] == 1) c(.I[1]-1, .I[1]),
        by = cumsum(c(0,diff(status)!=0))]$V1]
#   id year  name status
#1:  1 1984 James      4
#2:  1 1985 James      1
#3:  2 1983  John      2
#4:  2 1984  John      1
like image 38
eddi Avatar answered Sep 28 '22 02:09

eddi