
How can I write an R script to check for straight-lining; i.e., whether, for any given row, all values in a set of columns have the same value

I would like to create a dichotomous variable that tells me whether a participant gave the same response to each of 10 questions. Each row is a participant and I want to write a simple script to create this new variable/vector in my data frame. For example, if my data looks like the first 6 columns, then I'm trying to create the 7th one.

ID   Item1  Item2  Item3  Item4  Item5  | AllSame
1    5      5      5      5      5      | Yes
2    1      3      3      3      2      | No
3    2      2      2      2      2      | Yes
4    5      4      5      5      5      | No
5    5      2      3      5      5      | No

I've seen solutions on this site that compare one column to another, for example here with ifelse(data$item1==data$item2,1,ifelse(data$item1==data$item3,0,NA)), but I have 10 columns in my actual dataset and I figure there's got to be a better way than checking all 10 against each other. I could also create a variable that counts how many responses equal 1 and then test whether that count equals the number of columns, but with 7 possible responses in the data this once again looks very unwieldy, and I'm hoping someone has a better solution. Thank you!

Bofstein asked Jun 22 '16 23:06




2 Answers

There will be many ways of doing this, but here is one:

mydf <- data.frame(Item1 = c(5,1,2,5,5), 
                   Item2 = c(5,3,2,4,2), 
                   Item3 = c(5,3,2,5,3), 
                   Item4 = c(5,3,2,5,5),
                   Item5 = c(5,3,2,5,5) )

mydf$AllSame <- rowMeans(mydf[,1:5] == mydf[,1]) == 1

which leads to

> mydf
  Item1 Item2 Item3 Item4 Item5 AllSame
1     5     5     5     5     5    TRUE
2     1     3     3     3     3   FALSE
3     2     2     2     2     2    TRUE
4     5     4     5     5     5   FALSE
5     5     2     3     5     5   FALSE

And if you really must have "Yes" and "No" then use instead something like

mydf$AllSame <- ifelse(rowMeans(mydf[,1:5] == mydf[,1]) == 1, "Yes", "No")
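
To see why the comparison works, here is a small sketch of the intermediate step. It is only an illustration, not part of the original answer: the names item_cols, same_as_first and all_same are made up, and it assumes the item columns all start with "Item" so they can be picked by name rather than by position (handy with 10 columns). Comparing the data frame with its first item column gives a logical matrix, and a row is straight-lined when every cell in that row is TRUE.

```r
# Same toy data as above
mydf <- data.frame(Item1 = c(5,1,2,5,5),
                   Item2 = c(5,3,2,4,2),
                   Item3 = c(5,3,2,5,3),
                   Item4 = c(5,3,2,5,5),
                   Item5 = c(5,3,2,5,5))

# Pick the item columns by name (assumption: they all start with "Item")
item_cols <- grep("^Item", names(mydf), value = TRUE)

# Logical matrix: TRUE where a cell equals the first item in its row
same_as_first <- mydf[, item_cols] == mydf[, item_cols[1]]

# A row is straight-lined when all of its cells are TRUE
all_same <- rowSums(same_as_first) == length(item_cols)
```

Using rowSums() against the number of columns is equivalent to the rowMeans(...) == 1 test above; it is just a different way of writing the same check.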
Henry answered Sep 17 '22 21:09


Henry has posted a short and fast working solution that has already been accepted. I still wanted to add this alternative, which in my opinion has a slight advantage in readability:

mydf <- data.frame(Item1 = c(5,1,2,5,5), 
                   Item2 = c(5,3,2,4,2), 
                   Item3 = c(5,3,2,5,3), 
                   Item4 = c(5,3,2,5,5),
                   Item5 = c(5,3,2,5,5) )

mydf$AllSame <- apply(mydf, 1, function(row) all(row==row[1]))

The all() function used here has an na.rm argument, which can be set to TRUE if you want NAs to be ignored.
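
A quick sketch of how missing values behave, using made-up data (nadf and the variable names are hypothetical). One thing to watch for: if the first value in a row is NA, every comparison in that row is NA, so na.rm = TRUE leaves nothing to test and all() returns TRUE. The last variant, counting distinct non-missing values per row, is a different technique from the one in this answer and is shown only as a possible workaround.

```r
# Hypothetical data with missing responses
nadf <- data.frame(Item1 = c(5, NA, 2),
                   Item2 = c(5, 3, NA),
                   Item3 = c(5, 4, 2))

# Default: a row containing NA (and no mismatch) gives NA
strict <- apply(nadf, 1, function(row) all(row == row[1]))

# na.rm = TRUE: NA comparisons are dropped before all() is evaluated,
# so row 2 (NA, 3, 4) comes out TRUE because its first value is NA
lenient <- apply(nadf, 1, function(row) all(row == row[1], na.rm = TRUE))

# Alternative: count the distinct non-missing values in each row
n_distinct <- apply(nadf, 1, function(row) length(unique(row[!is.na(row)])))
robust <- n_distinct == 1
```

With this data, strict is TRUE, NA, NA; lenient is TRUE, TRUE, TRUE; and robust is TRUE, FALSE, TRUE. Which behaviour is right depends on how you want to treat participants who skipped items.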

Bernhard answered Sep 17 '22 21:09