 

Looping through variable names in R

Tags:

r

stata

I'm having a looping issue. It should be simple to solve, but "R for Stata Users" (I've coded in Stata for a couple of years), Roger Peng's videos, and Google don't seem to be helping me. Can one of you please explain to me what I'm doing wrong?

I'm trying to write a loop that runs through the 'thresholds' data frame to pull out information from three sets of columns. I can do what I want by writing the same segment of code three times, but as the code gets more complicated, this will become quite cumbersome.

Here is a sample of 'thresholds' (see dput output below, added by a friendly reader):

    threshold_1_name      threshold_1_dir threshold_1_value
1   overweight            >                25
2   possible malnutrition <                31
3   Q1                    >                998
4   Q1                    >                998
5   Q1                    >                998
6   Q1                    >                998
    threshold_1_units threshold_2_name threshold_2_dir threshold_2_value threshold_2_units
1   kg/m^2            obese               >             30                kg/m^2
2   cm                <NA>                >             NA                   
3   <NA>              Q3                  >             998                  
4                     Q3                  >             998                  
5                     Q3                  >             998                  
6                     Q3                  >             998  

This code does what I want to do:

newvars1 <- paste(thresholds$varname, thresholds$threshold_1_name, sep = "_")
noval <- is.na(thresholds$threshold_1_value)
newvars1 <- newvars1[!noval]

newvars2 <- paste(thresholds$varname, thresholds$threshold_2_name, sep = "_")
noval <- is.na(thresholds$threshold_2_value)
newvars2 <- newvars2[!noval]

newvars3 <- paste(thresholds$varname, thresholds$threshold_3_name, sep = "_")
noval <- is.na(thresholds$threshold_3_value)
newvars3 <- newvars3[!noval]
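For comparison, the three blocks above differ only in the digit, so one way to avoid copy-pasting them is to wrap a single block in a function of that digit and call it with lapply(). This is a minimal sketch. It assumes a trimmed, character-only copy of the sample data from the dput output below and loops over 1:2, because the sample only has two threshold sets. The function name new_names_for is hypothetical.

```r
# Trimmed copy of the sample data (assumption: only the columns used here,
# with varname stored as character instead of factor).
thresholds <- data.frame(
  varname           = paste0("var", LETTERS[1:6]),
  threshold_1_name  = c("overweight", "possible malnutrition", "Q1", "Q1", "Q1", "Q1"),
  threshold_1_value = c(25L, 31L, 998L, 998L, 998L, 998L),
  threshold_2_name  = c("obese", "<NA>", "Q3", "Q3", "Q3", "Q3"),
  threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: the body is one of the copy-pasted blocks,
# with the digit replaced by the argument i.
new_names_for <- function(i) {
  name_col  <- paste("threshold", i, "name",  sep = "_")  # e.g. "threshold_1_name"
  value_col <- paste("threshold", i, "value", sep = "_")  # e.g. "threshold_1_value"
  keep <- !is.na(thresholds[[value_col]])                 # rows that have a value
  paste(thresholds$varname, thresholds[[name_col]], sep = "_")[keep]
}

# The sample has two threshold sets; the full data would use 1:3.
newvars_list <- lapply(1:2, new_names_for)  # list(newvars1, newvars2)
variables    <- unlist(newvars_list)        # all names in one character vector
```

Because the NA filter is vectorized, no inner loop over rows is needed here.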

And here is how I am trying to loop:

variables <- NULL
for (i in 1:3) {
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar <- paste("threshold", i, "name", sep = "_")
  newvar <- paste("varnames", i, sep = "")
  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) {
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
  variables <- c(variables, newvars)
}

And here is the error I am receiving:

Error: unexpected '}' in "}"

I think something about the way I am calling the 'i' is messing things up, but I'm not sure how to do it correctly. My Stata habits using locals are really biting me in the butt as I switch to R.
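The closest R analogue to a Stata local inside a variable name is to build the column name as a character string and then look the column up with [[ ]]. The sketch below uses a small hypothetical data frame df to show the difference between [ ], [[ ]] and $ when the name is held in a variable. It also shows that indexing the name string itself with [j], as valuevar[j] does in the loop above, does not select row j.

```r
# Hypothetical two-row data frame with the same column naming scheme.
df <- data.frame(threshold_1_value = c(25L, NA),
                 threshold_2_value = c(30L, 998L))

i   <- 1
col <- paste("threshold", i, "value", sep = "_")  # the string "threshold_1_value"

df[col]           # single bracket: a one-column data frame
df[[col]]         # double bracket: the column itself, c(25L, NA)
df[[col]][2]      # one element of that column (row 2), here NA
is.na(df[[col]])  # one TRUE/FALSE per row: c(FALSE, TRUE)

df$col            # NULL: $ looks for a column literally named "col"
col[2]            # NA: col is a one-element string, so [2] is out of range
```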

EDIT to add dput output, by a friendly reader:

thresholds <- structure(list(varname = structure(1:6, .Label = c("varA", "varB", 
"varC", "varD", "varE", "varF"), class = "factor"), threshold_1_name = c("overweight", 
"possible malnutrition", "Q1", "Q1", "Q1", "Q1"), threshold_1_dir = c(">", 
"<", ">", ">", ">", ">"), threshold_1_value = c(25L, 31L, 998L, 
998L, 998L, 998L), threshold_1_units = c("kg/m^2", "cm", NA, 
NA, NA, NA), threshold_2_name = c("obese", "<NA>", "Q3", "Q3", 
"Q3", "Q3"), threshold_2_dir = c(">", ">", ">", ">", ">", ">"
), threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L), threshold_2_units = c("kg/m^2", 
"cm", NA, NA, NA, NA)), .Names = c("varname", "threshold_1_name", 
"threshold_1_dir", "threshold_1_value", "threshold_1_units", 
"threshold_2_name", "threshold_2_dir", "threshold_2_value", "threshold_2_units"
), row.names = c(NA, -6L), class = "data.frame")
asked Dec 21 '12 at 21:12 by Struggling_with_R




1 Answer

The first problem I see is in if (check = "FALSE"). A single = is assignment; if you're testing a condition, it needs to be ==. Also, quoting the word "FALSE" means you're comparing against the string "FALSE" (literally the word), not the logical value, which is FALSE without quotation marks.

The second problem, rightly pointed out by @BlueMagister, is that you're missing the closing ) at the end of for (j in 1:length(...)) {.

See # bad!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check = "FALSE") { # bad!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }

See # good!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) { # good!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }

But because check is already a logical (TRUE/FALSE) value, the if statement can use it directly: !check is TRUE exactly when check is FALSE, so no comparison is needed.

See # better!

  for (j in 1:length(thresholds$varname)) { 
    check <- is.na(thresholds[valuevar[j]])
    if (!check) { # better!
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
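The snippets above fix the syntax, but two indexing issues remain if the loop is run as-is. valuevar is a one-element string, so valuevar[j] is NA for every j after the first; and thresholds[namevar] returns a data frame rather than the column's values, while is.na() on a one-column data frame gives one value per row, which if() rejects (an error since R 4.2.0). One way to finish the fix, sketched under the assumption of a trimmed copy of the sample data with two threshold sets, is to select the column with [[ ]] and then the row with [j]:

```r
# Trimmed copy of the sample data (assumption: only the columns used here).
thresholds <- data.frame(
  varname           = paste0("var", LETTERS[1:6]),
  threshold_1_name  = c("overweight", "possible malnutrition", "Q1", "Q1", "Q1", "Q1"),
  threshold_1_value = c(25L, 31L, 998L, 998L, 998L, 998L),
  threshold_2_name  = c("obese", "<NA>", "Q3", "Q3", "Q3", "Q3"),
  threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L),
  stringsAsFactors  = FALSE
)

variables <- NULL
for (i in 1:2) {                          # 1:3 for the full data
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar  <- paste("threshold", i, "name",  sep = "_")
  newvars  <- NULL                        # reset for each threshold set
  for (j in seq_len(nrow(thresholds))) {  # safe even with zero rows
    check <- is.na(thresholds[[valuevar]][j])  # a single TRUE/FALSE
    if (!check) {
      # append this row's name instead of overwriting newvars
      newvars <- c(newvars,
                   paste(thresholds$varname[j], thresholds[[namevar]][j], sep = "_"))
    }
  }
  variables <- c(variables, newvars)
}
```

This keeps the nested-loop shape of the question; the same result can also be had without the inner loop, because is.na() and paste() already work on whole columns.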
answered Sep 30 '22 at 13:09 by Brandon Bertelsen