I'm having a looping issue. It should be simple to solve, but "R for Stata Users" (I've coded in Stata for a couple of years), Roger Peng's videos, and Google don't seem to be helping me. Can one of you please explain to me what I'm doing wrong?
I'm trying to write a loop that runs through the 'thresholds' data frame to pull out information from three sets of columns. I can do what I want by writing the same segment of code three times, but as the code gets more complicated, this will become quite cumbersome.
Here is a sample of 'thresholds' (see dput output below, added by a friendly reader):
       threshold_1_name threshold_1_dir threshold_1_value
1            overweight               >                25
2 possible malnutrition               <                31
3                    Q1               >               998
4                    Q1               >               998
5                    Q1               >               998
6                    Q1               >               998
  threshold_1_units threshold_2_name threshold_2_dir threshold_2_value threshold_2_units
1            kg/m^2            obese               >                30            kg/m^2
2                cm             <NA>               >                NA
3              <NA>               Q3               >               998
4                                 Q3               >               998
5                                 Q3               >               998
6                                 Q3               >               998
This code does what I want to do:
newvars1 <- paste(thresholds$varname, thresholds$threshold_1_name, sep = "_")
noval <- is.na(thresholds$threshold_1_value)
newvars1 <- newvars1[!noval]
newvars2 <- paste(thresholds$varname, thresholds$threshold_2_name, sep = "_")
noval <- is.na(thresholds$threshold_2_value)
newvars2 <- newvars2[!noval]
newvars3 <- paste(thresholds$varname, thresholds$threshold_3_name, sep = "_")
noval <- is.na(thresholds$threshold_3_value)
newvars3 <- newvars3[!noval]
And here is how I am trying to loop:
variables <- NULL
for (i in 1:3) {
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar <- paste("threshold", i, "name", sep = "_")
  newvar <- paste("varnames", i, sep = "")
  for (j in 1:length(thresholds$varname)) {
    check <- is.na(thresholds[valuevar[j]])
    if (check == FALSE) {
      newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
    }
  }
  variables <- c(variables, newvars)
}
And here is the error I am receiving:
Error: unexpected '}' in "}"
I think something about the way I am calling the 'i' is messing things up, but I'm not sure how to do it correctly. My Stata habits using locals are really biting me in the butt as I switch to R.
EDIT to add dput output, by a friendly reader:
thresholds <- structure(list(varname = structure(1:6, .Label = c("varA", "varB",
"varC", "varD", "varE", "varF"), class = "factor"), threshold_1_name = c("overweight",
"possible malnutrition", "Q1", "Q1", "Q1", "Q1"), threshold_1_dir = c(">",
"<", ">", ">", ">", ">"), threshold_1_value = c(25L, 31L, 998L,
998L, 998L, 998L), threshold_1_units = c("kg/m^2", "cm", NA,
NA, NA, NA), threshold_2_name = c("obese", "<NA>", "Q3", "Q3",
"Q3", "Q3"), threshold_2_dir = c(">", ">", ">", ">", ">", ">"
), threshold_2_value = c(30L, NA, 998L, 998L, 998L, 998L), threshold_2_units = c("kg/m^2",
"cm", NA, NA, NA, NA)), .Names = c("varname", "threshold_1_name",
"threshold_1_dir", "threshold_1_value", "threshold_1_units",
"threshold_2_name", "threshold_2_dir", "threshold_2_value", "threshold_2_units"
), row.names = c(NA, -6L), class = "data.frame")
The first problem I see is in if (check = "FALSE"): that's =, the assignment operator, and R won't accept it as a test inside if(). If you're testing a condition it needs to be ==. Also, quoting the word "FALSE" means you're comparing against a string (literally the word FALSE), not the logical value, which is FALSE without the quotation marks.
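To make the difference concrete, here is a minimal sketch (the variable names res_logical, res_string, res_lower and parse_result are made up for illustration). It shows that comparing against the string only appears to work because R coerces the logical to the text "FALSE" first, and that a single = inside if() does not even parse:

```r
check <- is.na(25L)              # a single logical value: FALSE

# Comparing with the logical constant does what you mean.
res_logical <- check == FALSE

# Comparing with a string only "works" by coercion: R converts the
# logical to the character string "FALSE" before comparing.
res_string <- check == "FALSE"
res_lower  <- check == "false"   # the strings differ, so this is FALSE

# A single '=' inside if() is rejected by the parser. parse() is used
# here so the syntax error can be caught instead of stopping the script.
parse_result <- tryCatch(
  parse(text = 'if (check = "FALSE") 1'),
  error = function(e) "syntax error"
)

print(c(res_logical, res_string, res_lower))
print(parse_result)
```

Relying on the string comparison is fragile, so it is safer to stick to logical values throughout.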
The second problem has been rightly pointed out by @BlueMagister: you're missing a closing ) in the for (j in 1:length(...)) { line. Compare:
for (j in 1:length(thresholds$varname)) {
  check <- is.na(thresholds[valuevar[j]])
  if (check = "FALSE") { # bad!
    newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
  }
}

for (j in 1:length(thresholds$varname)) {
  check <- is.na(thresholds[valuevar[j]])
  if (check == FALSE) { # good!
    newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
  }
}
But because check is already a logical (TRUE / FALSE) value, you can use really simple logic in the if statement and just negate it with !:
for (j in 1:length(thresholds$varname)) {
  check <- is.na(thresholds[valuevar[j]])
  if (!check) { # better!
    newvars <- paste(thresholds$varname, thresholds[namevar], sep = "_")
  }
}
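Beyond the if condition, the indexing is likely to cause trouble too: valuevar holds a single column name, so valuevar[j] is NA for any j greater than 1, and thresholds[namevar] returns a one-column data frame instead of a vector. One possible way around both, sketched here under some assumptions, is to pull each column out as a vector with [[ ]] and filter all rows at once, which makes the inner loop unnecessary. The data frame below is a trimmed-down stand-in for the asker's 'thresholds' (three rows, two threshold sets, with a real NA in place of the "<NA>" string), and keep is just an illustrative name:

```r
# Trimmed-down stand-in for 'thresholds'; not the asker's full data.
thresholds <- data.frame(
  varname           = c("varA", "varB", "varC"),
  threshold_1_name  = c("overweight", "possible malnutrition", "Q1"),
  threshold_1_value = c(25L, 31L, 998L),
  threshold_2_name  = c("obese", NA, "Q3"),
  threshold_2_value = c(30L, NA, 998L),
  stringsAsFactors  = FALSE
)

variables <- character(0)
for (i in 1:2) {   # two threshold sets in this stand-in; the question has three
  valuevar <- paste("threshold", i, "value", sep = "_")
  namevar  <- paste("threshold", i, "name",  sep = "_")

  # [[ ]] looks the column up by the name stored in the string and
  # returns a plain vector, so is.na() is evaluated for every row at once.
  keep <- !is.na(thresholds[[valuevar]])

  newvars <- paste(thresholds$varname[keep], thresholds[[namevar]][keep], sep = "_")
  variables <- c(variables, newvars)
}

print(variables)
```

This mirrors the three hand-written blocks in the question: each pass builds the names for one threshold set and drops the rows whose value is missing. The string built by paste() plays the role a Stata local would, and [[ ]] is what turns that string into a column.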