I think this is the best way to describe what I want to do:
df$column <- ifelse(is.na(df$column) == TRUE, 0, 1)
But where column is dynamic. This is because I have about 45 columns all with the same kind of content, and all I want to do is check each cell, replace it with a 1 if there's something in it, a 0 if not. I have of course tried many different things, but since there seems to be no df[index][column] in R, I'm lost. I'd have expected something like this to work, but nope:
for (index in df) {
for (column in names(df)) {
df[[index]][[column]] <- ifelse(is.na(df[[index]][[column]]) == TRUE, 0, 1)
}
}
I could do this quickly in other languages (or even Excel), but I'm just learning R and want to understand why something so simple seems to be so complicated in a language that's meant to work with data. Thanks!
How about this:
df.new = as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, 1)))
lapply applies a function to each column of the data frame df. In this case, the function does the 0/1 replacement. lapply returns a list. Wrapping it in as.data.frame converts the list to a data frame (which is a special type of list).
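To see the two steps separately, here is a minimal sketch on a made-up data frame. The names toy, a, b, and as_list are illustrative assumptions and not part of the question:

```r
# Hypothetical toy data: NA marks an empty cell
toy <- data.frame(a = c("x", NA, "z"), b = c(NA, 2, 3))

# Step 1: lapply visits each column and returns a plain list of 0/1 vectors
as_list <- lapply(toy, function(x) ifelse(is.na(x), 0, 1))

# Step 2: as.data.frame turns that list back into a data frame
toy.new <- as.data.frame(as_list)
print(toy.new)
```

The column names carry over because lapply keeps the names of the list it was given.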
In R you can often replace a loop with one of the *apply family of functions. In this case, lapply "loops" over the columns of the data frame. Also, many R functions are "vectorized", meaning the function operates on every value in a vector at once. In this case, ifelse does the replacement on an entire column of the data frame.
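If an explicit loop feels more familiar, a sketch like the following should give the same result. Because ifelse is vectorized, the loop only needs to walk over the column names, with no inner loop over rows. The last line goes one step further and applies is.na and ifelse to the whole data frame at once. The names toy, looped, and vectorized are hypothetical:

```r
# Hypothetical toy data: NA marks an empty cell
toy <- data.frame(a = c("x", NA, "z"), b = c(NA, 2, 3))

# Loop version: iterate over column names only;
# looped[[column]] is a whole column, handled in one call to ifelse
looped <- toy
for (column in names(looped)) {
  looped[[column]] <- ifelse(is.na(looped[[column]]), 0, 1)
}

# No-loop version: is.na() on a data frame returns a logical matrix,
# ifelse() keeps its shape, and as.data.frame() converts it back
vectorized <- as.data.frame(ifelse(is.na(toy), 0, 1))
```

Both looped and vectorized should hold 1 where toy had a value and 0 where it had NA.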