Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error - replacement has [x] rows, data has [y]

Tags:

dataframe

r

I have a numeric column ("value") in a dataframe ("df"), and I would like to generate a new column ("valueBin") based on "value." I have the following conditional code to define df$valueBin:

df$valueBin[which(df$value<=250)] <- "<=250" df$valueBin[which(df$value>250 & df$value<=500)] <- "250-500" df$valueBin[which(df$value>500 & df$value<=1000)] <- "500-1,000" df$valueBin[which(df$value>1000 & df$value<=2000)] <- "1,000 - 2,000" df$valueBin[which(df$value>2000)] <- ">2,000" 

I'm getting the following error:

"Error in $<-.data.frame(*tmp*, "valueBin", value = c(NA, NA, NA, : replacement has 6530 rows, data has 6532"

Every element of df$value should fit into one of my which() statements. There are no missing values in df$value. Although even if I run just the first conditional statement (<=250), I get the exact same error, with "...replacement has 6530 rows..." although there are way fewer than 6530 records with value<=250, and value is never NA.

This SO link notes a similar error when using aggregate() was a bug, but it recommends installing the version of R I have. Plus the bug report says its fixed. R aggregate error: "replacement has <foo> rows, data has <bar>"

This SO link seems more related to my issue, and the issue here was an issue with his/her conditional logic that caused fewer elements of the replacement array to be generated. I guess that must be my issue as well, and figured at first I must have a "<=" instead of an "<" or vice versa, but after checking I'm pretty sure they're all correct to cover every value of "value" without overlaps. R error in '[<-.data.frame'... replacement has # items, need #

like image 708
Max Power Avatar asked Apr 23 '15 05:04

Max Power


1 Answers

The answer by @akrun certainly does the trick. For future googlers who want to understand why, here is an explanation...

The new variable needs to be created first.

The variable "valueBin" needs to be already in the df in order for the conditional assignment to work. Essentially, the syntax of the code is correct. Just add one line in front of the code chuck to create this name --

df$newVariableName <- NA 

Then you continue with whatever conditional assignment rules you have, like

df$newVariableName[which(df$oldVariableName<=250)] <- "<=250" 

I blame whoever wrote that package's error message... The debugging was made especially confusing by that error message. It is irrelevant information that you have two arrays in the df with different lengths. No. Simply create the new column first. For more details, consult this post https://www.r-bloggers.com/translating-weird-r-errors/

like image 135
RachelSunny Avatar answered Oct 03 '22 15:10

RachelSunny