I want to find the records in a very large survey dataset where the start postcode matches the end postcode, and put those records in a new data frame. I have created an example data frame for illustration.
ID = c(1,2,3,4,5)
StartPC = c("AF2 4RE","AF3 5RE","AF1 3DR","AF2 4RE","AF2 4PE")
EndPC = c("AF2 4RE","NA","AF2 3DR","AX2 4RE","AF2 4PE")
data<-data.frame(ID,StartPC,EndPC)
data2 <- subset(data, StartPC==EndPC,na.rm=TRUE)
Using the above code, I want to create a data frame (data2) that includes only the IDs for which the start and end postcodes are the same. However, I get this error message:
Error in Ops.factor(StartPC, EndPC) : level sets of factors are different
The new data frame should contain only ID numbers 1 and 5.
That will be because your two columns are factors, not characters, which is what the error message is telling you:
Error in Ops.factor(StartPC, EndPC) : level sets of factors are different
Factors are categorical variables, stored as integer codes plus a lookup table of 'levels'. Comparing two factors only makes sense when both use the same set of levels, so R checks this first. If the level sets differ, it stops with the error above rather than return a misleading result.
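To make the levels-and-codes idea concrete, here is a small sketch using the question's postcodes. It calls factor() explicitly because data.frame() only converts strings to factors by default in R versions before 4.0.0; the variable name same_levels is just for illustration.

```r
# Build the two columns as factors explicitly
# (what data.frame() did by default before R 4.0.0).
StartPC <- factor(c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE"))
EndPC   <- factor(c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE"))

levels(StartPC)      # the lookup table of distinct values
as.integer(StartPC)  # the integer codes that index into that table

# The two columns have different level sets, which is the
# condition that makes StartPC == EndPC fail.
same_levels <- setequal(levels(StartPC), levels(EndPC))
same_levels          # FALSE
```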
So convert to character:
> subset(data, as.character(StartPC) == as.character(EndPC))
  ID StartPC   EndPC
1  1 AF2 4RE AF2 4RE
5  5 AF2 4PE AF2 4PE
You can convert on the fly like that, create the data frame with character columns in the first place, or make sure both columns are built with the same levels.
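The other two options might look roughly like this. It is a sketch rather than the only way to do it, and the names data_chr, data_fac, all_pc, same_chr and same_fac are made up for the example.

```r
ID      <- c(1, 2, 3, 4, 5)
StartPC <- c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE")
EndPC   <- c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE")

# Option 1: keep the postcodes as character columns from the start.
data_chr <- data.frame(ID, StartPC, EndPC, stringsAsFactors = FALSE)
same_chr <- subset(data_chr, StartPC == EndPC)

# Option 2: keep factors, but build both columns from one shared level set.
all_pc   <- union(StartPC, EndPC)
data_fac <- data.frame(ID,
                       StartPC = factor(StartPC, levels = all_pc),
                       EndPC   = factor(EndPC,   levels = all_pc))
same_fac <- subset(data_fac, StartPC == EndPC)

same_chr$ID  # 1 5
same_fac$ID  # 1 5
```

Note that "NA" in the example is an ordinary string, not a missing value. If the real data contains true NA entries, subset() treats an NA condition as FALSE, so those rows are dropped without any extra argument.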