Hello everyone, I am analysing the UCI adult census data. The data uses a question mark (?) for every missing value.
I want to replace all the question marks with NA.
I tried:
library(XML)
census<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=F,na.strings="?")
names(census)<-c("Age","Workclass","Fnlwght","Education","EducationNum","MaritalStatus","Occupation"
,"Relationship" , "Race","Gender","CapitalGain","CapitalLoss","HoursPerWeek","NativeCountry","Salary" )
table(census$Workclass)
               ?      Federal-gov        Local-gov     Never-worked          Private     Self-emp-inc
            1836              960             2093                7            22696             1116
Self-emp-not-inc        State-gov      Without-pay
            2541             1298               14
x<-ifelse(census$Workclass=="?",NA,census$Workclass)
table(x)
x
    1     2     3     4     5     6     7     8     9
 1836   960  2093     7 22696  1116  2541  1298    14
But it did not work.
Please help.
Here's an easy way to replace " ?" (note the leading space) with NA in all columns.
# find elements
idx <- census == " ?"
# replace elements with NA
is.na(census) <- idx
How does it work?
The command idx <- census == " ?" creates a logical matrix with the same number of rows and columns as the data frame census. This matrix idx contains TRUE wherever census contains " ?" and FALSE at all other positions.
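To see what such a logical matrix looks like, here is a minimal sketch on a small stand-in data frame. The name toy and its values are hypothetical and only serve as an illustration; they are not rows from the census file.

```r
# Toy data frame standing in for census (hypothetical values)
toy <- data.frame(Workclass  = c(" Private", " ?", " State-gov"),
                  Occupation = c(" ?", " Sales", " ?"),
                  stringsAsFactors = FALSE)

# Comparing a data frame with a string gives a logical matrix
idx <- toy == " ?"

print(idx)      # TRUE where the cell is " ?", FALSE elsewhere
print(dim(idx)) # same dimensions as toy
print(sum(idx)) # number of cells that matched
```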
The matrix idx is then used as an index: the command is.na(census) <- idx sets the values of census to NA at every position where idx is TRUE.
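The replacement step can be tried on a small example as well. This is only a sketch; the data frame toy and its values are made up for illustration, and the numeric Age column is included to show that columns without a match are left alone.

```r
# Toy data frame standing in for census (hypothetical values)
toy <- data.frame(Age       = c(39, 50, 38),
                  Workclass = c(" State-gov", " ?", " Private"),
                  stringsAsFactors = FALSE)

# Logical matrix marking the cells to blank out
idx <- toy == " ?"

# Set the marked cells to NA
is.na(toy) <- idx

print(toy)
print(table(toy$Workclass, useNA = "ifany")) # NA now shows up as its own count
```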
Note that the replacement function is.na<- is used here. It is not the same as the is.na function, which only tests for NA values and does not change anything.
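As an alternative to fixing the data frame afterwards, the question marks can probably be caught while reading the file. This is not part of the answer above, just a sketch: in the file the fields are separated by a comma followed by a space, so the value read is " ?" rather than "?", and na.strings="?" alone never matches. Passing strip.white=TRUE removes the surrounding spaces before the comparison. The three input rows below are invented stand-ins so the example runs without a download.

```r
# Hypothetical rows in the same ", "-separated layout as the census file
raw <- "39, State-gov, Adm-clerical\n54, ?, ?\n38, Private, Sales"

# strip.white = TRUE trims the leading space, so "?" matches na.strings
d <- read.csv(text = raw, header = FALSE,
              strip.white = TRUE, na.strings = "?",
              stringsAsFactors = FALSE)

print(d)
print(colSums(is.na(d))) # NA count per column
```

The same two arguments should work in the original read.csv call with the URL in place of text = raw.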