I have a column that should consist of numbers only, but some entries contain letters or other symbols as well. As a result, R reads the column Housenumber as character.
For instance:
Housenumber
1
14
5
at5
53.!
boat
What kind of function could I write to identify the rows that do not consist of numbers only and delete them?
Desired output:
Housenumber
1
14
5
df[!grepl("[^[:digit:]]", df$Housenumber), , drop = FALSE]
Explanation:
The regex [^[:digit:]] matches any non-numeric character, i.e. the letters and symbols that disqualify a row.
The call
grepl("[^[:digit:]]", df$Housenumber)
returns a logical vector with one element per row: TRUE if the value contains at least one non-digit character, FALSE otherwise. Negating it with ! therefore keeps exactly the rows made up of digits only. Note that grep(..., value = FALSE) returns the indices of all matching elements instead, so the length of its result is a single number for the whole column and cannot be used to filter row by row.
In this particular case, I prefer the answer given by @akrun, but my answer also works in the general case of filtering rows using any sort of regex.
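A minimal, self-contained sketch of the regex filter on the sample data from the question. The names has_non_digit and clean are only illustrative, and the data frame is rebuilt by hand here as an assumption about how the column is stored (character, not factor).

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the column as character
df <- data.frame(Housenumber = c("1", "14", "5", "at5", "53.!", "boat"),
                 stringsAsFactors = FALSE)

# One logical per row: TRUE where the value contains at least one non-digit character
has_non_digit <- grepl("[^[:digit:]]", df$Housenumber)

# Keep only the all-digit rows; drop = FALSE keeps the result a data frame
clean <- df[!has_non_digit, , drop = FALSE]
print(clean)
```

The negated character class lets empty strings through, since they contain no non-digit character. If those should be dropped as well, an anchored positive pattern such as grepl("^[[:digit:]]+$", df$Housenumber) could be used as the keep condition instead.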
This can be done with as.numeric, which converts the non-numeric elements to NA (with a warning). Wrapping the result in !is.na gives a logical index that keeps only the rows that converted successfully.
df1[!is.na(as.numeric(df1$Housenumber)), , drop = FALSE]
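For reference, a small runnable sketch of the as.numeric approach on the sample data from the question. The names num and clean1 are hypothetical, and the suppressWarnings call and the final conversion of the column to numeric are optional additions, not part of the original one-liner.

```r
# Sample data from the question, stored as character
df1 <- data.frame(Housenumber = c("1", "14", "5", "at5", "53.!", "boat"),
                  stringsAsFactors = FALSE)

# as.numeric() yields NA for values it cannot parse and warns about the coercion;
# suppressWarnings() silences that expected warning
num <- suppressWarnings(as.numeric(df1$Housenumber))

# Keep the rows that converted successfully
clean1 <- df1[!is.na(num), , drop = FALSE]

# Optionally store the surviving values as numbers rather than strings
clean1$Housenumber <- num[!is.na(num)]
print(clean1)
```

Keep in mind that as.numeric also accepts strings such as "1.5", "-3" or "1e3", so this test is looser than a digits-only regex check.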