I have a column of residential addresses in my dataset 'ad'. I want to check for addresses that contain no numbers (including Roman numerals). I'm using
ad$check <- grepl("[[:digit:]]",ad$address)
to flag addresses with no digits present. How do I do the same for addresses that contain Roman numerals?
E.g.: "floor X, DLF Building- III, ABC City"
You need to make a regex string.
Edit (my first answer was nonsense):
x <- c("floor Imaginary, building- Momentum, ABC City", "floor X, DLF Building- III, ABC City")
# here comes the regex
grepl("\\b[IVXLCDM]+\\b", x, ignore.case = FALSE)
[1] FALSE TRUE
To break it down:
\\b
is a word boundary. It means the Roman numeral must be preceded and followed by whitespace, punctuation, or the beginning/end of the string.
[IVXLCDM]+
the "word" we are looking for can only consist of the symbols used for Roman numerals, repeated one or more times. Note that | is a literal character inside [], so it must not be used as a separator there. These should be all the symbols, as far as I know.
ignore.case = FALSE
this is the default, which applies if you omit the option. I find it safer, however, to state it explicitly when it matters for the operation at hand.
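To tie this back to the question, here is a minimal sketch that combines the digit check with a Roman-numeral check in one column. The toy data frame is made up for illustration, and the pattern used is a character class without pipes plus a + quantifier, so that multi-letter numerals such as "III" count as one word.

```r
# Toy stand-in for the asker's data frame 'ad'; these rows are invented.
ad <- data.frame(
  address = c("12 Main Street, ABC City",
              "floor X, DLF Building- III, ABC City",
              "Rose Cottage, ABC City"),
  stringsAsFactors = FALSE
)

# TRUE where the address contains at least one digit.
has_digit <- grepl("[[:digit:]]", ad$address)

# TRUE where the address contains a whole word made only of Roman-numeral letters.
has_roman <- grepl("\\b[IVXLCDM]+\\b", ad$address, ignore.case = FALSE)

# TRUE means some kind of number was found; FALSE rows need a closer look.
ad$check <- has_digit | has_roman
ad$check
```

Rows where check is FALSE are the addresses with neither digits nor Roman-numeral-looking words.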
Use with caution, as a company called, e.g., "LCD Industries" would also be flagged as containing a Roman numeral. You could combine my approach with this answer to further test whether the symbols are in the right order.
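One possible way to test the symbol order is sketched below. It is not taken from the linked answer: has_roman_numeral is a hypothetical helper name, and the strict pattern is the commonly used one for numerals from I to MMMCMXCIX. The idea is to first extract whole words made of Roman-numeral letters, then check each one against the strict pattern.

```r
# Hypothetical helper (not from the original answer): TRUE when a string
# contains at least one well-formed Roman numeral as a whole word.
has_roman_numeral <- function(x) {
  # Step 1: pull out whole words made only of Roman-numeral letters.
  candidates <- regmatches(x, gregexpr("\\b[IVXLCDM]+\\b", x, perl = TRUE))
  # Step 2: keep only candidates whose letters appear in a valid order.
  # Candidates are never empty, so the pattern cannot match an empty string here.
  strict <- "^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
  vapply(candidates, function(tokens) any(grepl(strict, tokens)), logical(1))
}

x <- c("floor X, DLF Building- III, ABC City",
       "LCD Industries, ABC City",
       "Flat IV, Tower MCMXC")
has_roman_numeral(x)
```

With this check "LCD" is rejected because its letters are not in a valid order. Single letters such as "C" or "D" (as in "Block C") and words like "MIX" are still valid numerals and will be flagged, so some manual review may remain necessary.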
Please test on your data and report if it works.