I have a vector vec that I want to convert to numeric type, so I first need to get rid of non-digits (including '+'). The problem: when I remove them, the '+' and '-' in 'E+' and 'E-' are removed as well.
How can I remove all non-digits except for 'E-', 'E+' and '.' from vec?
vec = c('1234', '+ 42', '1E+4', 'NR 12', '4.5E+04', '8.6E-02')
My approaches:
gsub('[^0-9E.]', '', vec) # removes '-' and '+' in 'E-' and 'E+'
gsub('[^0-9(E\\+).]', '', vec) # keeps the '+' from '+ 42'
My desired output is:
c('1234', '42', '1E+4', '12', '4.5E+04', '8.6E-02')
You may extract the numbers using the following regex:
[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
Details
[-+]? - an optional + or -
[0-9]* - 0+ digits
\.? - an optional .
[0-9]+ - 1+ digits
([eE][-+]?[0-9]+)? - an optional capturing group (add ?: after ( to use a non-capturing group) matching 1 or 0 occurrences of:
  [eE] - e or E
  [-+]? - an optional - or +
  [0-9]+ - 1 or more digits
R demo:
vec <- c('1234', '+ 42', '1E+4', 'NR 12', '4.5E+04', '8.6E-02')
res <- regmatches(vec, regexpr("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?", vec))
unlist(res)
## => [1] "1234" "42" "1E+4" "12" "4.5E+04" "8.6E-02"
If multiple matches per item in the character vector are expected, replace regexpr with gregexpr.
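A minimal sketch of the gregexpr variant, assuming a made-up input vec2 (not from the question) in which one element holds two numbers and one holds none. With gregexpr, regmatches returns a list with one entry per input element, so elements without a match stay in place as empty vectors; with regexpr they are dropped from the result instead.

```r
# Hypothetical input: first element has two numbers, last has none
vec2 <- c('1E+4 and 2.5', 'NR 12', 'none')
pattern <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"

# gregexpr finds every match per element; regmatches returns a list
all_matches <- regmatches(vec2, gregexpr(pattern, vec2))

# Convert each element's matches to numeric; no match gives numeric(0)
nums <- lapply(all_matches, as.numeric)
```

Here nums[[1]] holds both numbers from the first element, and nums[[3]] is an empty numeric vector.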
You can change your regex so that a + or - is removed only when it is not preceded by E or e. This uses a negative lookbehind, which requires perl=TRUE. Also add + and - to your main negated character class, so that the class itself no longer removes them and the lookbehind alternative alone decides whether a sign is replaced with an empty string. Try changing your line from this,
gsub('[^0-9E.]', '', vec)
to,
gsub('(?<![Ee])[+-]|[^0-9E.+-]', '', vec, perl=TRUE)
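As a quick check, the lookbehind version can be run on the vector from the question and the result passed to as.numeric, which is the stated goal. The names cleaned and nums are only illustrative. Note one assumption: the character class keeps only an uppercase E, so input written with a lowercase e would lose the exponent marker unless e is added to the class.

```r
vec <- c('1234', '+ 42', '1E+4', 'NR 12', '4.5E+04', '8.6E-02')

# Drop signs not preceded by E/e, plus anything that is not a digit, E, '.', '+' or '-'
cleaned <- gsub('(?<![Ee])[+-]|[^0-9E.+-]', '', vec, perl = TRUE)
cleaned
## => [1] "1234" "42" "1E+4" "12" "4.5E+04" "8.6E-02"

# as.numeric understands the E notation directly
nums <- as.numeric(cleaned)
```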