Test for numeric elements in a character string

Tags: regex, r


Maybe there's a reason some other pieces of your data are more complicated that would break this, but my first thought is:

> !is.na(as.numeric(x))
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
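For reference, a minimal self-contained sketch of the same idea. The input vector here is hypothetical (the question's original `x` is not reproduced above), and `suppressWarnings()` is an addition to silence the "NAs introduced by coercion" warning that `as.numeric()` emits for non-numeric strings.

```r
# Hypothetical input vector, for illustration only
x <- c("1.2", "1e4", "-7", "abc", "1.2.3", "5L")

# as.numeric() returns NA for strings it cannot convert;
# suppressWarnings() hides the coercion warning
is_num <- !is.na(suppressWarnings(as.numeric(x)))

print(is_num)
```

Note that a genuine NA in the input is also reported as FALSE by this test, since it cannot be told apart from a failed conversion.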

As Josh O'Brien notes below, this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you need to count those as "plausibly numeric", one route is to pick them out with a regex first:

> x <- c("1.2","1e4","1.2.3","5L")
> x
[1] "1.2"   "1e4"   "1.2.3" "5L"   
> grepl("^[[:digit:]]+L$",x)
[1] FALSE FALSE FALSE  TRUE

...and then strip the "L" from just those elements using gsub and indexing.
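A rough sketch of that last step, assuming the same example vector. The names `is_int_literal`, `x_clean` and `plausibly_numeric` are made up for illustration.

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# Flag elements that look like R integer literals (digits followed by L)
is_int_literal <- grepl("^[[:digit:]]+L$", x)

# Strip the trailing "L" from just those elements
x_clean <- x
x_clean[is_int_literal] <- gsub("L$", "", x_clean[is_int_literal])

# Now the as.numeric() test treats "5L" as numeric too
plausibly_numeric <- !is.na(suppressWarnings(as.numeric(x_clean)))

print(x_clean)
print(plausibly_numeric)
```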


I recently encountered a similar problem where I was trying to write a function to format values passed as a character string from another function. The formatted values would ultimately end up in a table and I wanted to create logic to identify NA, character strings, and character representations of numbers so that I could apply sprintf() on them before generating the table.

Although more complicated to read, I do like the robustness of the grepl() approach. I think this gets all of the examples brought up in the comments.

x <- c("0",37,"42","-5","-2.3","1.36e4","4L","La","ti","da",NA)

y <- grepl("^([-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+)$",x)

This evaluates to (formatted to help with visualization):

x
[1] "0"  "37"   "42"  "-5"   "-2.3"   "1.36e4" "4L" "La"     "ti"     "da"     NA 

y
[1] TRUE  TRUE   TRUE  TRUE   TRUE     TRUE    TRUE FALSE   FALSE    FALSE    FALSE

The regular expression is TRUE for:

  • positive or negative numbers with at most one decimal point OR
  • positive or negative integers with an optional L suffix (e.g., 4L) OR
  • positive or negative numbers in scientific notation

Additional terms could be added to handle decimals without a leading digit (e.g., .5) or signed exponents (e.g., 1e-4) if the dataset contained numbers in those forms.
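As a sketch of what such additional terms might look like, the pattern below accepts an optional sign, a decimal with or without a leading digit, an optional signed exponent, and the integer form with an L suffix. The name `pattern` and the test vector `z` are hypothetical, and this is one possible extension, not the pattern from the answer above.

```r
# Hypothetical extended pattern, anchored so the whole string must match
pattern <- "^[-+]?(([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?|[0-9]+L)$"

# Hypothetical test values, including the less tidy forms
z <- c(".5", "5.", "1e-4", "-2.3", "4L", "1.2.3", "La", NA)

# grepl() returns FALSE for NA input
looks_numeric <- grepl(pattern, z)

print(data.frame(z = z, looks_numeric = looks_numeric))
```

Anchoring with `^` and `$` matters here: without it, any string that merely contains a digit would match.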


Avoid re-inventing the wheel with check.numeric() from package varhandle.

The function accepts the following arguments:

v The character vector or factor vector. (Mandatory)

na.rm logical. Should the function ignore NA? Default value is FALSE since NA can be converted to numeric. (Optional)

only.integer logical. Only check for integers and do not accept floating point. Default value is FALSE. (Optional)

exceptions A character vector containing the strings that should be considered as valid to be converted to numeric. (Optional)

ignore.whitespace logical. Ignore leading and trailing whitespace characters before assessing if the vector can be converted to numeric. Default value is TRUE. (Optional)
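A minimal usage sketch, assuming the varhandle package is installed. The call is guarded with `requireNamespace()` so the snippet still runs without it. The input vector and the name `res` are hypothetical, and only arguments listed above are used.

```r
# Hypothetical input vector, for illustration only
x <- c("0", "42", "-2.3", " 7 ", "La", "ti")

res <- if (requireNamespace("varhandle", quietly = TRUE)) {
  # One logical per element: can it be converted to numeric?
  varhandle::check.numeric(v = x, only.integer = FALSE, ignore.whitespace = TRUE)
} else {
  NULL  # package not available; nothing to check
}

print(res)
```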


Another possibility:

x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", "1.2", "1e4", "1.2.3", "5L", ".22", -3)
locs <- sapply(x, function(n) {

    out <- try(eval(parse(text = n)), silent = TRUE)
    !inherits(out, 'try-error')

}, USE.NAMES = FALSE)

x[locs]
## [1] "0.33" ".1"   "3"    "123"  "1.2"  "1e4"  "5L"   ".22"  "-3"  

x[!locs]
## [1] "2.3.3" "1.2r"  "1.2.3"
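One caveat with this approach is that eval() runs whatever code the string contains, and any valid R expression (for example "pi" or "TRUE") counts as a success. A possible variation, sketched below, parses the string without evaluating it and accepts it only if the result is a single numeric constant, optionally with a leading sign. The helper name `is_numeric_literal` is made up for illustration.

```r
# Hypothetical helper: TRUE if s parses to exactly one numeric constant
is_numeric_literal <- function(s) {
  expr <- tryCatch(parse(text = s, keep.source = FALSE),
                   error = function(e) NULL)
  if (is.null(expr) || length(expr) != 1L) return(FALSE)
  e <- expr[[1L]]
  # "-3" parses as a call to unary minus, so unwrap a leading sign
  if (is.call(e) && length(e) == 2L &&
      (identical(e[[1L]], as.name("-")) || identical(e[[1L]], as.name("+")))) {
    e <- e[[2L]]
  }
  # Numeric constants (double or integer) pass; symbols and calls do not
  is.numeric(e)
}

x <- c("0.33", ".1", "2.3.3", "1.2r", "1e4", "5L", "-3", "pi")
locs <- vapply(x, is_numeric_literal, logical(1), USE.NAMES = FALSE)

print(x[locs])
print(x[!locs])
```

Nothing is evaluated here, so a string such as "pi" or a function call is rejected instead of being executed.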