
Possible inconsistency in conversion from text to numeric

Compare how as.numeric converts a character string with how read.fwf does it.

as.numeric("457")  # 457
as.numeric("4 57") # NA with warning message

Now read from a file "fwf.txt" containing exactly "  5 7 12 4" (two leading spaces, ten characters in all).

foo <- read.fwf('fwf.txt', widths=c(5,5), colClasses='numeric', header=FALSE)
foo
#   V1  V2
# 1 57 124

foo <- read.fwf('fwf.txt', widths=c(5,5), colClasses='character', header=FALSE)
foo
#      V1    V2
# 1   5 7  12 4

I'll note that in the "numeric" version, read.fwf concatenates the digits the same way Fortran does. I was just a bit surprised that it doesn't throw an error or return NA the way as.numeric does. Does anyone know why?
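For anyone who wants to reproduce this without creating the file by hand, here is a minimal sketch. It assumes the file holds the ten-character line "  5 7 12 4" (two leading spaces), which is what the character output above implies. The temporary file and the variable names are only illustrative.

```r
# Write the assumed file contents to a temporary file (path is illustrative).
path <- tempfile(fileext = ".txt")
writeLines("  5 7 12 4", path)

# Numeric columns: the embedded spaces are dropped before conversion.
num <- read.fwf(path, widths = c(5, 5), colClasses = "numeric", header = FALSE)

# Character columns: the fields come back as text.
chr <- read.fwf(path, widths = c(5, 5), colClasses = "character", header = FALSE)

# Converting the character fields afterwards gives NA (warning suppressed here).
after <- suppressWarnings(as.numeric(unlist(chr)))

print(num)
print(chr)
print(after)
unlink(path)
```

So the same text yields numbers when read.fwf does the conversion, and NA when as.numeric is applied to the character fields afterwards.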

Carl Witthoft asked Jun 26 '14 18:06



1 Answer

As @eipi10 pointed out, the space-eliminating behavior is not unique to read.fwf. It actually comes from the scan() function (which is used by read.table, which in turn is used by read.fwf). As it processes the input stream, scan() removes spaces (and tabs, if they are not specified as the delimiter) from any value that is not a character. Once it has "cleaned" the value of spaces, it uses the same function as as.numeric to turn that value into a number. With character values it doesn't take out any white space unless you set strip.white=TRUE, which only removes space from the beginning and end of the value.
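The call chain can be seen without read.fwf at all. Here is a small sketch using a made-up comma-separated input. As far as I can tell from the read.table source, when the columns are declared numeric it passes that type on to scan(), which drops the embedded spaces, while as.numeric on the same field returns NA.

```r
# Illustrative input: two comma-separated fields with embedded spaces.
txt <- "4 57,1 2"

# With colClasses = "numeric", the fields are scanned as numbers and the spaces skipped.
via_read_table <- read.table(text = txt, sep = ",", colClasses = "numeric")

# Calling scan() directly with the same separator does the same thing.
via_scan <- scan(text = txt, what = numeric(), sep = ",", quiet = TRUE)

# as.numeric does no such cleaning (warning suppressed here).
direct <- suppressWarnings(as.numeric("4 57"))

print(via_read_table)
print(via_scan)
print(direct)
```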

Observe these examples:

scan(text="TRU E", what=logical(), sep="x")
# [1] TRUE
scan(text="0 . 0 0 7", what=numeric(), sep="x")
# [1] 0.007
scan(text=" text    ", what=character(), sep="~")
# [1] " text    "
scan(text=" text book   ", what=character(), sep="~", strip.white=TRUE)
# [1] "text book"
scan(text="F\tALS\tE", what=logical(), sep=" ")
# [1] FALSE

You can find the source for scan() in /src/main/scan.c and the specific part responsible for this behavior is around this line.

If you wanted as.numeric to behave the same way, you could create a new function like

As.Numeric <- function(x) as.numeric(gsub(" ", "", x, fixed=TRUE))

in order to get

As.Numeric("4 57")
# [1] 457
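
Since scan() skips tabs as well as spaces, a variant that mirrors it more closely could strip both. This is only a sketch, and the name as_numeric_ws is made up for illustration.

```r
# Hypothetical helper: remove spaces and tabs, then convert.
as_numeric_ws <- function(x) as.numeric(gsub("[ \t]", "", x))

as_numeric_ws("4 57")
as_numeric_ws("0 .\t0 0 7")
```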
MrFlick answered Sep 29 '22 07:09