 

Changing values when converting column type to numeric

Tags:

r

I have a data file in the format shown below.
I loaded it into R and tried to plot a histogram of the values in the gene_dist column (V2), but got the error "x must be numeric". So I tried to change the column's type.

> head(data)

    V1        V2
1 type gene_dist
2    A     64667
3    A     76486
4    A     97416
5    A     30876
6    A     88018

> summary(data)
    V1            V2     
 A   : 67   100    :  1  
 B   :122   100906 :  1  
 type:  1   102349 :  1  
            1033   :  1  
            10544  :  1  
            10745  :  1  
            (Other):184  

I tried to convert the column with sapply, but the values changed:

> data[,2]<-sapply(data[,2],as.numeric)

> head(data)
    V1  V2
1 type 190
2    A 146
3    A 166
4    A 189

> summary(data)
    V1            V2        
 A   : 67   Min.   :  1.00  
 B   :122   1st Qu.: 48.25  
 type:  1   Median : 95.50  
            Mean   : 95.50  
            3rd Qu.:142.75  
            Max.   :190.00 

Does anyone know why this is happening?

asked Jun 13 '11 09:06 by agatha




2 Answers

It looks like your second column is a factor. You need to use as.character before as.numeric. This is because factors are stored internally as integers with a table to give the factor level labels. Just using as.numeric will only give the internal integer codes. There is no need to use sapply since these functions are vectorized.

data[,2] <- as.numeric(as.character(data[,2]))
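To see the difference concretely, here is a minimal sketch using a small made-up factor (the values are hypothetical, chosen to look like the question's gene_dist column). It also shows as.numeric(levels(f))[f], the idiom recommended on R's ?factor help page, which converts each level once and then indexes by the internal codes.

```r
# Hypothetical values stored as a factor, as read.table produced in the question
f <- factor(c("64667", "76486", "100", "64667"))

# Levels are sorted as strings: "100" "64667" "76486"
print(levels(f))

# as.numeric() alone returns the internal integer codes, not the values
codes <- as.numeric(f)                  # 2 3 1 2

# Convert the labels instead of the codes
vals <- as.numeric(as.character(f))     # 64667 76486 100 64667

# Equivalent: convert each level once, then index by the factor's codes
vals2 <- as.numeric(levels(f))[f]
```

Note that since R 4.0.0, read.table() and data.frame() default to stringsAsFactors = FALSE, so on a current R such a column is more likely to arrive as character; as.numeric() then gives NA for unparseable entries rather than level codes.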

It is likely that the column is a factor because there are some non-numeric characters in some of the entries. Any such entries will be converted to NA with the appropriate warning, but you may want to investigate this in your raw data.
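As a minimal sketch of that investigation: the question's head() output shows "type gene_dist" as the first row, which suggests the header line was read in as data. Using a hypothetical vector shaped like that column, the entries that fail to convert can be listed before anything is overwritten:

```r
# Hypothetical column resembling the question's V2, with the header
# line read in as an ordinary value
x <- factor(c("gene_dist", "64667", "76486", "97416"))

# Convert the labels; suppressWarnings() hides "NAs introduced by coercion"
num <- suppressWarnings(as.numeric(as.character(x)))

# Entries that were present but could not be parsed as numbers
bad <- as.character(x)[is.na(num) & !is.na(x)]
print(bad)   # prints "gene_dist"
```

If the header row is the culprit, re-reading the file with header = TRUE (for example read.table("yourfile.txt", header = TRUE), where the file name is a placeholder) should let R read gene_dist as a numeric column directly.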

As a side note, data is a poor (though not invalid) choice of variable name, since there is a built-in function of the same name (utils::data).

answered Oct 18 '22 20:10 by James


I had the same issue, but the root cause turned out to be different, so I am sharing this as an answer rather than a comment.

df <- read.table("doc.csv", header = TRUE, sep = ",", dec = ".")
df$value

# Results in
[1]  2254    1873    2201    2147    2456    1785

# So..
as.numeric(df$value)
[1] 26 14 22 20 32 11

In my case, the cause was that the values in the original CSV file were padded with spaces. Removing the spaces fixed the issue.
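A minimal sketch of handling this in R instead of editing the CSV by hand, assuming (hypothetically) that the column was read as a factor of space-padded strings like those in the dput() output: strip the padding with trimws() (available since R 3.2.0), then convert the character labels rather than the factor codes.

```r
# Hypothetical factor of space-padded values, mimicking the dput() output
v <- factor(c(" 2254  ", " 1873  ", " 2201  "))

# as.numeric() on the factor returns level codes, not the values
print(as.numeric(v))   # 3 1 2

# Strip the padding, then convert the character labels
clean <- as.numeric(trimws(as.character(v)))
```

As in the first answer, the as.character() step is what avoids getting the level codes.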

The dput(df) output showed values like these:

" 1178  ", " 1222  ", " 1223  ", " 1314  ", " 1462  ", 

answered Oct 18 '22 21:10 by Þórgnýr Thoroddsen