
Import exponential values as numeric in R

Tags: import, r

I have to import many datasets automatically with the first column being a name, so a character vector, and the second column being a numeric vector, so I was using these specifications with read.table: colClasses = c("character", "numeric").

This works great if I have a dataframe saved in a df_file like this:

df <- data.frame(V1=c("s1","s2","s3","s4"), V2=c("1e-04","1e-04","1e-04","1e-04"))

read.table(df_file, header = FALSE,  comment.char="", colClasses = c("character", "numeric"), stringsAsFactors=FALSE)

The problem is that in some cases the second column holds numeric values written as powers of ten, such as 10^(-4). In these cases the import fails because read.table does not recognise the column as numeric (or it imports the column as "character" if I don't specify colClasses). So my question is: how can I specify that a column should be imported as numeric even when the values are written in this exponential form?

For example:

df<- data.frame(V1=c("s1","s2","s3","s4"), V2=c("10^(-4)","10^(-4)","10^(-4)","10^(-4)"))

I want all the exponential values to be imported as numeric, but even when I try to convert from character to numeric after import, I get all NA: as.numeric(as.character(df$V2)) returns NA with "Warning message: NAs introduced by coercion".

I have tried to use "real" or "complex" with colClasses too but it still imports the exponentials as character.

Please help, thank you!

user2337032 asked Dec 07 '22 07:12


2 Answers

I think the problem is that the form your exponentials are written in doesn't match R's numeric syntax. If you read them in as character vectors, you can convert them to numbers, provided you know every entry has the form 10^(n). Use gsub to strip out the "10^(" and the ")", leaving you with the "-4", convert that to numeric, then raise 10 to that power. Might not be the fastest way, but it works.

From your example:

 df<- data.frame(V1=c("s1","s2","s3","s4"), V2=c("10^(-4)","10^(-4)","10^(-4)","10^(-4)"))
 df$V2 <- 10^(as.numeric(gsub("10\\^\\(|\\)", "", df$V2)))
 df
#  V1    V2
#1 s1 1e-04
#2 s2 1e-04
#3 s3 1e-04
#4 s4 1e-04

What's happening in detail: gsub("10\\^\\(|\\)", "", df$V2) substitutes 10^( and ) with an empty string (you need to escape the caret and the parentheses), as.numeric() converts your "-4" string into the number -4, and then you're just running 10^ on each element of the numeric vector you just made.
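
If a column mixes both notations, the same idea can be wrapped in a small helper. This is only a sketch: parse_pow10 is a hypothetical name, not a base R function, and it assumes every exponent is a plain signed number. Entries that match 10^(n) are converted with the regex approach above, and everything else is passed to as.numeric().

```r
# Hypothetical helper (not part of base R): convert strings written either as
# "10^(-4)" or in R's own notation ("1e-04", "0.5") to numeric.
parse_pow10 <- function(x) {
  x <- trimws(as.character(x))
  out <- rep(NA_real_, length(x))

  # Entries of the form 10^(n) or 10^n, where n is a plain signed number.
  pow_pattern <- "^10\\^\\(?([-+]?[0-9]+\\.?[0-9]*)\\)?$"
  is_pow <- grepl(pow_pattern, x)
  out[is_pow] <- 10^as.numeric(sub(pow_pattern, "\\1", x[is_pow]))

  # Everything else goes through as.numeric(); unparseable entries stay NA.
  out[!is_pow] <- suppressWarnings(as.numeric(x[!is_pow]))
  out
}

df <- data.frame(V1 = c("s1", "s2", "s3", "s4"),
                 V2 = c("10^(-4)", "1e-04", "10^(2)", "0.5"),
                 stringsAsFactors = FALSE)
df$V2 <- parse_pow10(df$V2)
```

Unlike the plain gsub call, this leaves values that R already understands untouched, which helps when only some files use the 10^(n) form.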

Bill Beesley answered Jan 01 '23 01:01


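If you read in your data.frame with stringsAsFactors=FALSE, the column in question should come in as a character vector, in which case you can evaluate each string as an R expression. Note that eval() on a parsed multi-element vector returns only the value of the last expression, so evaluate the elements one at a time (and only on input you trust, since eval(parse()) runs whatever code the strings contain):

transform(df, V2 = sapply(V2, function(x) eval(parse(text = x)), USE.NAMES = FALSE))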
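
Applied end to end, the approach might look like the sketch below. It assumes a whitespace-separated file (a temporary file stands in for df_file), and eval_numeric_string is a hypothetical helper name. Each string is evaluated separately, because eval() on a multi-element expression returns only the last value. The helper also only evaluates strings built from digits, signs, dots, "e", "^" and parentheses, so other text is not run as code.

```r
# Write a small example file (a stand-in for the asker's df_file).
df_file <- tempfile(fileext = ".txt")
writeLines(c("s1 10^(-4)", "s2 10^(-3)", "s3 1e-02", "s4 10^(1)"), df_file)

# Read both columns as character so read.table does not fail on "10^(-4)".
df <- read.table(df_file, header = FALSE, comment.char = "",
                 colClasses = c("character", "character"))

# Hypothetical helper: evaluate one string as arithmetic, or return NA.
# Strings containing anything outside the allowed characters are skipped,
# and strings that fail to parse or evaluate also become NA.
eval_numeric_string <- function(s) {
  if (is.na(s) || !grepl("^[-+0-9.eE^()]+$", s)) return(NA_real_)
  tryCatch(as.numeric(eval(parse(text = s))), error = function(e) NA_real_)
}

# vapply() evaluates one string at a time, so each row keeps its own value.
df$V2 <- vapply(df$V2, eval_numeric_string, numeric(1), USE.NAMES = FALSE)
```

The character check is a cautious design choice rather than a requirement of eval(parse()); for files you fully control, it can be dropped.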
Matthew Plourde answered Jan 01 '23 03:01