I have to import many datasets automatically. In each one the first column is a name (a character vector) and the second column is a numeric vector, so I was calling read.table with colClasses = c("character", "numeric").
This works great if I have a dataframe saved in a df_file like this:
df<- data.frame(V1=c("s1","s2","s3","s4"), V2=c("1e-04","1e-04","1e-04","1e-04"))
read.table(df_file, header = FALSE, comment.char="", colClasses = c("character", "numeric"), stringsAsFactors=FALSE)
The problem is that in some cases the second column holds numeric values written in exponential form. In these cases the import does not work, because read.table does not recognise the column as numeric (or it imports the column as "character" if I don't specify colClasses). So my question is: how can I specify that a column be imported as numeric even when the values are exponential?
For example:
df<- data.frame(V1=c("s1","s2","s3","s4"), V2=c("10^(-4)","10^(-4)","10^(-4)","10^(-4)"))
I want all the exponential values to be imported as numeric, but even when I try to convert from character to numeric after the import I get all NA: as.numeric(as.character(df$V2)) gives "Warning message: NAs introduced by coercion".
I have tried to use "real" or "complex" with colClasses too but it still imports the exponentials as character.
Please help, thank you!
I think the problem is that the form your exponentials are written in doesn't match the notation R parses as numeric (such as 1e-04). If you read them in as character vectors, you can convert them to numbers, provided you know they are all written as 10^(...). Use gsub to strip out the "10^(" and the ")", leaving you with the "-4", convert that to numeric, then raise 10 to that power. It might not be the fastest way, but it works.
From your example:
df<- data.frame(V1=c("s1","s2","s3","s4"), V2=c("10^(-4)","10^(-4)","10^(-4)","10^(-4)"))
df$V2 <- 10^(as.numeric(gsub("10\\^\\(|\\)", "", df$V2)))
df
# V1 V2
#1 s1 1e-04
#2 s2 1e-04
#3 s3 1e-04
#4 s4 1e-04
What's happening in detail: gsub("10\\^\\(|\\)", "", df$V2) substitutes 10^( and ) with an empty string (you need to escape the caret and the parentheses), as.numeric() converts your "-4" string into the number -4, and then you're just running 10^ on each element of the numeric vector you just made.
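To tie this back to the read.table call in the question, here is a minimal sketch of the whole workflow: read both columns as character, then convert the second one. It assumes every value in the second column is written as 10^(exponent). The helper name pow10_to_numeric is made up for illustration, and the inline text stands in for df_file.

```r
# Hypothetical helper (not part of base R): turn strings such as "10^(-4)"
# into numbers by extracting the exponent and raising 10 to it.
pow10_to_numeric <- function(x) {
  exponent <- gsub("10\\^\\(|\\)", "", x)  # "10^(-4)" becomes "-4"
  10^as.numeric(exponent)
}

# Inline text stands in for df_file so the example is self-contained.
raw <- "s1 10^(-4)\ns2 10^(-3)\ns3 10^(2)\ns4 10^(0)"

# Read both columns as character first, since colClasses = "numeric"
# cannot parse values written as 10^(-4).
df <- read.table(text = raw, header = FALSE, comment.char = "",
                 colClasses = c("character", "character"),
                 stringsAsFactors = FALSE)

# Convert the second column in place and inspect the result.
df$V2 <- pow10_to_numeric(df$V2)
str(df)
```

Because the conversion is vectorised, it also handles files in which the exponents differ from row to row.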
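If you read in your data.frame with stringsAsFactors=FALSE, the column in question should come in as a character vector, in which case you can evaluate each string as an R expression:
transform(df, V2=sapply(V2, function(x) eval(parse(text=x)), USE.NAMES=FALSE))
The strings are evaluated one at a time because eval(parse(text=V2)) on the whole vector returns only the value of the last expression, which is correct only when every row holds the same value.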