read.table reads numbers as factors

Tags:

r

I have the following sample file:

"id";"PCA0";"PCA1";"PCA2"
1;6.142741644872954;1.2075898020608253;1.8946959360032403   
2;-0.5329026419681557;-8.586870627925729;4.510113575138726

When I try to read it with:

d <- read.table("file.csv", sep=";", header=T)

id is an integer column, PCA0 is numeric, and all subsequent columns are factors:

class(d$id)
[1] "integer"
class(d$PCA0)
[1] "numeric"
class(d$PCA1)
[1] "factor"
class(d$PCA2)
[1] "factor"

Why aren't the other columns numeric as well?

I know how to convert the columns, but I want my script to work without manually casting the types. Why doesn't R recognize the numeric columns?

asked May 26 '14 16:05 by Jonas



1 Answer

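As @MrFlick says: too many digits. In R 3.1.0, type.convert() (which read.table() uses to guess column types) returns a character vector, and hence a factor by default, when representing a numeric input as a double would lose accuracy. PCA1 and PCA2 each contain a value with 17 significant digits (1.2075898020608253 and 1.8946959360032403) that trips this check, while the PCA0 values have only 16. R 3.1.1 restored the old behavior as the default and added a numerals argument to read.table() to control it.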
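
A minimal sketch of that numerals argument, assuming R >= 3.1.1. The sample is inlined with text = instead of being read from file.csv. The "no.loss" call is meant to approximate the R 3.1.0 default; exactly which columns it keeps as character depends on R's internal accuracy check, so treat that part as illustrative.

```r
# The question's sample, inlined instead of read from "file.csv".
txt <- '"id";"PCA0";"PCA1";"PCA2"
1;6.142741644872954;1.2075898020608253;1.8946959360032403
2;-0.5329026419681557;-8.586870627925729;4.510113575138726'

# Default since R 3.1.1 is numerals = "allow.loss": values are converted to
# double even when the last digits cannot be represented exactly.
d <- read.table(text = txt, sep = ";", header = TRUE)
sapply(d, class)

# numerals = "no.loss" keeps a column as character (a factor if
# stringsAsFactors = TRUE) when converting it to double would lose accuracy,
# which is roughly what R 3.1.0 did by default.
d_strict <- read.table(text = txt, sep = ";", header = TRUE,
                       numerals = "no.loss")
sapply(d_strict, class)
```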

You can force the column types you want with the colClasses argument:

d <- read.table("test.csv",
                sep = ";",
                header = TRUE,
                colClasses = c("integer", "numeric", "numeric", "numeric"))
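
Because the question asks for a script that works without hard-coding types per column, here is a sketch that builds colClasses from the number of fields in the header. read_pca is a hypothetical helper, not part of base R, and it assumes the first column is always an integer id and every other column is numeric.

```r
# Hypothetical helper (read_pca is not a base R function). Assumes the first
# column is an integer id and every other column is numeric.
read_pca <- function(path, sep = ";") {
  # count.fields() returns the number of fields on each line; take the header's.
  n_cols <- count.fields(path, sep = sep)[1]
  classes <- c("integer", rep("numeric", n_cols - 1))
  read.table(path, sep = sep, header = TRUE, colClasses = classes)
}

# Usage: write the question's sample to a temporary file and read it back.
path <- tempfile(fileext = ".csv")
writeLines(c('"id";"PCA0";"PCA1";"PCA2"',
             '1;6.142741644872954;1.2075898020608253;1.8946959360032403',
             '2;-0.5329026419681557;-8.586870627925729;4.510113575138726'),
           path)
d <- read_pca(path)
sapply(d, class)
```

If the file layout changes (for example, more than one non-numeric column), the classes vector would need adjusting, so this only fits files shaped like the sample.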

If you really need as much precision as possible, read the file with fread() from the data.table package:

library(data.table)
d <- fread("test.csv")

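Then format each column in scientific notation with 15 digits after the decimal point, i.e. 16 significant digits (this turns the columns into character strings):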

d[,PCA0 := sprintf("%.15E",PCA0)]
d[,PCA1 := sprintf("%.15E",PCA1)]
d[,PCA2 := sprintf("%.15E",PCA2)]

gives:

> d
   id                   PCA0                   PCA1                  PCA2
1:  1  6.142741644872954E+00  1.207589802060825E+00 1.894695936003240E+00
2:  2 -5.329026419681557E-01 -8.586870627925729E+00 4.510113575138726E+00
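
Without data.table, the same formatting can be sketched in base R by applying sprintf() to every PCA column at once. This assumes the PCA columns are the ones whose names start with "PCA", and it uses colClasses so the columns are numeric regardless of R version.

```r
# The question's sample, inlined; colClasses guarantees numeric PCA columns.
txt <- '"id";"PCA0";"PCA1";"PCA2"
1;6.142741644872954;1.2075898020608253;1.8946959360032403
2;-0.5329026419681557;-8.586870627925729;4.510113575138726'
d <- read.table(text = txt, sep = ";", header = TRUE,
                colClasses = c("integer", "numeric", "numeric", "numeric"))

# Base-R counterpart of the three := lines: format every PCA column with
# 15 digits after the decimal point (16 significant digits) as strings.
pca_cols <- grep("^PCA", names(d), value = TRUE)
d[pca_cols] <- lapply(d[pca_cols], function(x) sprintf("%.15E", x))
d
```

Sixteen significant digits do not always round-trip back to the same double; a format with 17 significant digits, such as "%.16E", is enough for every IEEE 754 double.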

Note: fread() should also be smarter and faster than read.table().

answered Sep 26 '22 13:09 by npjc