
Numeric variables converted to factors when reading a CSV file

Tags: r, csv, read.table

I'm trying to read a .csv file into R in which all the columns are numeric. However, they get converted to factors every time I import them.

Here's a sample of what my CSV looks like:

[screenshot of the CSV file, not preserved]

This is my code:

options(StringsAsFactors=F)
data<-read.csv("in.csv", dec = ",", sep = ";")

As you can see, I set dec to , and sep to ;. Still, all the vectors that should be numerics are factors!

Can someone give me some advice? Thanks!

asked Nov 19 '13 at 00:11 by intael



2 Answers

The NA strings in your csv file, N/A, are interpreted as character, so the whole column is read as character. If stringsAsFactors = TRUE is set in options or in read.csv (the default before R 4.0.0), the column is then converted to factor. You can use the argument na.strings to tell read.csv which strings should be interpreted as NA.

A small example:

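# Without na.strings, "N/A" forces column x to be read as text (character or factor)
df <- read.csv(text = "x;y
                 N/A;2,2
                 3,3;4,4", dec = ",", sep = ";")
str(df)

# With na.strings = "N/A", the entry is treated as missing and x can be read as numeric
df <- read.csv(text = "x;y
                 N/A;2,2
                 3,3;4,4", dec = ",", sep = ";", na.strings = "N/A")
str(df)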

Update following comment

Although not apparent from the sample data provided, there is also a problem with instances of '$' concatenated to the numbers, e.g. '$3,3'. Such values will be interpreted as character, and then the dec = "," doesn't help us. We need to replace both the '$' and the ',' before the variable is converted to numeric.

df <- read.csv(text = "x;y;z
               N/A;1,1;2,2$
               $3,3;5,5;4,4", dec = ",", sep = ";", na.strings = "N/A")
df
str(df)

df[] <- lapply(df, function(x) {
  # remove the literal "$", then swap the decimal comma for a decimal point
  x2 <- gsub(pattern = "$", replacement = "", x = x, fixed = TRUE)
  x3 <- gsub(pattern = ",", replacement = ".", x = x2, fixed = TRUE)
  as.numeric(x3)
})
df
str(df)
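
When it is not obvious which entries are blocking the numeric conversion, one option is to read everything as character first and list the values that fail to parse. The following is only a sketch: find_non_numeric is a hypothetical helper name, not a base R function, and the inline data simply mirrors the example above.

```r
# Hypothetical helper (not part of base R): return the distinct values of a
# vector that cannot be parsed as numbers after the decimal comma has been
# replaced by a decimal point.
find_non_numeric <- function(x, dec = ",") {
  x <- trimws(as.character(x))
  parsed <- suppressWarnings(as.numeric(sub(dec, ".", x, fixed = TRUE)))
  unique(x[is.na(parsed) & !is.na(x)])
}

# Read every column as character so that nothing is converted yet.
raw <- read.csv(text = "x;y;z
N/A;1,1;2,2$
$3,3;5,5;4,4", sep = ";", colClasses = "character")

# One entry per column, listing the offending values (empty if none).
bad <- lapply(raw, find_non_numeric)
bad
```

Columns with an empty result are safe to convert directly; the others show which strings need to be cleaned or declared in na.strings.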
answered Oct 06 '22 at 23:10 by Henrik


You could actually have gotten your original code to work: there's a tiny typo (it should be 'stringsAsFactors', not 'StringsAsFactors'). The options command won't complain about the misspelled name, but the setting simply has no effect. With the correct spelling, the columns are read as character instead of factor. You can then convert the columns to whatever type you want.
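
A minimal sketch of this point, using made-up inline data rather than the asker's file: options() stores any name it is given, so the misspelled entry exists but is never looked up under the correct name. Passing stringsAsFactors = FALSE directly to read.csv is one way to avoid depending on a global option; the manual conversion at the end is just one possible approach.

```r
# options() accepts arbitrary names, so the typo is stored without complaint.
options(StringsAsFactors = FALSE)
getOption("StringsAsFactors")   # the misspelled entry is what got set

# Passing the argument directly does not rely on any global option.
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";", stringsAsFactors = FALSE)
str(df)                         # x is character because of "N/A", y is numeric

# Convert x by hand: mark "N/A" as missing, then swap the decimal comma.
x_chr <- df$x
x_chr[x_chr == "N/A"] <- NA
df$x <- as.numeric(sub(",", ".", x_chr, fixed = TRUE))
str(df)
```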

answered Oct 07 '22 at 01:10 by aifille