Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

read.xlsx and colClasses

Tags:

r

r-xlsx

Does anyone know why argument colClasses does not seem to work in read.xlsx?

I create a sample *.xlsx file:

> library(xlsx)
> d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
> str(d1)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: int  1 2 3
 $ D: num  1.1 NA NA
> write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)

then try to read it with read.xlsx, without and with colClasses argument:

> d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA

The problem is colClasses seems to have no effect. Any ideas?

Thank you for your help.

Aleksey

P.S. I have R 3.0.1, xlsx 0.5.1

like image 732
user2690051 Avatar asked Aug 16 '13 17:08

user2690051


1 Answers

colClasses= is working but the problem is that on your system default action when import data is to convert character columns to factor.

If you import test.xlsx and set that all columns should be "character", you see that all columns are made as factors (also numbers).

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
 str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: Factor w/ 3 levels "1","2","3": 1 2 3
 $ D: Factor w/ 1 level "1.1": 1 NA NA

To ensure that characters are not converted to factors you can add argument stringsAsFactors=FALSE to function read.xlsx().

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', 
                colClasses=c(B='character', A='character'),stringsAsFactors=FALSE)

str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: chr  "A" "B" "C"
 $ B: chr  "a" "b" "c"
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
like image 186
Didzis Elferts Avatar answered Nov 15 '22 12:11

Didzis Elferts