I didn't find an answer to this question, so hopefully this is the place to get some help on this.
I am reading in many Excel files contained in .zip files. Each .zip that I have has about 40 excel files that I want to read. I am trying to create a list of data frames, but encounter an error on reading some files based on file content.
This is the read statment, inside a for loop:
library(readxl)
df[[i]] <- read_excel(xls_lst[i],
skip = 4,
col_names = FALSE,
na = "n/a",
col_types = data_types)
data_types has these values :
> data_types
[1] "text" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
which is correct for this file.
The read_excel statement works well on some files, but returns warning message on others :
In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types,... :
[54, 7]: expecting numeric: got '9999.990000'
Well, the value '9999.99000' looks like a numeric to me. When I open the Excel file that creates this warning, the file indeed shows these values, and also shows that the column is formatted as text in Excel. When I change the column formatting to numeric, re-save the Excel sheet, then the data is read in correctly.
However, I have several hundreds of these files to read ... how can read_excel ignore the column format indicated by Excel, and instead use the col_type defintion that I supply in the calling statement ?
Thanks,
I tried to build a toy example.
My xlsx file contains:
3 1
3 3
4 4
5 5
7 '999
6 3
Reading in it your way:
data_types<-c("numeric","numeric")
a<-read_excel("aa.xlsx",
col_names = FALSE,
na = "n/a",
col_types = data_types
)
Warning message:
In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[5, 2]: expecting numeric: got '999'
Reading in everything as text
data_types<-c("text","text")
dat<-read_excel("aa.xlsx",
col_names = FALSE,
na = "n/a",
col_types = data_types
)
And using type.convert:
dat[]<-lapply(dat, type.convert)
works at least for this simple example.
*Edited:
There was a mistake in the code.
*Edit in response to comment:
Another toy example demonstrating how you could apply type.convert to your data:
#list of data frames
l<-list()
l[[1]]<-data.frame(matrix(rep(as.character(1:5),2), ncol = 2), stringsAsFactors = FALSE)
l<-rep(l,3)
#looping over your list to encode columns correctly:
for (i in 1: length(l)){
l[[i]][]<-lapply(l[[i]], type.convert)
}
There might be better solutions. But I think this should work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With