I have to read in a lot of CSV files automatically. Some have a comma as the delimiter, and for those I use read.csv(). Others have a semicolon as the delimiter, and for those I use read.csv2().
I want to write a piece of code that recognizes whether the CSV file has a comma or a semicolon as a delimiter (before I read it) so that I don't have to change the code every time.
My approach would be something like this:
try to read.csv("xyz")
if error
read.csv2("xyz")
Is something like that possible? Has somebody done this before? How can I check if there was an error without actually seeing it?
Here are a few approaches, assuming that the only difference between the file formats is whether the separator is a semicolon and the decimal mark is a comma, or the separator is a comma and the decimal mark is a point.
1) fread As mentioned in the comments, fread in the data.table package will automatically detect the separator for common separators and then read the file in using the separator it detected. This can also handle certain other changes in format, such as automatically detecting whether the file has a header.
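As a rough illustration of (1), the sketch below writes two small demo files and reads both with the same fread call. The file contents and variable names are made up for the example, and it assumes the data.table package is installed. The demo values are whole numbers on purpose: the decimal mark is a separate question from the separator, and fread has a dec argument for it.

```r
library(data.table)

# Hypothetical demo files, written to temporary paths for illustration only
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), comma_file)
writeLines(c("a;b", "1;2", "3;4"), semi_file)

# The same call for both files; the separator is left for fread to detect
dt_comma <- fread(comma_file)
dt_semi  <- fread(semi_file)

# If a file uses decimal commas, the decimal mark can be set via the dec argument,
# e.g. fread(semi_file, dec = ",")
```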
2) grepl Look at the first line and see if it has a comma or semicolon and then re-read the file:
L <- readLines("myfile", n = 1)
if (grepl(";", L)) read.csv2("myfile") else read.csv("myfile")
3) count.fields We can assume semicolon and then count the fields in the first line. If there is one field then it is comma separated and if not then it is semicolon separated.
L <- readLines("myfile", n = 1)
numfields <- count.fields(textConnection(L), sep = ";")
if (numfields == 1) read.csv("myfile") else read.csv2("myfile")
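Since the question is about reading many files without editing the code each time, approach (2) could be wrapped in a small function and applied to a list of files. This is only a sketch: read_csv_auto is a hypothetical name, the demo files are invented for the example, and it assumes every file is non-empty and has one of the two formats described above.

```r
# Hypothetical helper: choose read.csv or read.csv2 based on the first line
read_csv_auto <- function(path, ...) {
  first_line <- readLines(path, n = 1)
  if (grepl(";", first_line, fixed = TRUE)) {
    read.csv2(path, ...)  # semicolon separator, decimal comma
  } else {
    read.csv(path, ...)   # comma separator, decimal point
  }
}

# Demo directory with one file of each kind (illustration only)
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c("a,b", "1.5,2", "3.5,4"), file.path(demo_dir, "comma.csv"))
writeLines(c("a;b", "1,5;2", "3,5;4"), file.path(demo_dir, "semicolon.csv"))

# Read every CSV file in the directory with the same call
files  <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(files, read_csv_auto)
names(tables) <- basename(files)
```

Checking the first line is used here rather than the try-and-fall-back idea from the question because read.csv does not necessarily raise an error on a semicolon-separated file; it can return a data frame with the wrong columns instead.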
Update: Added (3) and made improvements to all three.