
Read csv with dates and numbers

Tags: r, read.csv

I have a problem importing a CSV file into R.

Example lines to import:

2010-07-27;91
2010-07-26;93
2010-07-23;88

I use the statement:

data <- read.csv2(file="...", sep=";", dec=".", header=FALSE)

When I try to combine this data with other data produced by a statistical analysis using cbind, the date is shown as an integer because it was imported as a factor.

If I convert it to a string with as.character, the numeric data are converted to character as well, so they become unusable for statistical procedures.
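A minimal reproduction of the problem, sketched in R. It assumes the three example lines are supplied through the `text =` argument instead of the elided file path, and it sets `stringsAsFactors = TRUE` explicitly because R 4.0 and later import strings as character by default (older versions produced factors, as in the question).

```r
# The three example lines from the question, passed inline instead of a file
lines <- "2010-07-27;91\n2010-07-26;93\n2010-07-23;88"

# stringsAsFactors = TRUE mimics the pre-R-4.0 default that created factors
data <- read.csv2(text = lines, sep = ";", dec = ".", header = FALSE,
                  stringsAsFactors = TRUE)
str(data)

# cbind() on plain vectors builds a matrix, which holds a single type:
# the factor contributes its internal integer codes, not the date strings
m <- cbind(data$V1, data$V2)
print(m)
```

The factor levels are sorted, so the first column of `m` holds the codes 3, 2, 1 rather than dates, which is the integer output described above.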

asked Aug 24 '10 at 08:08 by keanu



2 Answers

Use the colClasses argument:

data <- read.csv2(file="...", sep=";", dec=".", header=FALSE,
     colClasses=c("Date",NA))

NA means "use the default conversion" for that column.
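A small sketch of the colClasses approach, again assuming the example lines are passed via `text =` rather than the elided file path. The first column is parsed straight to `Date` (the default `%Y-%m-%d` layout matches this file) and the second is left to the default conversion.

```r
lines <- "2010-07-27;91\n2010-07-26;93\n2010-07-23;88"

# "Date" parses column 1 as dates; NA leaves column 2 to the default conversion
data <- read.csv2(text = lines, sep = ";", dec = ".", header = FALSE,
                  colClasses = c("Date", NA))
str(data)
```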

Alternatively, after import you can convert the factor to Date with:

data[[1]] <- as.Date(data[[1]])
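The after-the-fact conversion can be sketched the same way. This assumes the column arrived as a factor (forced here with `stringsAsFactors = TRUE`); `as.Date()` accepts a factor and parses its labels, not its integer codes. Once converted, date arithmetic works, for example computing the gaps between consecutive rows.

```r
lines <- "2010-07-27;91\n2010-07-26;93\n2010-07-23;88"
data <- read.csv2(text = lines, sep = ";", dec = ".", header = FALSE,
                  stringsAsFactors = TRUE)

# Convert the factor column in place; as.Date() uses the factor's labels
data[[1]] <- as.Date(data[[1]])
str(data)

# Differences between consecutive dates, in days
gaps <- diff(data[[1]])
print(gaps)
```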
answered Oct 11 '22 at 05:10 by Marek


Perhaps you want to convert the character values to meaningful time values. In that case, POSIXt date-time objects are a good choice.

Given your data file, I'd do something like:

data <- read.table(file="...", sep = ";", as.is = TRUE)
data[,1] <- strptime(data[,1], "%Y-%m-%d")

Look up ?strptime in the help for more details on the format codes.
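A sketch of the strptime route with the example lines inline (`text =` stands in for the elided path). `strptime()` returns a `POSIXlt` object, which is a list of fields internally; converting it with `as.POSIXct()` before storing it in the data frame is a common way to get a compact date-time column. The `as.POSIXct()` step is an assumption added for this sketch.

```r
lines <- "2010-07-27;91\n2010-07-26;93\n2010-07-23;88"

# as.is = TRUE keeps the date strings as character instead of factors
data <- read.table(text = lines, sep = ";", as.is = TRUE)

# strptime() parses to POSIXlt (a list of fields such as year, mon, mday)
parsed <- strptime(data[, 1], "%Y-%m-%d")
class(parsed)

# Store as POSIXct, a single numeric vector of seconds, in the data frame
data[[1]] <- as.POSIXct(parsed)
str(data)
```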

NOTE: If you're going to specify all the properties of the file anyway, just use read.table. The other read.xxx variants exist only to shorten the call by presetting defaults. You used read.csv2 because it defaults to sep = ";", so there is no need to specify it again; not having to specify it is the whole reason the function exists. Personally, I only use read.table because I can never remember the names and defaults of all the variants. In your case it's also the briefest call, because its defaults (header = FALSE, dec = ".") already match your file.
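To illustrate the note, here is a sketch (again with the example lines passed via `text =`) showing that the shorter read.csv2 call, which relies on its sep = ";" default, and the bare read.table call, which already defaults to header = FALSE and dec = ".", produce the same data frame for this file.

```r
lines <- "2010-07-27;91\n2010-07-26;93\n2010-07-23;88"

# read.csv2 presets sep = ";" (but also header = TRUE and dec = ","),
# so only header and dec need overriding for this file
a <- read.csv2(text = lines, dec = ".", header = FALSE)

# read.table already defaults to header = FALSE and dec = "."
b <- read.table(text = lines, sep = ";")

all.equal(a, b)
```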

answered Oct 11 '22 at 07:10 by John