 

Filter by column values while reading using read.csv in R [duplicate]

Tags:

r

I have a huge dataset in a .txt file with values separated by semicolons; it has close to 2M rows. I only need the rows for particular dates in the first column. Sample input is shown below:

Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000
16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000
16/12/2006;17:26:00;5.374;0.498;233.290;23.000;0.000;2.000;17.000

Please help me filter the data for two dates, say 1/2/2007 and 2/2/2007.

asked Dec 09 '16 19:12 by Sundararaman P


1 Answer

Here's a good answer on filtering during data import: https://stackoverflow.com/a/15967406/1152809

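Basically, you can use the sqldf package to filter during import: read.csv.sql() loads the file into a temporary SQLite database and returns only the rows selected by your SQL statement, so R only has to process the filtered portion. Here's something like what you need: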

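install.packages("sqldf")  # only needed once
library(sqldf)
# Dates are stored as d/m/yyyy text (e.g. 1/2/2007), so match the file's exact spelling
df <- read.csv.sql("sample.csv",
                   sql = "select * from file where Date = '1/2/2007' or Date = '2/2/2007'",
                   sep = ";")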

However, I haven't tested this because you didn't give us a dput() of your data. Take a look at this post for info on how to write a good R question.
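If installing sqldf isn't an option, here is a base-R sketch of a different technique (not read.csv.sql): read the raw lines, keep the header plus the lines whose first field is one of the wanted dates, and parse only those with read.table(). Unlike read.csv.sql, every line is still read into memory as text, so this is only a reasonable trade-off when ~2M short lines fit comfortably in RAM. The helper `read_dates`, the temporary sample file and its last two rows are made up for illustration (the values are copied from the question with the dates changed), and the regex assumes dates are written without leading zeros, as in the question's 1/2/2007.

```r
# Hypothetical helper: keep only rows whose Date field is one of `dates`.
# Assumes a ';'-separated file with a header line and Date as the first column.
read_dates <- function(path, dates) {
  lines <- readLines(path)
  # '^' anchors at the start of the line, ';' ends the first field
  pattern <- paste0("^(", paste(dates, collapse = "|"), ");")
  keep <- c(lines[1], grep(pattern, lines[-1], value = TRUE))
  read.table(text = keep, sep = ";", header = TRUE, stringsAsFactors = FALSE)
}

# Illustrative sample file in the question's layout
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3",
  "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000",
  # made-up rows: the question's values with the dates changed
  "1/2/2007;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000",
  "2/2/2007;17:26:00;5.374;0.498;233.290;23.000;0.000;2.000;17.000"
), path)

df <- read_dates(path, c("1/2/2007", "2/2/2007"))
print(df)
```

The `^` anchor keeps 1/2/2007 from also matching 11/2/2007 or 21/2/2007, and the trailing `;` makes sure the whole first field matches.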

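Your dates are stored as strings, so an exact string match like the one above works. However, if you want a date range (for example with BETWEEN, or >= and <=), comparing d/m/yyyy strings won't give calendar order, and SQLite's strftime() can't parse that format either, so you first need to rebuild each date as zero-padded yyyy-mm-dd text. Here's a sample: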

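# Rebuild d/m/yyyy as yyyy-mm-dd (zero-padded month and day) so text comparison follows the calendar
df <- read.csv.sql("sample.csv", sep = ";", sql = "select * from file where
  substr(Date, -4) || '-' ||
  substr('0' || substr(Date, instr(Date, '/') + 1, length(Date) - instr(Date, '/') - 5), -2) || '-' ||
  substr('0' || substr(Date, 1, instr(Date, '/') - 1), -2)
  between '2007-02-01' and '2007-02-02'")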
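To see why the format conversion matters, here is a base-R sketch (an illustration only, not part of the sqldf approach) that parses d/m/yyyy strings with as.Date(), shows their yyyy-mm-dd form, and filters by a date range. The `dates` vector is made up for illustration.

```r
# Made-up sample of d/m/yyyy date strings (no leading zeros, as in the question)
dates <- c("16/12/2006", "1/2/2007", "2/2/2007", "10/2/2007")

# Parse into Date objects; %d and %m also accept single-digit values
parsed <- as.Date(dates, format = "%d/%m/%Y")

# yyyy-mm-dd text runs from most to least significant field, so it sorts in calendar order
iso <- format(parsed, "%Y-%m-%d")
print(iso)  # "2006-12-16" "2007-02-01" "2007-02-02" "2007-02-10"

# Range filter on real dates, the same condition as BETWEEN '2007-02-01' AND '2007-02-02'
in_range <- parsed >= as.Date("2007-02-01") & parsed <= as.Date("2007-02-02")
print(dates[in_range])  # "1/2/2007" "2/2/2007"
```

Comparing the raw d/m/yyyy strings instead would go character by character, so the day digits would decide the order before the year is ever looked at.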
answered Nov 12 '22 02:11 by Travis Heeter