Import newest csv file in directory

Goal:
- Import the newest file (.csv) from a local directory into R

Goal Details:
- A csv file is uploaded to a folder on my Mac every day, around 4:30AM. I would like to add a function to my R script that automatically imports the newest file into my workspace for further analysis.
- I would like this function to run in the morning (no earlier than 6AM, so there is plenty of leeway here).

Input Details:
- file type: .csv
- naming convention: e.g. "28 Jul 2014 04:37:47 -0400.csv"
- frequency: one new file per day, uploaded at ~04:30
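Because each file name encodes its upload timestamp, the newest file could in principle be chosen by parsing the names rather than relying on file modification times. The sketch below assumes every name follows the "28 Jul 2014 04:37:47 -0400.csv" pattern exactly; the helper names `parse_upload_time()` and `newest_by_name()` are hypothetical, and the locale is switched to "C" temporarily because `%b` only matches English month abbreviations like "Jul" in an English/C locale.

```r
# Parse the timestamp embedded in names like "28 Jul 2014 04:37:47 -0400.csv".
# Hypothetical helper; assumes every file name follows that exact pattern.
parse_upload_time <- function(file_name) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore the caller's locale
  Sys.setlocale("LC_TIME", "C")  # so "%b" matches English abbreviations like "Jul"
  stamp <- sub("\\.csv$", "", basename(file_name))
  # "%z" reads the "-0400" offset; the result is expressed in UTC
  as.POSIXct(stamp, format = "%d %b %Y %H:%M:%S %z", tz = "UTC")
}

# Return the full path of the CSV whose file name holds the latest timestamp.
newest_by_name <- function(directory) {
  paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  if (length(paths) == 0) stop("no .csv files found in ", directory)
  times <- parse_upload_time(paths)  # names that do not parse become NA
  if (all(is.na(times))) stop("no file names matched the expected timestamp format")
  paths[which.max(times)]  # which.max() skips NA values
}
```

Sorting the names as plain strings would not work here: "01 Aug 2014 ..." sorts before "28 Jul 2014 ..." even though it is newer, which is why the names are converted to date-times first.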

What I've Tried:
- I know this may seem like a weak attempt, but I'm really at a loss on how to amend the function below.
- My idea on paper is to 'grab' the name of the newest file, paste() the directory path in front of it, and then voilà! (But alas, my programming skills are lacking to code this.)
- The code below is what I tried to run, but it just 'hangs' and never finishes. I got this code from a thread on an R forum.

Code:

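lastChange = file.info(directory)$mtime  # modification time of the folder itself
while(TRUE){  # never exits: the loop has no break condition
  currentM = file.info(directory)$mtime
  if(currentM != lastChange){
    lastChange = currentM
    read.csv(directory)  # passes the folder, not a file; the result is not assigned
  }
  # try again in 10 minutes
  Sys.sleep(600)
}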

My Environment:
- R 3.1
- Mac OS X 10.9.4 (Mavericks)

Thank you so much in advance for any help! :-)

asked Jul 28 '14 at 16:07 by hianalytics


2 Answers

-- readfile.R --

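directory <- "/path/to/csv/folder"  # placeholder: the folder the daily CSV is uploaded to
files <- file.info(list.files(directory, pattern = "\\.csv$", full.names = TRUE))
newest <- read.csv(rownames(files)[which.max(files$mtime)])  # most recently modified file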

I'd put the above script in a cron job that runs every morning at a time when the day's file will already have been written. The crontab entry below runs it every morning at 8am.

-- in crontab --

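# cron runs with a minimal PATH, so use absolute paths (both paths below are placeholders)
0 8 * * * /usr/local/bin/Rscript /path/to/readfile.R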

Read more about cron here.
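Since the cron job runs unattended, it may help if the script refuses to load an old file when the day's upload has not arrived yet. A minimal sketch, assuming "today's file" means the newest CSV was last modified on the current local date; `newest_csv_from_today()` is a hypothetical helper and the folder path is a placeholder. A `stop()` inside a script run by `Rscript` should make it exit with an error status, which cron can report if mail is configured.

```r
# Return the newest .csv in `directory`, but only if it was modified today.
# Hypothetical helper; `today` can be overridden, which is handy for testing.
newest_csv_from_today <- function(directory, today = Sys.Date()) {
  paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  if (length(paths) == 0) stop("no .csv files found in ", directory)
  mtimes <- file.mtime(paths)
  i <- which.max(mtimes)  # index of the most recently modified file
  # compare calendar dates in local time, the same zone Sys.Date() uses
  if (as.Date(mtimes[i], tz = "") != today) {
    stop("newest file ", basename(paths[i]), " is not from ", today)
  }
  paths[i]
}
```

In readfile.R this could be used as `newest <- read.csv(newest_csv_from_today("/path/to/csv/folder"))`.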

answered Oct 09 '22 at 13:10 by andrew


A more concise solution using magrittr

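pacman::p_load(magrittr)  # loads magrittr (installing it if missing); requires pacman

path <- list.files(path = directory,
                   pattern = "\\.csv$",
                   full.names = TRUE) %>%
  extract(which.max(file.mtime(.)))  # extract() is magrittr's alias for `[`

newest <- read.csv(path)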
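The same selection can be written in base R with no extra packages (the question's R 3.1 predates the native `|>` pipe) and wrapped as the kind of function the question asks for. A minimal sketch: `read_newest_csv()` is a hypothetical name, and `directory` stands for whatever folder the daily file lands in.

```r
# Base-R version of the pipeline above, wrapped as a reusable function.
# Hypothetical helper; `directory` is the folder the daily CSV is uploaded to.
read_newest_csv <- function(directory, ...) {
  paths <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  # which.max() on an empty vector gives integer(0), so fail early with a clear message
  if (length(paths) == 0) stop("no .csv files found in ", directory)
  newest <- paths[which.max(file.mtime(paths))]  # most recently modified file
  read.csv(newest, ...)  # any extra arguments are passed on to read.csv()
}
```

For example, `dat <- read_newest_csv("/path/to/csv/folder", stringsAsFactors = FALSE)`, where the path is a placeholder.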
answered Oct 09 '22 at 12:10 by mzuba