I'm reading a file into R with fread, using the two methods below:
```r
fread("file:///C:/Users/Desktop/ads.csv")
fread("C:/Users/Desktop/ads.csv")  # Just omitted "file:///"
```
I've observed very different runtimes:
```r
microbenchmark(
  fread("file:///C:/Users/Desktop/ads.csv"),
  fread("C:/Users/Desktop/ads.csv")
)
Unit: microseconds
                                       expr      min        lq      mean    median       uq       max neval cld
 fread("file:///C:/Users/Desktop/ads.csv") 5755.975 6027.4735 6696.7807 6235.3365 6506.652 41257.476   100   b
         fread("C:/Users/Desktop/ads.csv")  525.492  584.0215  673.7166  647.4745  727.703  1476.191   100  a
```
Why does the runtime vary so much? There was no noticeable difference between the two variants when I used read.csv(), though.
For files larger than 100 MB, fread() and read_csv() can be expected to be around 5 times faster than read.csv().
Conclusion: For sequential access, both fread and ifstream are equally fast.
The data.table package is extremely useful and easy to use. Its fread() function is meant to import data from regular delimited files directly into R, without any detours or nonsense. Note that "regular" here means that every row of your data must have the same number of columns.
The data.table package comes with a function called fread, a very efficient and fast function for reading data from files. It is similar to read.table but faster and more convenient.
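The "regular" requirement (every row has the same number of columns) can be checked before handing data to fread. A minimal sketch, assuming the data is available as a character vector of lines; is_regular_delimited() is a hypothetical helper built on base R's count.fields(), not part of data.table:

```r
# Hypothetical helper (not part of data.table): TRUE when every non-blank
# line has the same number of sep-delimited fields
is_regular_delimited <- function(lines, sep = ",") {
  con <- textConnection(lines)
  on.exit(close(con))
  n_fields <- count.fields(con, sep = sep)  # one field count per line
  length(unique(n_fields)) == 1L
}

regular <- c("a,b,c", "1,2,3", "4,5,6")
ragged  <- c("a,b,c", "1,2", "4,5,6")

is_regular_delimited(regular)  # TRUE
is_regular_delimited(ragged)   # FALSE
```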
The following has been added to ?fread:

When input begins with http://, https://, ftp://, ftps://, or file://, fread detects this and downloads the target to a temporary file (at tempfile()) before proceeding to read the file as usual. Secure URLs (ftps:// and https://) are downloaded with curl::curl_download; ftp:// and http:// paths are downloaded with download.file and method set to getOption("download.file.method"), defaulting to "auto"; and file:// is downloaded with download.file with method="internal". NB: this implies that for file://, even files found on the current machine will be "downloaded" (i.e., hard-copied) to a temporary file. See ?download.file for more details.
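The copying step can be reproduced with base R alone. A minimal sketch, assuming download.file() with method = "auto" accepts a local file:// URL (as ?download.file describes) and that the temporary path contains no characters needing percent-encoding; src, url and copy are illustrative names:

```r
# Write a small CSV to a temporary file to stand in for ads.csv
src <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), src, row.names = FALSE)

# Build a file:// URL. A Windows path starts with a drive letter ("C:/..."),
# so it needs "file:///" as in the question; a Unix path already starts
# with "/", so "file://" plus the path also yields three slashes
path <- normalizePath(src, winslash = "/")
url <- if (startsWith(path, "/")) paste0("file://", path) else paste0("file:///", path)

# The extra step described above: "download" (copy) the file to a temp file
copy <- tempfile(fileext = ".csv")
status <- download.file(url, copy, method = "auto", mode = "wb", quiet = TRUE)

# The result is a second file on disk with the same lines as the original
same_contents <- identical(readLines(src), readLines(copy))

unlink(c(src, copy))  # clean up both files
```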
From the source of fread:

```r
if (str6 == "ftp://" || str7 == "http://" || str7 == "file://") {
  method = if (str7 == "file://") "auto" else getOption("download.file.method", default = "auto")
  download.file(input, tmpFile, method = method, mode = "wb", quiet = !showProgress)
}
```
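To see which inputs reach that branch, its condition can be restated as a small function. This is a hypothetical re-statement for illustration only (fread_download_method() is not a data.table function), assuming str6 and str7 hold the first 6 and 7 characters of input, as their names suggest:

```r
# Hypothetical re-statement of the quoted branch; not data.table's own code.
# Returns the download.file() method that branch would use, or NA when the
# input does not go through this branch.
fread_download_method <- function(input) {
  str6 <- substr(input, 1, 6)  # assumed: first 6 characters of input
  str7 <- substr(input, 1, 7)  # assumed: first 7 characters of input
  if (str6 == "ftp://" || str7 == "http://" || str7 == "file://") {
    if (str7 == "file://") "auto" else getOption("download.file.method", default = "auto")
  } else {
    NA_character_
  }
}

fread_download_method("file:///C:/Users/Desktop/ads.csv")  # "auto": copied first
fread_download_method("C:/Users/Desktop/ads.csv")          # NA: read directly
fread_download_method("https://example.com/ads.csv")       # NA: https:// is handled elsewhere
```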
That is, your file is being "downloaded" to a temporary file, which amounts to deep-copying the file's contents to a temporary location. file:// is not really intended for local files, but for files on a network that need to be downloaded locally before being read (IIUC; FWIW, this is what fread's test suite uses to imitate file downloads when testing on CRAN, where external downloads are impossible).
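In practice, the simplest fix is to pass fread a plain local path, as in the question's second call. If the input arrives as a file:// URL, a minimal sketch for converting it back is below; strip_file_scheme() is a hypothetical helper, and it assumes the URL has no host part (such as file://localhost/) and no percent-encoded characters (such as %20), neither of which it handles:

```r
# Hypothetical helper: turn a local file:// URL back into a plain path so
# that fread() can read the file directly instead of copying it first
strip_file_scheme <- function(input) {
  if (grepl("^file:///[A-Za-z]:", input)) {
    sub("^file:///", "", input)  # Windows: "file:///C:/..." -> "C:/..."
  } else {
    sub("^file://", "", input)   # Unix: "file:///home/..." -> "/home/..."; other inputs unchanged
  }
}

strip_file_scheme("file:///C:/Users/Desktop/ads.csv")  # "C:/Users/Desktop/ads.csv"
strip_file_scheme("file:///home/me/ads.csv")           # "/home/me/ads.csv"
strip_file_scheme("C:/Users/Desktop/ads.csv")          # unchanged
```

Usage would then be data.table::fread(strip_file_scheme(x)) for some input x.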
I also notice that your timings are reported in microseconds (median times of roughly 0.65 ms and 6.2 ms), which could explain the discrepancy vs. read.csv: a fixed copying cost is large relative to fread's own read time but small relative to read.csv's. Imagine read.csv takes 1 second to read the file, while fread takes 0.01 seconds, and copying the file takes 0.05 seconds. Then read.csv will look about the same in both cases (1 vs. 1.05 seconds), while fread looks substantially slower in the file:// case (0.01 vs. 0.06 seconds).
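The same arithmetic can be written out and applied to the medians from the question's benchmark. The thought-experiment timings are the hypothetical numbers above, not measurements:

```r
# Hypothetical timings from the thought experiment (seconds)
read_csv_time <- 1.00
fread_time    <- 0.01
copy_time     <- 0.05

# Relative slowdown when the same fixed copy cost is added to each reader
read_csv_ratio <- (read_csv_time + copy_time) / read_csv_time  # about 1.05
fread_ratio    <- (fread_time + copy_time) / fread_time        # about 6

# Medians reported by microbenchmark in the question (microseconds)
median_file_url <- 6235.3365
median_plain    <- 647.4745
overhead_us <- median_file_url - median_plain  # about 5588, i.e. roughly 5.6 ms
slowdown    <- median_file_url / median_plain  # about 9.6
```

So the file:// variant spends roughly 5.6 ms more per call, which is the cost the copying explanation would need to account for.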