 

Read remote file beginning with "smb://" using R

To read a file in R, I'd normally do something like the following:

read.csv('/Users/myusername/myfilename.csv')

But I'm trying to read a file located on a remote server (a Windows SMB/CIFS share), which I access on my Mac via the Finder > Go > Connect to Server menu item.

When I view that file's properties, the file path is different from what I'm used to. Instead of beginning with /Users/myusername/..., it is smb://server.msu.edu/.../myfilename.csv.

To read the file, I tried the following:

read.csv('smb://server.msu.edu/.../myfilename.csv')

But this didn't work.

Instead of the usual "No such file or directory" error, this returned:

smb://server.msu.edu/.../myfilename.csv does not exist in current working directory

I imagine the file path needs a different format, but I can't figure out what.

How can you read this type of file in R?

Joshua Rosenberg asked Feb 06 '17 18:02


4 Answers

Explanation

smb://educ-srvmedia1.campusad.msu.edu/... is actually a URL, not a file path.

Let's break this down:

smb:// means use the server message block protocol (file sharing)

educ-srvmedia1.campusad.msu.edu is the name of the server

/.../myfilename.csv is the file share/path on the remote server
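To make the breakdown concrete, here is a minimal sketch that splits such a URL into those parts with base R regular expressions. The helper parse_smb_url() is hypothetical, and the pattern assumes the simple form smb://server/path with no user:password@ or :port component.

```r
# Hypothetical helper: split a simple smb:// URL into scheme, server and path.
# Assumes the form smb://server/path (no user:password@ or :port parts).
parse_smb_url <- function(u) {
  m <- regmatches(u, regexec("^([a-z]+)://([^/]+)(/.*)?$", u))[[1]]
  if (length(m) == 0) stop("not a URL of the form scheme://server/path")
  # m[1] is the whole match; m[2..4] are the captured groups
  list(scheme = m[2], server = m[3], path = m[4])
}

parts <- parse_smb_url("smb://host.example.com/share/myfilename.csv")
str(parts)
```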

You can navigate to this directory using Finder on OS X because Finder has built-in support for the SMB protocol. It connects to the remote server using the URL and lets you browse the files.

However, R does not understand the SMB protocol, so it can't interpret the path.

The R function read.csv() uses file() internally; see https://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html

According to that page, url and file support the URL schemes file://, http://, https:// and ftp://.

So R reports that the file cannot be found, when the real problem is that the smb:// protocol is unsupported. Yes, slightly confusing.
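One way to avoid the confusing message is to check the scheme yourself before handing the string to read.csv(). This is a sketch of a hypothetical wrapper (read_csv_checked() is not part of R); the list of accepted schemes follows the documentation quoted above.

```r
# Hypothetical wrapper: fail early with a clear message when the path uses
# a URL scheme that base R connections do not understand (e.g. smb://).
read_csv_checked <- function(path, ...) {
  # Extract "scheme" from "scheme://..."; sub() returns path unchanged if no match
  scheme <- sub("^([A-Za-z][A-Za-z0-9+.-]*)://.*$", "\\1", path)
  has_scheme <- !identical(scheme, path)
  supported <- c("file", "http", "https", "ftp")
  if (has_scheme && !(tolower(scheme) %in% supported)) {
    stop(sprintf("URL scheme '%s://' is not supported by read.csv(); ", scheme),
         "mount the share and use a local path instead", call. = FALSE)
  }
  read.csv(path, ...)
}
```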

Fix

You need to mount the file share on your local filesystem.

All this means is that the details of the SMB protocol will be handled behind the scenes by the OS and the fileshare will be presented as a local directory.

This will allow R (and other programs) to treat the remote files, for all intents and purposes, like any other local files. This discussion shows some options for doing so.

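For example:

# need to create /LocalFolder first (e.g. sudo mkdir /LocalFolder); Linux, with cifs-utils:
sudo mount -t cifs //hostname/sharename /LocalFolder -o username=username,password=password
# on macOS the equivalent is:
# mount_smbfs //username:password@hostname/sharename /LocalFolder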

then in R:

read.csv('/LocalFolder/myfilename.csv')
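Because the mount point can exist while the share itself is not mounted (for example after a reboot), it can help to check for the file before reading it. A minimal sketch; read_from_mount() is a hypothetical helper, and /LocalFolder is the mount point from the example above.

```r
# Hypothetical helper: read a CSV from a mounted share, with a clearer error
# if the share is not mounted (the mount point exists but the file does not).
read_from_mount <- function(mount_point, filename, ...) {
  if (!dir.exists(mount_point)) {
    stop("mount point ", mount_point, " does not exist", call. = FALSE)
  }
  path <- file.path(mount_point, filename)
  if (!file.exists(path)) {
    stop(path, " not found; is the share mounted?", call. = FALSE)
  }
  read.csv(path, ...)
}

# Usage, assuming the share is mounted at /LocalFolder:
# dat <- read_from_mount("/LocalFolder", "myfilename.csv")
```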

Extra

Windows users can accomplish this more easily with UNC paths; see:
How to read files from a UNC-specified directory in R?

stacksonstacks answered Oct 16 '22 14:10


TL;DR

Here's a portable approach that uses cURL and doesn't require mounting remote filesystems:

> install.packages("curl")
> require("curl")
> handle <- new_handle()
> handle_setopt(handle, username = "domain\\username")
> handle_setopt(handle, password = "secret") # If needed
> request <- curl_fetch_memory("smb://host.example.com/share/file.txt", handle = handle)
> contents <- rawToChar(request$content)

If we need to read the contents as CSV, as in the question, we can stream the file through another function:

> stream <- curl("smb://host.example.com/share/file.txt", handle = handle)
> contents <- read.csv(stream)
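If the file has already been fetched into memory with curl_fetch_memory(), a second request isn't needed: the raw bytes can be parsed with read.csv(text = ...). The sketch below simulates request$content with charToRaw() so that it runs without a server; with a real request you would pass request$content instead of raw_bytes.

```r
# Stand-in for request$content from curl_fetch_memory(): a raw vector of bytes.
raw_bytes <- charToRaw("id,value\n1,a\n2,b\n")

# rawToChar() turns the bytes into one string; read.csv(text = ...) parses it
# without touching the filesystem.
contents <- rawToChar(raw_bytes)
df <- read.csv(text = contents)
print(df)
```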

Let's look at a more robust way to access remote files through smb:// URLs than the approach described in other answers, which mounts the remote filesystem. I'm a bit late to this one, but I hope this helps future readers.

In some cases, we may not have the privileges needed to mount a filesystem (this requires admin or root access on many systems), or we simply may not want to mount an entire filesystem just to read a single file. We'll use the cURL library to read the file instead. This approach improves the flexibility and portability of our programs because we don't need to depend on the existence of an externally mounted filesystem. We'll examine two different ways: through a system() call, and by using a package that provides a cURL API.

Some background: for those not familiar with it, cURL provides tools for transferring data over various protocols. Since version 7.40.0, cURL supports the SMB/CIFS protocol typically used for Windows file-sharing services (note that its SMB support covers only SMB version 1, which many current servers disable). cURL includes a command-line tool that we can use to fetch the contents of a file:

$ curl -u 'domain\username' 'smb://host.example.com/share/file.txt'

The command above reads and outputs (to STDOUT) the contents of file.txt from the remote server host.example.com authenticating as the specified user on the domain. The command will prompt us for a password if needed. We can remove the domain portion from the username if our network doesn't use a domain.

System Call

We can achieve the same functionality in R by using the system() function:

system("curl -u 'domain\\username' 'smb://host.example.com/share/file.txt'")

Note the double backslash in domain\\username. This escapes the backslash character so that R doesn't interpret it as an escape character in the string. We can capture file contents from the command output into a variable by setting the intern parameter of the system() function to TRUE:

contents <- system("curl -u 'domain\\username' 'smb://host.example.com/share/file.txt'", intern = TRUE)

...or by calling system2() instead, which takes the command and its arguments separately and handles output redirection more consistently across platforms. Note that system2() does not quote the arguments itself, so we wrap them in shQuote():

contents <- system2("curl", c("-u", shQuote("domain\\username"), shQuote("smb://host.example.com/share/file.txt")), stdout = TRUE)
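A quick way to convince yourself of the backslash escaping is to compare the string R stores with how it prints. This check uses only base R; shQuote() is the base function that wraps a string in quotes for the shell.

```r
user <- "domain\\username"   # typed with two backslashes in R source
nchar(user)                  # 15: the stored string holds a single backslash
cat(user, "\n")              # prints domain\username
# shQuote() wraps the value in single quotes for a POSIX shell, where the
# backslash is then kept literally.
cat(shQuote(user, type = "sh"), "\n")
```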

The curl command will still prompt us for a password if required by the remote server. While we can specify a password using -u 'domain\\username:password' to avoid the prompt, doing so exposes the plain-text password in the command string. For a more secure approach, see the section below on the curl package.

We can also add the -s (--silent) flag to the curl command to suppress the progress output. Because this also hides error messages, we may want to add -S (--show-error) too. The contents variable will contain a vector of the lines of the file, similar to the value returned by readLines("file.txt"), which we can join back together using paste(contents, collapse = "\n").
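When the file is CSV, the vector of lines returned with intern = TRUE can go straight into read.csv(text = ...), which accepts a character vector (one element per line), so pasting it together first is optional. The sketch uses a literal vector in place of the system() output so it runs offline.

```r
# Stand-in for the vector returned by system(..., intern = TRUE):
# one element per line of the remote file.
contents <- c("id,score", "1,0.5", "2,0.75")

dat <- read.csv(text = contents)                 # each element is read as a line
whole_file <- paste(contents, collapse = "\n")   # single string, if needed
```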

cURL API

While this all works fine, we can improve upon this approach by using a dedicated cURL library. The curl package provides R bindings to libcurl so that we can use the cURL API in our program directly. First we need to install the package:

install.packages("curl")
require("curl")

(Linux users will need to install libcurl development files.)

Then, we can read the remote file into a variable using the curl_fetch_memory() function:

handle <- new_handle()
handle_setopt(handle, username = "domain\\username")
handle_setopt(handle, password = "secret") # If needed
request <- curl_fetch_memory("smb://host.example.com/share/file.txt", handle = handle)
content <- rawToChar(request$content)

First we create a handle and set on it any authentication options the server requires (the password option only if needed). Then we execute the request and convert the raw bytes in request$content into a string.
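To keep the password out of the script itself, one option is to read credentials from environment variables (for example, set in ~/.Renviron) and pass them to handle_setopt(). A minimal sketch: the variable names SMB_USER and SMB_PASSWORD and the helper smb_credentials() are hypothetical, not something the curl package defines.

```r
# Hypothetical helper: fetch SMB credentials from environment variables so
# they don't appear in the source code. Fails if either variable is unset.
smb_credentials <- function(user_var = "SMB_USER", pass_var = "SMB_PASSWORD") {
  user <- Sys.getenv(user_var)
  pass <- Sys.getenv(pass_var)
  if (!nzchar(user) || !nzchar(pass)) {
    stop("set ", user_var, " and ", pass_var, " first", call. = FALSE)
  }
  list(username = user, password = pass)
}

# Usage with the curl package (requires a reachable server):
# creds <- smb_credentials()
# handle <- curl::new_handle()
# curl::handle_setopt(handle, username = creds$username,
#                     password = creds$password)
```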

To process a remote file as we would with read.csv(), we need a streaming connection. The curl() function creates a connection object, like the one returned by the standard url() function, so we can stream the file contents through any function that accepts a connection. For example, here's a way to read the remote file as CSV, as in the question:

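handle <- new_handle()
handle_setopt(handle, username = "domain\\username")
handle_setopt(handle, password = "secret") # If needed
stream <- curl("smb://host.example.com/share/file.txt", handle = handle)
contents <- read.csv(stream)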
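The connection-based pattern can be tried locally without a server. As a stand-in for curl(), this sketch uses base R's url() on a file:// URL (one of the schemes base R supports), showing that read.csv() reads from any connection object. It assumes a Unix-like absolute path, so that "file://" plus the path yields a valid file:/// URL.

```r
# Write a small CSV to a temporary file to play the role of the remote file.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), path, row.names = FALSE)

# url() returns an unopened connection, as curl() does for smb:// URLs;
# read.csv() opens it, reads it and closes it again.
con <- url(paste0("file://", path))
dat <- read.csv(con)
```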

Of course, the concepts described above apply to fetching the contents or response body over any protocol supported by cURL, not just SMB/CIFS. If needed, we can also use these tools to download files to the filesystem instead of just reading the contents into memory.

Cy Rossignol answered Oct 16 '22 13:10


Below is a method I've used from time to time to read data from an SMB network drive. In the code below, I've used R's system() function to do everything from within R, but you can also mount the drive from the OS X command line or from within Finder with Command-K (Connect to Server):

If you don't already have one, create a directory on your local drive where the share will be located (this isn't necessary, as you can mount the drive in an existing location):

system("mkdir /Users/eipi10/temp_share/")

or

dir.create("/Users/eipi10/temp_share/")

Mount the network drive to the folder you just created. In the code below, //username@server/home/u/eipi10 stands for your user name and the address of the SMB share.

system("mount_smbfs //username@server/home/u/eipi10 /Users/eipi10/temp_share")

If there's password authentication, then the password can be included as well:

system("mount_smbfs //username:password@server/home/u/eipi10 /Users/eipi10/temp_share")
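If the user name, server or mount point come from variables, building the mount_smbfs command with shQuote() guards against spaces or shell metacharacters in them. A minimal sketch: mount_cmd() is a hypothetical helper, server.example.edu is a placeholder host, and the function only builds the string, leaving it to you to pass it to system().

```r
# Hypothetical helper: build a mount_smbfs command line with each argument
# quoted for a POSIX shell. Nothing is executed here.
mount_cmd <- function(user, server, share, mount_point) {
  smb_path <- sprintf("//%s@%s/%s", user, server, share)
  paste("mount_smbfs", shQuote(smb_path, type = "sh"),
        shQuote(mount_point, type = "sh"))
}

cmd <- mount_cmd("eipi10", "server.example.edu", "home/u/eipi10",
                 "/Users/eipi10/temp_share")
print(cmd)
# system(cmd)  # run only on macOS, with the share reachable
```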

Read the data:

dat = read.csv("/Users/eipi10/temp_share/fileToRead.csv")

From within R, you can also programmatically select files to read:

data.list = lapply(list.files(pattern="csv$", "/Users/eipi10/temp_share/", full.names=TRUE), read.csv)
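If the CSV files share the same columns, the list can be stacked into a single data frame, with a column recording which file each row came from. A base R sketch; read_all_csv() is a hypothetical helper, and it assumes identical column names across files, since rbind() on data frames matches columns by name and fails otherwise.

```r
# Hypothetical helper: read every CSV in a directory and stack the results.
# Assumes all files have the same column names.
read_all_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.csv(f)
    d$source_file <- basename(f)   # remember where each row came from
    d
  })
  do.call(rbind, pieces)
}

# Usage with the mounted share from above:
# all_dat <- read_all_csv("/Users/eipi10/temp_share/")
```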
eipi10 answered Oct 16 '22 14:10


SMB is the Windows network folder protocol.

Similar cases include sftp:// URLs, for example.

You can either:

  1. mount the folder in your operating system, and access it using a regular path,
  2. use a virtual file system library, such as GVFS/GIO on Linux. There may be an R wrapper around it that you can use.
Has QUIT--Anony-Mousse answered Oct 16 '22 13:10