Listing all files matching a full-path pattern in R

Tags: regex, path, r

I am trying to obtain the list of files matching a full-path pattern. So far, I have used list.files() but it did not work.

Let's assume that we have the following directory organization:

    results
    |- A
    |  |- data-1.csv
    |  |- data-2.csv
    |
    |- B
       |- data-1.csv
       |- data-2.csv
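To try the commands in this question without touching real data, you can recreate the layout above in a temporary directory. This is a minimal sketch. make_example_layout is a hypothetical helper name, and the CSV files it creates are empty placeholders.

```r
# Hypothetical helper: recreate the question's tree under a fresh temporary root.
make_example_layout <- function(root = tempfile("layout-")) {
  for (sub in c("A", "B")) {
    d <- file.path(root, "results", sub)
    dir.create(d, recursive = TRUE)             # creates results/ and results/<sub>/ as needed
    file.create(file.path(d, c("data-1.csv", "data-2.csv")))  # empty placeholder files
  }
  root
}

root <- make_example_layout()
# With recursive = TRUE, list.files() reports paths relative to root.
print(list.files(root, recursive = TRUE))
```

The printed paths are relative to root, for example results/A/data-1.csv.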

Then the following command:

list.files(pattern='data-.*\\.csv', recursive=TRUE) 

will return all the files matching the pattern. This works, but the problem appears when using a full-path pattern. For instance, if I want to obtain all the CSV files from directory results/A, I could do:

list.files(pattern='results/A/data-.*\\.csv', recursive=TRUE) 

This does not work, though. It seems that R cannot apply a regular expression to the full path. Here, one workaround would be to use results/A as the base path, but that is not possible in more complex cases. For instance, we may want to match only the subdirectories whose names consist entirely of uppercase letters:

list.files(pattern='results/[A-Z]+/data-.*\\.csv', recursive=TRUE) 
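You can see why the full-path pattern fails by running both calls on a throwaway copy of the layout. The copy is rebuilt under tempfile() here, an assumption made only so the snippet is self-contained. The comments reflect the documented behaviour that pattern is tested against file names only, so a pattern that contains a / can never match.

```r
# Rebuild the question's layout in a temporary directory.
root <- tempfile("layout-")
for (sub in c("A", "B")) {
  d <- file.path(root, "results", sub)
  dir.create(d, recursive = TRUE)
  invisible(file.create(file.path(d, c("data-1.csv", "data-2.csv"))))
}

# The regex is tested against each bare file name, so this matches all four files...
by_name <- list.files(root, pattern = "data-.*\\.csv", recursive = TRUE)
# ...while a pattern containing "/" can never match a bare file name.
by_path <- list.files(root, pattern = "results/A/data-.*\\.csv", recursive = TRUE)

length(by_name)  # 4
by_path          # character(0)
```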

Is it possible to do this in R?

UPDATE: After using ad hoc solutions for a while, I got tired of typing the same code over and over, so I created a library to simplify this task.

Asked Apr 27 '12 15:04 by betabandido


People also ask

How do I list all files in a working directory in R?

To list the files in a directory, use list.files(). Called with no arguments, it lists the current working directory, because its default is path = ".". It returns a character vector containing the names of the files in the specified directories.
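The following small example shows the default path along with the pattern and full.names arguments. It runs in a scratch directory containing two hypothetical files, a.txt and b.csv, so the output is predictable.

```r
# Scratch directory so the example does not depend on your own files.
scratch <- tempfile("wd-")
dir.create(scratch)
invisible(file.create(file.path(scratch, c("a.txt", "b.csv"))))

old_wd <- setwd(scratch)                     # setwd() returns the previous directory
here <- list.files()                         # default path = ".", the working directory
csv_only <- list.files(pattern = "\\.csv$")  # regex filter applied to file names
with_dir <- list.files(full.names = TRUE)    # names prefixed with the path, e.g. "./a.txt"
setwd(old_wd)                                # restore the original working directory

here      # "a.txt" "b.csv"
csv_only  # "b.csv"
```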

What is dir() in R?

dir() is equivalent to list.files(). It takes the same arguments and returns a character vector of file and/or folder names within a directory.
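This quick check confirms that dir() and list.files() agree. It uses a scratch directory holding one hypothetical file and one subfolder.

```r
scratch <- tempfile("dir-")
dir.create(file.path(scratch, "sub"), recursive = TRUE)
invisible(file.create(file.path(scratch, "notes.txt")))

# dir() accepts the same arguments as list.files() and returns the same result.
a <- dir(scratch)
b <- list.files(scratch)
identical(a, b)  # TRUE
a                # "notes.txt" "sub" (folders are listed too when not recursing)
```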

How do I list all directories in R?

list.dirs() returns the directories under the specified path as a character vector. By default (recursive = TRUE), the result includes path itself and every nested subdirectory. With recursive = FALSE, only the immediate subdirectories are returned. If none are found, for example with recursive = FALSE on a folder that has no subfolders, the result is an empty character vector, not NULL.
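The following sketch contrasts the default recursive listing with recursive = FALSE. It runs on a hypothetical scratch tree containing A/deep and B.

```r
scratch <- tempfile("dirs-")
dir.create(file.path(scratch, "A", "deep"), recursive = TRUE)
dir.create(file.path(scratch, "B"))

# Default: recursive = TRUE and full.names = TRUE; the starting path itself is included.
all_dirs <- list.dirs(scratch)
# Immediate subdirectories only, as bare names.
top <- list.dirs(scratch, full.names = FALSE, recursive = FALSE)
# A folder with no subfolders yields an empty character vector, not NULL.
none <- list.dirs(file.path(scratch, "B"), recursive = FALSE)

length(all_dirs)  # 4: scratch, scratch/A, scratch/A/deep, scratch/B
sort(top)         # "A" "B"
none              # character(0)
```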

How do I get the path of a file in R?

getwd() returns the current working directory and takes no arguments. This is the directory that relative paths are resolved against, which is not necessarily the folder containing the running script. To change the working directory, use setwd(). To turn a relative file name into an absolute path, use normalizePath().
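Here is a sketch of that round trip, using a hypothetical example.csv created in a scratch directory. Because setwd() returns the previous directory, restoring it is straightforward.

```r
old_wd <- getwd()                 # current working directory (not the script's location)
scratch <- tempfile("wd-")
dir.create(scratch)
setwd(scratch)                    # change the working directory
invisible(file.create("example.csv"))

# Absolute, canonical path of an existing file given by a relative name.
abs_path <- normalizePath("example.csv")

setwd(old_wd)                     # restore the previous working directory
basename(abs_path)                # "example.csv"
```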


2 Answers

First, note that pattern expects a regular expression, not a shell-style wildcard. Your first example should be:

list.files(pattern='data-.*\\.csv', recursive=TRUE) 

Then, it seems the pattern matching inside list.files is applied to the file basenames (i.e., not including the directory path), so you could split the task into two steps:

  1. Find all files matching the basename only, return their full paths:

        basename.matches <- list.files(pattern='data-.*\\.csv', recursive=TRUE,
                                       full.names = TRUE)
        basename.matches
        # [1] "./results/A/data-1.csv" "./results/A/data-2.csv" "./results/B/data-1.csv"
        # [4] "./results/B/data-2.csv"

  2. Keep only those that match the expected directory(ies):

        full.matches <- grep(pattern='^\\./results/A/', basename.matches, value = TRUE)
        full.matches
        # [1] "./results/A/data-1.csv" "./results/A/data-2.csv"
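You can fold the same two-step idea into a small function that applies a regular expression to each file's path relative to the search root. That also covers the [A-Z]+ case from the question. This is a sketch: list_files_fullpath is a hypothetical name, not an existing base R function. The demo layout adds a hypothetical B2 folder to show that names containing digits are rejected.

```r
# Hypothetical helper: match `pattern` against paths relative to `path`
# (e.g. "results/A/data-1.csv") instead of against bare file names.
list_files_fullpath <- function(pattern, path = ".", full.names = FALSE) {
  rel <- list.files(path, recursive = TRUE)  # every file below `path`, as relative paths
  hits <- rel[grepl(pattern, rel)]           # regex applied to the whole relative path
  if (full.names) file.path(path, hits) else hits
}

# Demo on a throwaway copy of the question's layout, plus a non-matching folder "B2".
root <- tempfile("layout-")
for (sub in c("A", "B", "B2")) {
  d <- file.path(root, "results", sub)
  dir.create(d, recursive = TRUE)
  invisible(file.create(file.path(d, c("data-1.csv", "data-2.csv"))))
}

only_a <- list_files_fullpath("^results/A/data-.*\\.csv$", root)
only_caps <- list_files_fullpath("^results/[A-Z]+/data-.*\\.csv$", root)
only_a     # "results/A/data-1.csv" "results/A/data-2.csv"
only_caps  # the A and B files; nothing from B2
```

Matching against relative paths, rather than against the "./"-prefixed names, keeps the patterns independent of where path points.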
Answered Sep 24 '22 03:09 by flodel


You cannot do this with list.files alone, because it loops over each element of path and applies the regular expression to the names of the files contained therein. But since the path argument to list.files accepts a vector, you can use that to solve your problem.

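    # Keep only immediate subdirectories of results/ whose names are all uppercase letters
    dirs <- grep("/[A-Z]+$", list.dirs("results", recursive=FALSE), value=TRUE)
    # path accepts a vector, so list.files() searches each of those directories
    list.files(dirs, "data-.*\\.csv", recursive=TRUE, full.names=TRUE)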
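The sketch below runs this answer's approach end to end on a temporary copy of the layout. It adds a hypothetical B2 folder so the directory filter has something to reject. The filter requires the final path component to consist only of uppercase letters, which is an assumption about what "only characters" means in the question.

```r
root <- tempfile("layout-")
for (sub in c("A", "B", "B2")) {
  d <- file.path(root, "results", sub)
  dir.create(d, recursive = TRUE)
  invisible(file.create(file.path(d, c("data-1.csv", "data-2.csv"))))
}

# Immediate subfolders: ".../results/A", ".../results/B", ".../results/B2"
subdirs <- list.dirs(file.path(root, "results"), recursive = FALSE)
# Keep folders whose last component is only uppercase letters (drops "B2").
dirs <- grep("/[A-Z]+$", subdirs, value = TRUE)
# One call, several search roots: path is vectorised.
found <- list.files(dirs, "data-.*\\.csv", recursive = TRUE, full.names = TRUE)
sort(basename(dirname(found)))  # "A" "A" "B" "B"
```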
Answered Sep 21 '22 03:09 by Joshua Ulrich