
list.files taking account of file size in R?

Tags: file, r

I have a large number of files in several folders. I can get a list of these files with:

MY_FILES <- list.files(WORKING_DIRECTORY, pattern = "MY_PATTERN", recursive = TRUE)

Most, but not all, of the files are larger than 50 MB. How can I modify the list.files call so that MY_FILES contains only the files above the 50 MB threshold? Or do I need a separate step to subset MY_FILES afterwards? (I'm not sure how to do this, because list.files returns only a vector of names, with no details about the files.)

I need to stick to R because this is only one step in a series of data manipulations. Thanks.

EcologyTom asked Sep 16 '16 11:09




1 Answer

Sure, just get file sizes.

x <- list.files(full.names = TRUE)

x[file.size(x) > 300000]
[1] "./hami.jpg"          "./process_steps.jpg" "./shp_sveta.png"

Here I keep only the files larger than 300 kB (300,000 bytes); atom.jpg and the other smaller files in the directory are excluded. Set full.names = TRUE so that list.files returns paths that file.size can resolve even when the files are not in getwd().
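Applied to the recursive listing in the question, the same idea might look like the sketch below. The helper name list_large_files and its min_bytes argument are made up for illustration, and the default assumes "50 MB" means 50 * 1024^2 bytes (use 50e6 if you mean decimal megabytes). The demo uses a temporary directory and a tiny threshold so it runs anywhere.

```r
# Hypothetical helper (not part of base R): return the files under `dir`
# that match `pattern` and are larger than `min_bytes`.
list_large_files <- function(dir, pattern = NULL, min_bytes = 50 * 1024^2) {
  # full.names = TRUE so file.size() can find files outside getwd()
  paths <- list.files(dir, pattern = pattern, recursive = TRUE,
                      full.names = TRUE)
  sizes <- file.size(paths)  # vectorised; NA where the size cannot be read
  paths[!is.na(sizes) & sizes > min_bytes]
}

# Small demonstration with throwaway files in a temporary directory
demo_dir <- file.path(tempdir(), "size_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
writeBin(raw(2000), file.path(demo_dir, "big.dat"))        # 2000 bytes
writeBin(raw(10), file.path(demo_dir, "sub", "small.dat")) # 10 bytes

large <- list_large_files(demo_dir, pattern = "\\.dat$", min_bytes = 1000)
print(basename(large))
```

For the original case, something like MY_FILES <- list_large_files(WORKING_DIRECTORY, pattern = "MY_PATTERN") should give the filtered vector in one step. Note that file.size needs R 3.2.0 or later; on older versions file.info(paths)$size returns the same values.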

Roman Luštrik answered Sep 21 '22 12:09