Using RSync to copy a sequential range of files

Sorry if this makes no sense, but I will try to give all the information needed!

I would like to use rsync to copy a range of sequentially numbered files from one folder to another.

I am archiving a DCDM (it's a film thing) that contains on the order of 600,000 individually numbered, sequential .tif image files (~10 MB each).

I need to break this up to archive it properly onto LTO6 tapes, and I would like to use rsync to prep the folders so that a simple bash .sh script can automate backing up the various folders and files to tape.

The command I normally use when running rsync is:

sudo rsync -rvhW --progress --size-only <src> <dest>

I use sudo if needed, and I always test the outcome first with --dry-run.

The only way I've got anything to work (without it kicking out errors) is by using the * wildcard. However, this only matches files with a fixed prefix (e.g. 01* will only move files in the range 010000-019999), and I would have to repeat the command for 02, 03, 04, etc.

I've looked on the internet, and am struggling to find an answer that works.

This might not be possible, and with 600,000 .tif files, I can't write an exclude for each one!

Any thoughts as to how (if at all) this could be done?

Owen.

asked Sep 08 '14 by Owen Morgan




2 Answers

You can match file names that start with a digit by using pattern matching:

for file in [0-9]*; do
    # do something to $file name that starts with digit
done

Or, you could enable the extglob option and loop over all file names that contain only digits. This could eliminate any potential unwanted files that start with a digit but contain non-digits after the first character.

shopt -s extglob
for file in +([0-9]); do
    # do something to $file name that contains only digits
done
  • +([0-9]) expands to one or more occurrences of a digit
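Either loop can be narrowed to a numeric range with an arithmetic test. A minimal sketch (the six-digit names follow the question; the scratch directory, the lo/hi bounds, and the selected array are illustrative assumptions, not part of the original answer):

```shell
#!/usr/bin/env bash
# Demo setup: a scratch directory with a few six-digit frame names
cd "$(mktemp -d)"
touch 009999.tif 010000.tif 015000.tif 019999.tif 020000.tif

lo=10000
hi=19999
selected=()
for file in [0-9]*.tif; do
    num=${file%.tif}
    # 10# forces base 10 so leading zeros are not read as octal
    if (( 10#$num >= lo && 10#$num <= hi )); then
        selected+=("$file")
        printf 'in range: %s\n' "$file"
    fi
done
```

The names collected in selected could then be handed to a single rsync call (e.g. via --files-from) instead of invoking rsync once per file.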

Update:

Based on the file name pattern in your recent comment:

shopt -s extglob
for file in legendary_dcdm_3d+([0-9]).tif; do
    # do something to $file
done
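To recover the frame number from such names, parameter expansion can strip the fixed parts (a sketch; the scratch files and the frames array are made up for illustration):

```shell
#!/usr/bin/env bash
shopt -s extglob
# Demo setup: scratch copies of the naming scheme above
cd "$(mktemp -d)"
touch legendary_dcdm_3d000001.tif legendary_dcdm_3d000002.tif legendary_dcdm_3d012345.tif

frames=()
for file in legendary_dcdm_3d+([0-9]).tif; do
    num=${file#legendary_dcdm_3d}   # strip the fixed prefix
    num=${num%.tif}                 # strip the extension
    frames+=( "$((10#$num))" )      # base 10, so 000002 -> 2
done
echo "frames: ${frames[*]}"
```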
answered Oct 13 '22 by John B


Globbing is the shell feature that expands a wildcard to a list of matching file names. You have already used it in your question.

For the following explanations, I will assume we are in a directory with the following files:

$ ls -l
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 file.txt
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 funny_cat.jpg
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-2.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-3.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-4.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2014-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2014-2.pdf

The simplest case is to match all files. The following makes for a poor man's ls.

$ echo *
file.txt funny_cat.jpg report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf report_2014-1.pdf report_2014-2.pdf

If we want to match all reports from 2013, we can narrow the match:

$ echo report_2013-*.pdf
report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf

We could, for example, have left out the .pdf part but I like to be as specific as possible.

You have already come up with a solution to use this for selecting a range of numbered files. For example, we can match reports by quarter:

$ for q in 1 2 3 4; do echo "$q. quarter: " report_*-$q.pdf; done
1. quarter:  report_2013-1.pdf report_2014-1.pdf
2. quarter:  report_2013-2.pdf report_2014-2.pdf
3. quarter:  report_2013-3.pdf
4. quarter:  report_2013-4.pdf

If we are too lazy to type 1 2 3 4, we could have used $(seq 4) instead. This invokes the program seq with the argument 4 and substitutes its output (1 2 3 4 in this case).

Now back to your problem: If you want chunk sizes that are a power of 10, you should be able to extend the above example to fit your needs.

answered Oct 13 '22 by 5gon12eder