
How to use Rsync to copy only specific subdirectories (same names in several directories)

I have the following directory structure on server 1:

  • data
    • company1
      • unique_folder1
      • other_folder
      • ...
    • company2
      • unique_folder1
      • ...
    • ...

I want to duplicate this folder structure on server 2, but copy only unique_folder1 and its subdirectories. That is, the result must be:

  • data
    • company1
      • unique_folder1
    • company2
      • unique_folder1
    • ...

I know that rsync is well suited to this. I've tried the include/exclude options without success.

For example, I've tried:

rsync -avzn --list-only --include '*/unique_folder1/**' --exclude '*' -e ssh [email protected]:/path/to/old/data/ /path/to/new/data/

But as a result, I don't see any files or directories:

receiving file list ... done
sent 43 bytes  received 21 bytes  42.67 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

What's wrong? Any ideas?

Additional information: I have sudo access to both servers. One idea I have is to use find and cpio together to copy the content I need into a new directory, and then run rsync on that. But this is very slow because there are a lot of files.

asked Mar 28 '13 16:03 by Andron



3 Answers

I've found the reason. It wasn't clear to me that rsync works this way.
The correct command (for the company1 directory only) is:

rsync -avzn --list-only --include 'company1/' --include 'company1/unique_folder1/***' --exclude '*' -e ssh [email protected]:/path/to/old/data/ /path/to/new/data

That is, each parent company directory must be included explicitly; otherwise --exclude '*' excludes it and rsync never descends into it. Since we can't list all the company directories by hand on the command line, we save the list to a file and use that.


The remaining steps:

1. Generate an include file on server 1 (I used ls and awk) with the following content:

+ company1/  
+ company1/unique_folder1/***  
...  
+ companyN/  
+ companyN/unique_folder1/***  

2. Copy include.txt to server 2 and run:

rsync -avzn                                        \
      --list-only                                  \
      --include-from '/path/to/new/include.txt'    \
      --exclude '*'                                \
      -e ssh [email protected]:/path/to/old/data/    \
      /path/to/new/data
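
The answer mentions ls and awk for step 1 but does not show the command. As a rough sketch only (not the author's actual script), a plain shell loop can produce the same two lines per company. The function name make_include and its arguments are hypothetical, and company names containing rsync wildcard characters such as * or [ would need escaping, which this sketch does not do:

```shell
# make_include DATA_DIR OUT_FILE   (hypothetical helper, not part of rsync)
# Writes "+ company/" and "+ company/unique_folder1/***" for every
# first-level directory of DATA_DIR that contains unique_folder1.
make_include() {
    data_dir=$1
    out_file=$2
    : > "$out_file"                      # start with an empty file
    for dir in "$data_dir"/*/unique_folder1/; do
        # If nothing matches, the glob stays literal and is skipped here.
        [ -d "$dir" ] || continue
        company=$(basename "$(dirname "$dir")")
        printf '+ %s/\n+ %s/unique_folder1/***\n' "$company" "$company" >> "$out_file"
    done
}
```

On server 1 it would be called as, for example, make_include /path/to/old/data include.txt. Companies without unique_folder1 are left out of the file, so the --exclude '*' rule drops them entirely.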
answered Nov 01 '22 19:11 by Andron


If the first matching pattern excludes a directory, none of its descendants are ever traversed. To include a deep directory such as company*/unique_folder1/** while excluding everything else with *, you need to tell rsync to include all of its ancestors too:

rsync -r -v --dry-run                       \
    --include='/'                           \
    --include='/company*/'                  \
    --include='/company*/unique_folder1/'   \
    --include='/company*/unique_folder1/**' \
    --exclude='*'

You can use bash’s brace expansion to save some typing. After brace expansion, the following command is exactly the same as the previous one:

rsync -r -v --dry-run --include=/{,'company*/'{,unique_folder1/{,'**'}}} --exclude='*'
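
To check what the brace expression turns into before handing it to rsync, the words can simply be printed. The sketch below runs the expansion through bash explicitly, since brace expansion is a bash feature and not part of POSIX sh; the variable name expanded is only for illustration, and rsync itself is not invoked:

```shell
# Print each word produced by the brace expansion, one per line.
# The quoted heredoc delimiter keeps the calling shell from touching the text.
expanded=$(bash <<'EOF'
printf '%s\n' --include=/{,'company*/'{,unique_folder1/{,'**'}}}
EOF
)
printf '%s\n' "$expanded"
```

This prints the same four --include options as the long form: /, /company*/, /company*/unique_folder1/ and /company*/unique_folder1/**, in that order. The single quotes around 'company*/' and '**' keep the shell from globbing them, so the wildcards reach rsync unchanged.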
answered Nov 01 '22 19:11 by yonran


An alternative to Andron's answer, which in many cases is simpler both to understand and to implement, is the --files-from=FILE option. For the current problem:

rsync -arv --files-from='list.txt' old_path/data new_path/data

Where list.txt is simply

company1/unique_folder1/
company2/unique_folder1/
...

Note that the -r flag must be given explicitly, since --files-from turns off the recursion that -a normally implies. The paths in the list are also interpreted relative to the source directory, so company1/unique_folder1/ matches but /data/company1/unique_folder1/ does not.
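
With many companies, list.txt can be generated rather than typed. The following is a minimal sketch under the assumption that every wanted directory sits exactly one level below the source directory; the function name make_list is hypothetical. It changes into the source directory first so that the printed paths are relative to it, as --files-from expects:

```shell
# make_list DATA_DIR LIST_FILE   (hypothetical helper, not part of rsync)
# Writes one "company/unique_folder1/" line per matching directory,
# relative to DATA_DIR.
make_list() {
    # The subshell keeps the cd from affecting the caller; the output
    # redirection is resolved in the caller's working directory.
    ( cd "$1" && for d in */unique_folder1/; do
          if [ -d "$d" ]; then printf '%s\n' "$d"; fi
      done ) > "$2"
}
```

For example, make_list old_path/data list.txt followed by the rsync command above with --files-from='list.txt'.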

answered Nov 01 '22 19:11 by pip