Using Rsync filter to include/exclude files

Tags: linux, bash, rsync

I'm trying to back up a filesystem while excluding /mnt but including one particular path within /mnt. Using --filter seems to be recommended over --include and --exclude, but I can't get it to do my bidding. Example:

rsync -aA -H --numeric-ids -v --progress --delete \
  --filter="merge /tmp/mergefilter.txt" /  /mnt/data/mybackup/

My /tmp/mergefilter.txt says:

+ /mnt/data/i-want-to-rsyncthisdirectory/
- /dev
- /sys/
- /tmp/
- /run/
- /mnt/
- /proc/
- /media/
- /var/swap
- /lost+found/

All of the paths starting with "-" are skipped as expected, but my include for /mnt/data/i-want-to-rsyncthisdirectory/ never gets rsync'd. Changing the rule order or adding/removing the trailing slash does not change the behavior for the path I want included.

EDIT: Note that I do want to back up /etc, /usr, /var, etc., since the source is specified as /

Appreciate any guidance as the man page is a bit of a minefield...

user5611823 asked Feb 12 '16 13:02


2 Answers

This question is quite old but I think this might help you:

(from rsync 3.1.2 manual)

Note that, when using the --recursive (-r) option (which is implied by -a), every subcomponent of every path is visited from the top down, so include/exclude patterns get applied recursively to each subcomponent's full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and "/foo/bar" must not be excluded). The exclude patterns actually short-circuit the directory traversal stage when rsync finds the files to send. If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy. This is particularly important when using a trailing '*' rule. For instance, this won't work:

         + /some/path/this-file-will-not-be-found
         + /file-is-included
         - *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories. One solution is to ask for all directories in the hierarchy to be included by using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps use the --prune-empty-dirs option. Another solution is to add specific include rules for all the parent dirs that need to be visited. For instance, this set of rules works fine:

         + /some/
         + /some/path/
         + /some/path/this-file-is-found
         + /file-also-included
         - *

I proposed something in my original answer that actually does not work (I tested it). I have now reproduced a tree similar to yours, and this solution should work:

+ /mnt/
+ /mnt/data/
+ /mnt/data/i-want-to-rsyncthisdirectory/
- /mnt/data/*
- /mnt/*
- /dev
- /sys/
- /tmp/
- /run/
- /proc/
- /media/
- /var/swap
- /lost+found/

Explanations:

(This only rewords the manual, but as you said, the manual is a bit cryptic.)

Rules are read from top to bottom for each file or directory rsync considers, and the first matching rule wins. In your case, /mnt/data/i-want-to-rsyncthisdirectory/ is not backed up because /mnt/ is excluded: rsync never descends into /mnt, so the include rule for the deeper path is never consulted. The solution is to include each directory on the way down to the one you want to back up, and then exclude what you do not want, level by level.

Note the * at the end of each exclusion under /mnt. It prevents rsync from backing up the other files and folders located in those directories, which I think is what you want.
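
If you want to check this without touching the real filesystem, a small sandbox along these lines should do. It is a sketch under assumptions: the tree under the temporary directory is a hypothetical miniature of yours (the names other, usb and scratch are invented), and "/mnt" in the rules means "$src/mnt" because anchored patterns are relative to the transfer root.

```shell
set -eu

# Throwaway sandbox; "/mnt" in the rules below refers to "$src/mnt".
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

# Hypothetical miniature of the tree from the question.
mkdir -p "$src/etc" "$src/mnt/data/i-want-to-rsyncthisdirectory" \
         "$src/mnt/data/other" "$src/mnt/usb" "$src/tmp" "$dst"
echo keep > "$src/etc/hosts"
echo keep > "$src/mnt/data/i-want-to-rsyncthisdirectory/file.txt"
echo skip > "$src/mnt/data/other/file.txt"
echo skip > "$src/mnt/usb/file.txt"
echo skip > "$src/tmp/scratch"

# Includes for each parent directory come first, then the excludes.
cat > "$work/mergefilter.txt" <<'EOF'
+ /mnt/
+ /mnt/data/
+ /mnt/data/i-want-to-rsyncthisdirectory/
- /mnt/data/*
- /mnt/*
- /tmp/
EOF

if command -v rsync >/dev/null 2>&1; then
    rsync -a --filter="merge $work/mergefilter.txt" "$src/" "$dst/"
    # List what was actually copied.
    find "$dst" | sort
else
    echo "rsync not installed; skipping the transfer" >&2
fi
```

Only etc/ and the wanted directory under mnt/data/ should show up in the listing; the siblings under mnt/ and mnt/data/ are dropped by the two wildcard excludes.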

Simpler solution: (edit 2)

You can even simplify this with the *** pattern that was added in version 2.6.7:

+ /mnt/
+ /mnt/data/
+ /mnt/data/i-want-to-rsyncthisdirectory/***
- /mnt/**

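A trailing /*** matches both the directory itself and everything inside it, so the include rule covers the whole subtree. That lets you use the ** wildcard for the exclusion and get by with a single exclude line for /mnt.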
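
Here is a minimal sandbox sketch of the *** variant, again with a hypothetical tree under a temporary directory (sub, deep.txt, other and usb are invented names) and with the rules passed directly on the command line instead of through a merge file.

```shell
set -eu

# Throwaway sandbox; "/mnt" in the rules below refers to "$src/mnt".
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

# Hypothetical tree, including a nested file to show that *** is recursive.
mkdir -p "$src/mnt/data/i-want-to-rsyncthisdirectory/sub" \
         "$src/mnt/data/other" "$src/mnt/usb" "$dst"
echo keep > "$src/mnt/data/i-want-to-rsyncthisdirectory/sub/deep.txt"
echo skip > "$src/mnt/data/other/file.txt"
echo skip > "$src/mnt/usb/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # The parents are still included one by one; "/***" then covers the
    # wanted directory and its whole subtree, and "/mnt/**" drops the rest.
    rsync -a \
        --filter='+ /mnt/' \
        --filter='+ /mnt/data/' \
        --filter='+ /mnt/data/i-want-to-rsyncthisdirectory/***' \
        --filter='- /mnt/**' \
        "$src/" "$dst/"
    find "$dst" | sort
else
    echo "rsync not installed; skipping the transfer" >&2
fi
```

The order matters here: the *** include has to come before "- /mnt/**", because the first matching rule decides and /mnt/** would otherwise match the wanted files too.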

I also discovered that you can understand which filter rules exclude/include each file or folder thanks to the following rsync arguments:

--verbose --verbose

Combined with the --dry-run argument, you should be able to debug your problem :)
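
A debugging run could look roughly like the sketch below. The sandbox tree is hypothetical, and the grep pattern is an assumption based on the filter messages printed by the rsync versions I know of; the exact wording may differ on yours, in which case just drop the grep and read the full output.

```shell
set -eu

# Throwaway sandbox with a hypothetical tree.
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"
mkdir -p "$src/mnt/data/i-want-to-rsyncthisdirectory" "$src/mnt/usb" "$dst"
echo keep > "$src/mnt/data/i-want-to-rsyncthisdirectory/file.txt"
echo skip > "$src/mnt/usb/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # --dry-run copies nothing; the doubled --verbose makes rsync report
    # which rule matched each file or directory. The grep keeps only those
    # lines (assumed wording), and "|| true" tolerates an empty result.
    rsync -a --dry-run --verbose --verbose \
        --filter='+ /mnt/' \
        --filter='+ /mnt/data/' \
        --filter='+ /mnt/data/i-want-to-rsyncthisdirectory/***' \
        --filter='- /mnt/**' \
        "$src/" "$dst/" | grep 'because of pattern' || true
else
    echo "rsync not installed; skipping the dry run" >&2
fi
```

Because of --dry-run the destination stays empty, so the rules can be adjusted and re-run as often as needed before doing the real backup.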

Tom answered Oct 03 '22 01:10


For me, this command is doing the job:

rsync -aA -H --numeric-ids -v --progress --delete \
--filter="+ /mnt/data/i-want-to-rsyncthisdirectory/" \
--filter="- *" . /mnt/data/mybackup/

Basically, I used a + filter for the directory in question and excluded all the others (as you do in your given example).

There is no need to explicitly negate all the directories you do not want to sync. Instead, you can ignore all except the one in question.

Marcus answered Oct 02 '22 23:10