Is it possible to specify a time range so that rsync only operates on recently changed files?
I'm writing a script to back up recently added files over SSH, and rsync seems like an efficient solution. My problem is that my source directories contain a huge backlog of older files which I have no interest in backing up.
The only solution I've come across so far is doing a find with ctime to generate a --files-from file. This works, but I have to deal with some old installations with versions of rsync that don't support --files-from. I'm considering generating --include-from patterns in the same way but would love to find something more elegant.
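For reference, the find-plus---files-from approach described above might look roughly like the sketch below. The helper names list_recent and backup_recent are hypothetical, and it assumes a find that supports -print0 and an rsync new enough to have --files-from and --from0.

```shell
#!/bin/sh
# Sketch only: list_recent and backup_recent are hypothetical helper names.

# Print the paths (relative to directory $1) of regular files whose
# status changed within the last $2 days, NUL-terminated so that
# filenames containing spaces or newlines survive.
list_recent() {
    ( cd "$1" && find . -type f -ctime "-$2" -print0 )
}

# Send only those files from directory $1 to destination $3
# (for example user@host:/backup/dir).
backup_recent() {
    list_file=$(mktemp) || return 1
    list_recent "$1" "$2" > "$list_file"
    rsync -av --from0 --files-from="$list_file" "$1" "$3"
    status=$?
    rm -f "$list_file"
    return "$status"
}
```

Something like `backup_recent /srv/data 1 user@host:/backup/data` would then transfer only files changed in the last day; the paths and host here are placeholders.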
Rsync first scans the files and builds a list. Once a file is on that list, rsync will sync its latest contents. But if a file is not on the list, which is built before the sync operation starts, rsync will not sync it.
Rsync with --ignore-existing: We can also skip files that already exist on the destination. This is generally useful when performing backups with the --link-dest option and continuing a backup run that got interrupted. Only files that do not exist on the destination will be copied over.
Also, rsync provides the ability to synchronize a directory structure (or even a single file) with another destination, local or remote.
Rsync options: -r syncs data recursively but does not preserve ownership for users and groups, permissions, timestamps, or symbolic links. -a (archive mode) behaves like recursive mode but also preserves file permissions, symbolic links, file ownership, etc. -z compresses data during transfer to reduce the amount sent over the network.
It looks like you can specify shell commands in the arguments to rsync (see Remote rsync executes arbitrary shell commands)
so I have been able to successfully limit the files that rsync looks at by using:
rsync -av remote_host:'$(find logs -type f -ctime -1)' local_dir
This looks for any files changed in the last day (-ctime -1) and then rsyncs those into local_dir.
I'm not sure if this feature is by design but I'm still digging into the documentation.
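A variant of the same idea, sketched under assumptions: instead of relying on the remote shell expanding $(find ...), run find over ssh and feed the result to rsync on standard input. The function name pull_recent is hypothetical; this assumes an rsync with --files-from and --from0 on both ends, and a remote directory name without single quotes.

```shell
#!/bin/sh
# Sketch only: pull_recent is a hypothetical helper name.
# Usage: pull_recent HOST REMOTE_DIR DAYS LOCAL_DIR
pull_recent() {
    host=$1 dir=$2 days=$3 dest=$4
    # List recently changed files on the remote side, NUL-terminated,
    # then let rsync read that list from stdin (--files-from=-).
    ssh "$host" "cd '$dir' && find . -type f -ctime -$days -print0" |
        rsync -av --from0 --files-from=- "$host:$dir/" "$dest"
}
```

For example, `pull_recent remote_host logs 1 local_dir` would fetch files changed in the last day. Passing the list on stdin sidesteps word splitting on filenames with spaces and command-line length limits, at the cost of requiring --files-from support.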
Why not just take the hit of backing up the whole directory once, and then take advantage of the incremental backups provided by rsync, rdiff, and their cousins? You won't waste disk space on the backup side, because the old files will be perpetually unchanged.
Backing up the whole thing is simpler and carries substantially less risk of error. Trying to selectively back up some files and not others is a recipe for missing something you need without realizing it, then getting burned when you can't restore a critical file.
Otherwise you should reorganize your source directory so there is less 'decision making' in your backup script.