 

Can rsync support One to Many syncing?

Can I sync changes from a "model" site that I work on, across hundreds of sites on the SAME server using rsync?
I would be updating common template files and JS scripts. If possible how would I set this up?
(I'm on a Hostgator Dedicated server, running Apache)

asked Dec 08 '10 by filip

People also ask

Can rsync do two way sync?

rsync works in one direction, so we need to run it twice to sync directories in both directions.
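In practice that just means two one-way runs, one in each direction. A minimal sketch with placeholder paths; the -u flag skips files that are newer on the receiving side, so neither run clobbers fresher edits:

    rsync -avu /path/to/dirA/ /path/to/dirB/
    rsync -avu /path/to/dirB/ /path/to/dirA/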

Is rsync multithreaded?

If you are like me, you will have found through trial and error that multiple rsync sessions, each taking a specific range of files, will complete much faster. Rsync is not multithreaded, but for the longest time I sure wished it was.
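One rough way to approximate that by hand is to start one rsync per top-level subdirectory of the source and let them run in parallel (the hostname and paths below are made up):

    # one background rsync per subdirectory of the model site
    for DIR in /var/www/model-site/*/; do
        rsync -az "$DIR" "user@web01:/var/www/site/$(basename "$DIR")/" &
    done
    wait    # block until all parallel sessions have finished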

How do I sync two folders with rsync?

Synchronizing two folders with rsync: to keep two folders in sync, we not only need to add new files from the source folder to the destination folder, as in the previous topics, but also remove files from the destination folder that have been deleted in the source folder.
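That is what the --delete option is for. A minimal example with placeholder paths (the trailing slash on the source means "the contents of" that folder):

    rsync -av --delete /path/to/source/ /path/to/destination/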

Is rsync faster than cp?

rsync is much faster than cp for this, because it will check file sizes and timestamps to see which ones need to be updated, and you can add more refinements. You can even make it do a checksum instead of the default 'quick check', although this will take longer.
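For illustration, with placeholder paths, the default quick check versus a full checksum run look like this:

    # default quick check: compares size and modification time only
    rsync -av /path/to/source/ /path/to/destination/

    # -c forces a checksum comparison: slower, but catches files whose content
    # changed even though size and mtime stayed the same
    rsync -avc /path/to/source/ /path/to/destination/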


1 Answer

Read my extended answer for the edited question below.

The most trivial and naive approach would probably be to set up a script that just runs rsync for every server you want to synchronize.
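A rough sketch of that naive approach might look like this (the server names and paths are invented for the example):

    #!/bin/bash
    # push the model site to every target, one rsync run per server
    SOURCE="/var/www/model-site/"

    for HOST in web01 web02 web03; do
        rsync -az --delete "$SOURCE" "user@${HOST}:/var/www/site/" \
            || echo "sync to ${HOST} failed" >&2
    done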

This is fine in most cases, but I don't think this is what you are looking for, because you would have figured that out yourself...

This method also has the following disadvantages:

  • One server sends all the traffic; there is no cascading. So it's a single point of failure and a bottleneck.

  • It is very inefficient. Rsync is a great tool, but parsing the file list and checking for differences is not really quick when you want to synchronize hundreds of servers.

But what can you do?

Configuring rsync for multiple servers is obviously the easiest way to go. So you should start with that and optimize where your problems are.

You can speed it up, for example, by using the right filesystem; XFS will probably be something like 50 times faster than ext3 for this.

You can also use unison, which is a somewhat more powerful tool and keeps a cached list of files.
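A minimal unison invocation, assuming placeholder paths and a reachable SSH host, could look like this:

    # -batch runs non-interactively; unison keeps its own archive of file state
    unison -batch /var/www/model-site ssh://web01//var/www/site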

You can also set up a cascade (Server A synchronizing to Server B synchronizing to Server C).

You could also set up pulling rather than pushing by your clients. You could have a subdomain for that which serves as the point of entry to a load balancer, with one or more servers behind it that you keep up to date by pushing from your source server; the clients then pull from that subdomain.
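A sketch of the pull variant: every client runs a cron job that fetches from a sync endpoint (the subdomain, module name, and paths are placeholders, and this assumes an rsync daemon is configured on the source side):

    # e.g. /etc/cron.d/site-sync on each client: pull every 5 minutes
    */5 * * * * www-data rsync -az --delete rsync://sync.example.com/model-site/ /var/www/site/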

The reason I am telling you all this is that there is no single perfect way to go; you have to figure it out depending on your needs.

However I would definitely recommend looking into GIT.

Git is a version control system that is very powerful and efficient.

You could create a git repository and push to your client machines.

It works very well and efficiently, and it is flexible and scalable, so you can build almost anything on this structure, including distributed file systems, cascades, load balancing, etc.
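One common way to wire that up, sketched here with made-up paths and remote names, is a bare repository on each target plus a post-receive hook that checks the pushed commits out into the web root:

    # on the target: a bare repo plus a working tree for the deployed files
    git init --bare /var/repos/site.git
    mkdir -p /var/www/site

    # /var/repos/site.git/hooks/post-receive (make it executable):
    #   #!/bin/sh
    #   GIT_WORK_TREE=/var/www/site git checkout -f master

    # on the model site: add the target as a remote and push to deploy
    git remote add web01 ssh://user@web01/var/repos/site.git
    git push web01 master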

I hope I gave you some pointers in the right direction that you can look into.

Edit:

So it looks like you want to synchronize changes on the same server, or even the same hard disk (which I don't know for sure, but it is very important for the possibilities you have).

Well, basically it's all the same: insert, overwrite, delete. Rsync is also a great tool for that, because it transfers changes incrementally, not just "resumes broken transfers".

But I would say it completely depends on the content.

If you have a lot of small files, such as the templates and JavaScript you mention, rsync may be very slow. It might even be faster to completely delete the destination folder and copy the files over again, so that rsync (or any other tool) doesn't have to check every file for changes.

You can also just copy everything with cp -rf so everything gets overwritten, but then files that were deleted in the source may be left behind at the destination.

I also know of many cases where this kind of thing is done using Subversion, because people feel it gives them more control. It is also more flexible.

However there is one thing that you should think of:

There is the concept of shared data.

There are symlinks and hard links.

You can put them on files and folders (hard links work only on files; most filesystems disallow hard links to directories because they could create cycles in the directory tree).

If you put a symlink A on a target B, the file appears to be located where the symlink is and named like it, but the resource behind it is somewhere completely different. Applications CAN tell the difference, though; Apache, for example, has to be configured to follow symlinks (otherwise it would be a security issue).

So if your changes are all in one folder, you could just put a symlink named like that folder, pointing to your shared folder, and you never have to worry about synchronizing again, because all sites share the very same resource.
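As an illustration with invented paths: one shared folder, symlinked into each site, and Apache told to follow the links:

    mkdir -p /var/www/shared/templates
    ln -s /var/www/shared/templates /var/www/site1/templates
    ln -s /var/www/shared/templates /var/www/site2/templates

    # in the Apache vhost config, the directory needs:
    #   <Directory /var/www/site1>
    #       Options FollowSymLinks
    #   </Directory>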

However there are reasons why you wouldn't want to do so:

  • They look different. That sounds absurd, but really, it is the most common reason why people don't like symlinks; they complain that they "look so weird in their program" or whatever.

  • Symlinks are limited in certain capabilities, but in return have other huge advantages, like being able to point across filesystems. Almost every disadvantage can be dealt with and worked around in your application, though. The pitiful truth is that symlinks are a fundamental feature of Linux OSes and filesystems, but their existence is sometimes forgotten when developing an application. It is like designing a train but forgetting that some people have long legs.

Hard links, on the other hand, look exactly like files, because they are files.

And every hardlink pointing to one file is that very file.

It sounds confusing but think of it as follows:

Every file is some data on the disk, described by an inode; a directory entry with some name then points to that inode.

Hard links are just that: multiple directory entries ("listings") for the same file.

As a consequence, they share the same data and locks; a modification made through any of the links is seen through all of them, and the data is only removed from the disk when the last link is deleted.

This, however, can of course only be done within one filesystem/device, not across devices.
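You can see this for yourself with a throwaway file (the filenames are arbitrary):

    echo "shared content" > original.txt
    ln original.txt hardlink.txt          # no -s: creates a hard link

    stat -c '%h %i' original.txt hardlink.txt   # same inode, link count 2 on both

    echo "edited" >> hardlink.txt
    cat original.txt                      # shows the edit: it is the same file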

Links have some big advantages. They are quite obvious:

You don't have duplicate data, which eliminates the potential for inconsistencies, means you only have to update one place, and uses less space on the disk.

Sometimes, however, this has far more significance.

For example, say you run multiple websites and all of them use the Zend Framework.

This is a huge framework, and its opcode cache will take up something like 50 MB of your RAM.

If your websites share the same Zend library folder, you only need that once.

answered Sep 19 '22 by The Surrican