I'm watching files for changes using inotify events (as it happens, from Python, calling into libc).
For some files during a git clone, I see something odd: I see an IN_CREATE event, and I can see via ls that the file has content; however, I never see IN_MODIFY or IN_CLOSE_WRITE. This is causing me issues since I would like to respond to IN_CLOSE_WRITE on the files: specifically, to initiate an upload of the file contents.
The files that behave oddly are in the .git/objects/pack directory, and they end in .pack or .idx. Other files that git creates have a more regular IN_CREATE -> IN_MODIFY -> IN_CLOSE_WRITE chain (I'm not watching for IN_OPEN events).
This is inside Docker on macOS, but I have seen evidence of the same on Docker on Linux in a remote system, so my suspicion is that the macOS aspect is not relevant. I am seeing this when the watcher and git clone are running in the same Docker container.
My questions:
Why are these events missing on these files?
What can be done about it? Specifically, how can I respond to the completion of writes to these files? Note: ideally I would like to respond when writing is "finished" to avoid needlessly/(incorrectly) uploading "unfinished" writing.
Edit: Reading https://developer.ibm.com/tutorials/l-inotify/ it looks like what I'm seeing is consistent with:
- a temporary file, with a name like tmp_pack_hBV4Alz, being created, modified, and closed;
- a hard link to that file being created with the final .pack name;
- the original tmp_pack_hBV4Alz name being deleted.
I think my problem, which is trying to use inotify as a trigger to upload files, then reduces to noticing that the .pack file is a hard link to another file, and uploading in this case?
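The create, link, delete sequence described in the edit can be reproduced with a short Python sketch, which may help when testing a watcher without running a full clone. The function name write_then_link and the tmp_pack_ prefix are illustrative assumptions, not Git's actual code, and the events noted in the comments are what I would expect a directory watch to report rather than verified output.

```python
import os
import tempfile


def write_then_link(directory, final_name, data):
    """Write data under a temporary name, then publish it via a hard link.

    Hypothetical helper that imitates the sequence described above.
    """
    # Expected on a directory watch: IN_CREATE for the temporary name.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_pack_", dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)  # expected: IN_MODIFY for the temporary name
    # Closing the file above is expected to yield IN_CLOSE_WRITE,
    # still for the temporary name only.

    final_path = os.path.join(directory, final_name)
    # Expected: only IN_CREATE for final_name; the content already exists,
    # so no IN_MODIFY or IN_CLOSE_WRITE follows for this name.
    os.link(tmp_path, final_path)
    # Expected: IN_DELETE for the temporary name.
    os.unlink(tmp_path)
    return final_path
```

Running this in a watched directory should show whether the watcher copes with a file whose final name only ever receives IN_CREATE.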
To answer your questions separately, for git 2.24.1 on Linux 4.19.95:
- Why are these events missing on these files?
You don't see IN_MODIFY/IN_CLOSE_WRITE events because git clone will always try to use hard links for files under the .git/objects directory. When cloning over the network or across file system boundaries, these events will appear again.
- What can be done about it? Specifically, how can I respond to the completion of writes to these files? Note: ideally I would like to respond when writing is "finished" to avoid needlessly/(incorrectly) uploading "unfinished" writing.
In order to catch modification of hard links, you have to set up a handler for the inotify IN_CREATE event that follows and keeps track of those links. Please note that a simple IN_CREATE can also mean that a nonempty file was created. Then, on IN_MODIFY/IN_CLOSE_WRITE for any of the files, you have to trigger the same action on all linked files as well. Obviously, you also have to remove that relationship on the IN_DELETE event.
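One way to keep track of such links is to group paths by their (device, inode) pair, since hard links to the same file share both. The following is a minimal sketch under that assumption; the class LinkTracker and its method names are hypothetical, and wiring it to real inotify events is left out.

```python
import os
from collections import defaultdict


class LinkTracker:
    """Hypothetical helper: remembers which watched paths share an inode."""

    def __init__(self):
        self._paths_by_inode = defaultdict(set)
        self._inode_by_path = {}

    def on_create(self, path):
        # Call from the IN_CREATE handler while the path still exists.
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        self._inode_by_path[path] = key
        self._paths_by_inode[key].add(path)

    def on_delete(self, path):
        # The path is already gone, so rely on the remembered inode key.
        key = self._inode_by_path.pop(path, None)
        if key is None:
            return
        paths = self._paths_by_inode[key]
        paths.discard(path)
        if not paths:
            del self._paths_by_inode[key]

    def linked_paths(self, path):
        # All known names for the same file, including path itself.
        key = self._inode_by_path.get(path)
        if key is None:
            return {path}
        return set(self._paths_by_inode[key])
```

On IN_MODIFY or IN_CLOSE_WRITE for one name, the handler could then act on every entry of linked_paths(name). Note that os.stat can race with a quick delete, so a real handler would need to tolerate FileNotFoundError.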
A simpler and more robust approach would probably be to just periodically hash all the files and check if the content of a file has changed.
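As a rough illustration of the polling approach, the sketch below hashes every file under a directory and compares two snapshots. The names snapshot and changed_paths are hypothetical, SHA-256 is an arbitrary choice, and how often to poll is left to the caller.

```python
import hashlib
import os


def snapshot(root):
    """Return a dict mapping each file path under root to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    # Read in 1 MiB chunks so large pack files fit in memory.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            except OSError:
                # The file may have vanished between listing and opening.
                continue
            digests[path] = h.hexdigest()
    return digests


def changed_paths(old, new):
    """Paths that are new or whose digest differs from the old snapshot."""
    return sorted(p for p, d in new.items() if old.get(p) != d)
```

A file that is still being written would show a different digest on consecutive polls, so one option is to upload only once a path's digest is unchanged across two snapshots.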
After checking the git source code closely and running git with strace, I found that git does use memory-mapped files, but mostly for reading content. See the usage of xmmap, which is always called with PROT_READ only. Therefore my previous answer below is NOT the correct answer. Nevertheless, for informational purposes, I would still like to keep it here:
You don't see IN_MODIFY events because packfile.c uses mmap for file access, and inotify does not report modifications for mmapped files.
From the inotify manpage:
The inotify API does not report file accesses and modifications that may occur because of mmap(2), msync(2), and munmap(2).
There is another possibility (from man inotify):
Note that the event queue can overflow. In this case, events are lost. Robust applications should handle the possibility of lost events gracefully. For example, it may be necessary to rebuild part or all of the application cache. (One simple, but possibly expensive, approach is to close the inotify file descriptor, empty the cache, create a new inotify file descriptor, and then re-create watches and cache entries for the objects to be monitored.)
Since git clone can generate a heavy flow of events, this can happen.
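The kernel signals an overflow with an event whose mask has IN_Q_OVERFLOW set (and whose wd is -1), so a watcher can at least detect the loss and fall back to a full rescan. The sketch below assumes events have already been decoded into (wd, mask, cookie, name) tuples; the dispatch function and both callbacks are hypothetical names.

```python
IN_Q_OVERFLOW = 0x00004000  # value from <sys/inotify.h>


def dispatch(events, on_event, rescan):
    """Forward decoded events, triggering a rescan if the queue overflowed.

    events: iterable of (wd, mask, cookie, name) tuples (assumed format).
    on_event: callback for ordinary events.
    rescan: callback that re-walks the watched tree, since events were lost.
    """
    for wd, mask, cookie, name in events:
        if mask & IN_Q_OVERFLOW:
            # Some events were dropped; the only safe recovery is to
            # re-examine the watched directories from scratch.
            rescan()
            continue
        on_event(wd, mask, cookie, name)
```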
How to avoid this:
I may speculate that Git most of the time uses atomic file updates, which are done like this:
- the new contents are written to a separate file with a randomized (mktemp-style) name;
- that file is then rename(2)-d over the original one; this operation guarantees that every observer trying to open the file using its name will get either the old contents or the new one.
Such updates are seen by inotify(7) as moved_to events, since a file "reappears" in a directory.
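For reference, the generic write-then-rename pattern looks roughly like the sketch below. This illustrates the technique in Python and is not taken from Git's source; atomic_write is a hypothetical name, and os.replace is used because it maps to rename(2) on Linux.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so readers see the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # A directory watch is expected to report IN_MOVED_FROM for the
        # temporary name and IN_MOVED_TO for the final name here.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A watcher that wants to react to files written this way would therefore treat IN_MOVED_TO as a "file is complete" signal alongside IN_CLOSE_WRITE.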
Based on this accepted answer I'd assume there might be some difference in the events based on the protocol being used (i.e. ssh or https).
Do you observe the same behavior when monitoring cloning from the local filesystem with the --no-hardlinks option?
$ git clone [email protected]:user/repo.git
# set up watcher for new dir
$ git clone --no-hardlinks repo new-repo
The behavior you observed when running the experiment on both a Linux and a Mac host probably rules out this open issue as the cause: https://github.com/docker/for-mac/issues/896. I'm adding it just in case.
Maybe you made the same mistake I made years ago. I've only used inotify twice. The first time, my code simply worked. Later, I no longer had that source and started again, but this time, I was missing events and did not know why.
It turns out that when I was reading an event, I was really reading a small batch of events. I parsed the one I expected, thinking that was it, that was all. Eventually, I discovered there is more to that received data, and when I added a little code to parse all events received from a single read, no more events were lost.