
Why does one call `git read-tree` after a sparse checkout?

According to "Subdirectory Checkouts with git sparse-checkout", one calls git read-tree -mu HEAD after configuring a sparse checkout in an already existing repository, i.e.:

# Enable sparse-checkout:
git config core.sparsecheckout true

# Configure sparse-checkout 
echo some/dir/ >> .git/info/sparse-checkout
echo another/sub/tree >> .git/info/sparse-checkout

# Update your working tree:
git read-tree -mu HEAD
  • Can you please explain the read-tree step in more detail?
  • How does it work?
  • What is going on?
  • Why does one use read-tree and not, let us say, checkout?
  • Why does one use -mu (why is this a merge, and what is merged)?

-m

    Perform a merge, not just a read. The command will refuse to run if
    your index file has unmerged entries, indicating that you have not
    finished previous merge you started.

-u

    After a successful merge, update the files in the work tree with the
    result of the merge.
Micha Wiedenmann asked Feb 27 '15 08:02



1 Answer

With Git 2.25 (Q1 2020), management of sparsely checked-out working trees has gained a dedicated "sparse-checkout" command.
It introduces a cone mode (which I detail in "Git sparse checkout with exclusion"), which makes a sparse checkout much faster.

But it also indirectly describes why git read-tree -mu HEAD is used (or, with the new "cone" mode, was used).

See commit e6152e3 (21 Nov 2019) by Jeff Hostetler (Jeff-Hostetler).
See commit 761e3d2 (20 Dec 2019) by Ed Maste (emaste).
See commit 190a65f (13 Dec 2019), and commit cff4e91, commit 416adc8, commit f75a69f, commit fb10ca5, commit 99dfa6f, commit e091228, commit e9de487, commit 4dcd4de, commit eb42fec, commit af09ce2, commit 96cc8ab, commit 879321e, commit 72918c1, commit 7bffca9, commit f6039a9, commit d89f09c, commit bab3c35, commit 94c0956 (21 Nov 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019)

sparse-checkout: update working directory in-process

Signed-off-by: Derrick Stolee

The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the skip-worktree bits in the index and to update the working directory.
This extra process is overly complex, and prone to failure. It also requires that we write our changes to the sparse-checkout file before trying to update the index.

Remove this extra process call by creating a direct call to unpack_trees() in the same way 'git read-tree -mu HEAD' does.
In addition, provide an in-memory list of patterns so we can avoid reading from the sparse-checkout file. This allows us to test a proposed change to the file before writing to it.

An earlier version of this patch included a bug when the 'set' command failed due to the "Sparse checkout leaves no entry on working directory" error.
It would not rollback the index.lock file, so the replay of the old sparse-checkout specification would fail. A test in t1091 now covers that scenario.
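
To see what "update the skip-worktree bits in the index and update the working directory" means in practice, here is a minimal sketch of the legacy sequence from the question, run in a throwaway repository. The directory and file names are made up for illustration, and the `S`/`H` tags come from `git ls-files -t`.

```shell
set -e

# Throwaway repository with two hypothetical directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
mkdir -p some/dir other
echo a > some/dir/a.txt
echo b > other/b.txt
git add .
git commit -q -m "initial"

# Legacy configuration: enable the feature and write a pattern by hand.
git config core.sparseCheckout true
mkdir -p .git/info
echo "some/dir/" >> .git/info/sparse-checkout

# Configuration alone changes nothing: both files are still on disk.
ls other/b.txt some/dir/a.txt

# Re-read HEAD into the index (-m) and update the working tree (-u).
# While doing so, Git applies the patterns and sets the skip-worktree bits.
git read-tree -mu HEAD

# "S" marks skip-worktree entries, "H" marks ordinary cached entries.
git ls-files -t
```

After the `read-tree` call, `other/b.txt` is gone from the working tree and carries the `S` tag, while `some/dir/a.txt` stays in place. That is why the step is needed: writing the pattern file by itself does not touch the index or the working tree.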


And, with Git 2.27 (Q2 2020), "sparse-checkout" knows how to reapply itself:

See commit 5644ca2, commit 681c637, commit ebb568b, commit 22ab0b3, commit 6271d77, commit 1ac83f4, commit cd002c1, commit 4ee5d50, commit f56f31a, commit 7af7a25, commit 30e89c1, commit 3cc7c50, commit b0a5a12, commit 72064ee, commit fa0bde4, commit d61633a, commit d7dc1e1, commit 031ba55 (27 Mar 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 48eee46, 29 Apr 2020)

sparse-checkout: provide a new reapply subcommand

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren

If commands like merge or rebase materialize files as part of their work, or a previous sparse-checkout command failed to update individual files due to dirty changes, users may want a command to simply 'reapply' the sparsity rules.

Provide one.

The updated git sparse-checkout man page now includes:

reapply:

Reapply the sparsity pattern rules to paths in the working tree.

Commands like merge or rebase can materialize paths to do their work (e.g. in order to show you a conflict), and other sparse-checkout commands might fail to sparsify an individual file (e.g. because it has unstaged changes or conflicts).

In such cases, it can make sense to run git sparse-checkout reapply later after cleaning up affected paths (e.g. resolving conflicts, undoing or committing changes, etc.).
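
As a rough illustration, `reapply` can stand in for the old `git read-tree -mu HEAD` step whenever the pattern file and the working tree have drifted apart. The sketch below assumes Git 2.27 or later and uses hand-edited, non-cone patterns; the repository layout is hypothetical.

```shell
set -e

# Throwaway repository with two hypothetical directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
mkdir -p some/dir other
echo a > some/dir/a.txt
echo b > other/b.txt
git add .
git commit -q -m "initial"

# Enable sparse checkout the legacy way and restrict it to some/dir.
git config core.sparseCheckout true
mkdir -p .git/info
echo "some/dir/" > .git/info/sparse-checkout

# Apply the patterns: other/b.txt is removed from the working tree.
git sparse-checkout reapply

# Widen the patterns by hand, then reapply: other/b.txt is written back.
echo "other/" >> .git/info/sparse-checkout
git sparse-checkout reapply

git ls-files -t
```

The same command is what you would run after a merge or rebase has materialized extra paths and you have finished resolving them.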


But, with Git 2.27, it won't reapply/update itself using git read-tree anymore:

See commit 5644ca2, commit 681c637, commit ebb568b, commit 22ab0b3, commit 6271d77, commit 1ac83f4, commit cd002c1, commit 4ee5d50, commit f56f31a, commit 7af7a25, commit 30e89c1, commit 3cc7c50, commit b0a5a12, commit 72064ee, commit fa0bde4, commit d61633a, commit d7dc1e1, commit 031ba55 (27 Mar 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 48eee46, 29 Apr 2020)

unpack-trees: add a new update_sparsity() function

Reviewed-by: Derrick Stolee
Signed-off-by: Elijah Newren

Previously, the only way to update the SKIP_WORKTREE bits for various paths was invoking git read-tree -mu HEAD or calling the same code that this codepath invoked.

This however had a number of problems if the index or working directory were not clean.

First, let's consider the case:

Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files)

If the working tree was clean this was fine, but if there were files or directories or symlinks or whatever already present at the given path then the operation would abort with an error.

Let's label this case for later discussion:

  • A) There is an untracked path in the way

Now let's consider the opposite case:

Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files)

If the index and working tree was clean this was fine, but if there were any unclean paths we would run into problems.

There are three different cases to consider:

  • B) The path is unmerged
  • C) The path has unstaged changes
  • D) The path has staged changes (differs from HEAD)

If any path fell into case B or C, then the whole operation would be aborted with an error.

With sparse-checkout, the whole operation would be aborted for case D as well, but for its predecessor of using git read-tree -mu HEAD directly, any paths that fell into case D would be removed from the working copy and the index entry for that path would be reset to match HEAD -- which looks and feels like data loss to users (only a few are even aware to ask whether it can be recovered, and even then it requires walking through loose objects trying to match up the right ones).

Refusing to remove files that have unsaved user changes is good, but refusing to work on any other paths is very problematic for users.

If the user is in the middle of a rebase or has made modifications to files that bring in more dependencies, then for their build to work they need to update the sparse paths.

This logic has been preventing them from doing so.

Sometimes in response, the user will stage the files and re-try, to no avail with sparse-checkout or to the horror of losing their changes if they are using its predecessor of git read-tree -mu HEAD.

Add a new update_sparsity() function which will not error out in any of these cases but behaves as follows for the special cases:

  • A) Leave the file in the working copy alone, clear the SKIP_WORKTREE bit, and print a warning (thus leaving the path in a state where status will report the file as modified, which seems logical).
  • B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged.
  • C) Do NOT mark this path as SKIP_WORKTREE and print a warning about the dirty path.
  • D) Mark the path as SKIP_WORKTREE, but do not revert the version stored in the index to match HEAD; leave the contents alone.

I tried a different behavior for A (leave the SKIP_WORKTREE bit set), but found it very surprising and counter-intuitive (e.g. the user sees it is present along with all the other files in that directory, tries to stage it, but git add ignores it since the SKIP_WORKTREE bit is set).

A & C seem like optimal behavior to me.

B may be as well, though I wonder if printing a warning would be an improvement.

Some might be slightly surprised by D at first, but given that it does the right thing with git commit and even git commit -a (git add ignores entries that are marked SKIP_WORKTREE and thus doesn't delete them, and commit -a is similar), it seems logical to me.
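
Case C is the easiest one to try out. The following sketch assumes Git 2.27 or later (older versions would abort instead) and a made-up repository layout: one file outside the sparse set has an unstaged edit, another one is clean.

```shell
set -e

# Throwaway repository; other/ will fall outside the sparse set.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
mkdir -p some/dir other
echo a > some/dir/a.txt
echo b > other/b.txt
echo c > other/c.txt
git add .
git commit -q -m "initial"

# Case C: an unstaged change in a path that is about to be sparsified.
echo "local edit" > other/b.txt

# Git warns about the dirty path but still processes the other paths.
git sparse-checkout init --cone
git sparse-checkout set some/dir

# other/b.txt keeps its edit and is not marked skip-worktree;
# the clean other/c.txt is removed and marked "S".
git ls-files -t
cat other/b.txt
```

Once the edit is committed or discarded, `git sparse-checkout reapply` removes the leftover file.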

And, still with Git 2.27 (Q2 2020):

See commit 6c34239 (14 May 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit fde4622, 20 May 2020)

unpack-trees: also allow get_progress() to work on a different index

Noticed-by: Jeff Hostetler
Signed-off-by: Elijah Newren

commit b0a5a12a60 ("unpack-trees: allow check_updates() to work on a different index", 2020-03-27, Git v2.27.0-rc0 -- merge listed in batch #5) allowed check_updates() to work on a different index, but it called get_progress() which was hardcoded to work on o->result much like check_updates() had been.

Update it to also accept an index parameter and have check_updates() pass that parameter along so that both are working on the same index.


The code is more robust with Git 2.29 (Q4 2020):

See commit 55fe225, commit 1c89001, commit 9a53219 (17 Aug 2020), and commit f1de981, commit c514c62, commit 9101c8e, commit 8dc3156 (14 Aug 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0d9a8e3, 27 Aug 2020)

clear_pattern_list(): clear embedded hashmaps

Signed-off-by: Jeff King
Acked-by: Derrick Stolee

Commit 96cc8ab531 ("sparse-checkout: use hashmaps for cone patterns", 2019-11-21, Git v2.25.0-rc0 -- merge) added some auxiliary hashmaps to the pattern_list struct, but they're leaked when clear_pattern_list() is called.


With Git 2.36 (Q2 2022), the documentation is clearer:

See commit ecc7c88 (25 Feb 2022), and commit d79d299, commit 9023535, commit af6a518, commit 26b5d6b, commit b3df8c9 (14 Jan 2022) by Elijah Newren (newren).
See commit 48609de (13 Jan 2022) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 82386b4, 09 Mar 2022)

9023535bd3: Update documentation related to sparsity and the skip-worktree bit

Signed-off-by: Elijah Newren

  • Both read-tree and update-index tried to describe how to use the skip-worktree bit, but both predated the sparse-checkout command.
    The sparse-checkout command is a far easier mechanism to use and for users trying to reduce the size of their working tree, we should recommend users to look at it instead.
  • The update-index documentation pointed out that assume-unchanged and skip-worktree sounded similar but had different purposes.
    However, it made no attempt to explain the differences, only to point out that they were different.
    Explain the differences.
  • The update-index documentation focused much more on (internal?) implementation details than on end-user usage.
    Try to explain its purpose better for users of update-index, rather than fellow developers trying to work with the SKIP_WORKTREE bit.
  • Clarify that when core.sparseCheckout=true, we treat a file's presence in the working tree as being an override to the SKIP_WORKTREE bit (i.e. in sparse checkouts when the file is present we ignore the SKIP_WORKTREE bit).

git read-tree now includes in its man page:

Note: The update-index and read-tree primitives for supporting the skip-worktree bit predated the introduction of git sparse-checkout. Users are encouraged to use sparse-checkout in preference to these low-level primitives.

git sparse-checkout now includes in its man page:

This command is used to create sparse checkouts, which means that it changes the working tree from having all tracked files present, to only have a subset of them. It can also switch which subset of files are present, or undo and go back to having all tracked files present in the working copy.

The subset of files is chosen by providing a list of directories in cone mode (which is recommended), or by providing a list of patterns in non-cone mode.

When in a sparse-checkout, other Git commands behave a bit differently. For example, switching branches will not update paths outside the sparse-checkout directories/patterns, and git commit -a will not record paths outside the sparse-checkout directories/patterns as deleted.
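
The three operations described there (create a sparse checkout, switch to another subset, go back to a full working tree) can be sketched as follows. The layout is hypothetical, and `init --cone` is used because it is available from Git 2.25 on; newer releases also accept `git sparse-checkout set --cone`.

```shell
set -e

# Throwaway repository with a root file and two hypothetical directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
mkdir -p some/dir other
echo readme > README
echo a > some/dir/a.txt
echo b > other/b.txt
git add .
git commit -q -m "initial"

# Create a sparse checkout in cone mode and pick one directory.
git sparse-checkout init --cone
git sparse-checkout set some/dir
git sparse-checkout list

# Switch to a different subset: some/dir goes away, other/ comes back.
git sparse-checkout set other

# Undo: repopulate the working tree with all tracked files.
git sparse-checkout disable
ls README some/dir/a.txt other/b.txt
```

No `git read-tree` call appears anywhere: each subcommand updates the index and the working tree itself.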

git sparse-checkout now includes in its man page:

"Sparse checkout" allows populating the working directory sparsely. It uses the skip-worktree bit (see git update-index) to tell Git whether a file in the working directory is worth looking at.

If the skip-worktree bit is set, and the file is not present in the working tree, then its absence is ignored.

Git will avoid populating the contents of those files, which makes a sparse checkout helpful when working in a repository with many files, but only a few are important to the current user.

git update-index now includes in its man page:

skip-worktree bit can be defined in one (long) sentence: Tell git to avoid writing the file to the working directory when reasonably possible, and treat the file as unchanged when it is not present in the working directory.

Note that not all git commands will pay attention to this bit, and some only partially support it.

The update-index flags and the read-tree capabilities relating to the skip-worktree bit predated the introduction of the git sparse-checkout command, which provides a much easier way to configure and handle the skip-worktree bits. If you want to reduce your working tree to only deal with a subset of the files in the repository, we strongly encourage the use of git sparse-checkout in preference to the low-level update-index and read-tree primitives.

The primary purpose of the skip-worktree bit is to enable sparse checkouts, i.e. to have working directories with only a subset of paths present.

When the skip-worktree bit is set, Git commands (such as switch, pull, merge) will avoid writing these files.

However, these commands will sometimes write these files anyway in important cases such as conflicts during a merge or rebase.
Git commands will also avoid treating the lack of such files as an intentional deletion; for example git add -u will not stage a deletion for these files and git commit -a will not make a commit deleting them either.

git update-index now includes in its man page:

The assume-unchanged bit is for leaving the file in the working tree but having Git omit checking it for changes and presuming that the file has not been changed (though if it can determine without stat'ing the file that it has changed, it is free to record the changes).

skip-worktree tells Git to ignore the absence of the file, avoid updating it when possible with commands that normally update much of the working directory (e.g. checkout, switch, pull, etc.), and not have its absence be recorded in commits.

Note that in sparse checkouts (setup by git sparse-checkout or by configuring core.sparseCheckout to true), if a file is marked as skip-worktree in the index but is found in the working tree, Git will clear the skip-worktree bit for that file.
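
To make the distinction a bit more tangible, here is a small sketch that sets both bits by hand with the low-level `git update-index` flags, in a repository that is not a sparse checkout. The file names are placeholders. `git ls-files -v` prints a lowercase tag for assume-unchanged entries and `S` for skip-worktree entries.

```shell
set -e

# Throwaway repository with two placeholder files.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
echo a > a.txt
echo b > b.txt
git add .
git commit -q -m "initial"

# Two different index flags, set with the low-level primitive.
git update-index --assume-unchanged a.txt
git update-index --skip-worktree b.txt

# a.txt is listed with a lowercase "h", b.txt with "S".
git ls-files -v
```

The bits can be cleared again with `git update-index --no-assume-unchanged a.txt` and `git update-index --no-skip-worktree b.txt`. For real sparse checkouts, the man page quoted above recommends `git sparse-checkout` over setting these bits manually.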

VonC answered Sep 21 '22 08:09
