Hash :
53e0636c
Author :
Date :
2013-09-24T16:43:06
Cleaned up some tables in checkout-internals doc The markdown wasn't rendering correctly.
Checkout has to handle a lot of different cases. It examines the differences between the target tree, the baseline tree and the working directory, plus the contents of the index, and groups files into five categories:
Right now, this classification is done via 3 iterators (for the three trees), with a final lookup in the index. At some point, this may move to a 4 iterator version to incorporate the index better.
The actual checkout is done in five phases (at least right now).
Checkout could be driven either off a target-to-workdir diff or a baseline-to-target diff. There are pros and cons of each.
Target-to-workdir means the diff includes every file that could be modified, which simplifies bookkeeping, but the code to constantly refer back to the baseline gets complicated.
Baseline-to-target has simpler code because the diff defines the action to take, but needs special handling for untracked and ignored files, if they need to be removed.
The current checkout implementation is based on a baseline-to-target diff.
The most interesting aspect of this is phase 2, picking the actions that should be taken. There are a lot of corner cases, so it may be easier to start by looking at the rules for a simple 2-iterator diff:
Old | New | ||
---|---|---|---|
0 | x | x | nothing |
1 | x | B1 | added blob |
2 | x | T1 | added tree |
3 | B1 | x | removed blob |
4 | B1 | B1 | unmodified blob |
5 | B1 | B2 | modified blob |
6 | B1 | T1 | typechange blob -> tree |
7 | T1 | x | removed tree |
8 | T1 | B1 | typechange tree -> blob |
9 | T1 | T1 | unmodified tree |
10 | T1 | T2 | modified tree (implies modified/added/removed blob inside) |
Now, let’s make the “New” iterator into a working directory iterator, so we replace “added” items with either untracked or ignored, like this:
Old | New | ||
---|---|---|---|
0 | x | x | nothing |
1 | x | B1 | untracked blob |
2 | x | Bi | ignored file |
3 | x | T1 | untracked tree |
4 | x | Ti | ignored tree |
5 | B1 | x | removed blob |
6 | B1 | B1 | unmodified blob |
7 | B1 | B2 | modified blob |
8 | B1 | T1 | typechange blob -> tree |
9 | B1 | Ti | removed blob AND ignored tree as separate items |
10 | T1 | x | removed tree |
11 | T1 | B1 | typechange tree -> blob |
12 | T1 | Bi | removed tree AND ignored blob as separate items |
13 | T1 | T1 | unmodified tree |
14 | T1 | T2 | modified tree (implies modified/added/removed blob inside) |
Note: if there is a corresponding entry in the old tree, then a working directory item won’t be ignored (i.e. no Bi or Ti for tracked items).
Now, expand this to three iterators: a baseline tree, a target tree, and an actual working directory tree:
(base == old HEAD; target == what to checkout; actual == working dir)
base | target | actual/workdir | ||
---|---|---|---|---|
0 | x | x | x | nothing |
1 | x | x | B1/Bi/T1/Ti | untracked/ignored blob/tree (SAFE) |
2+ | x | B1 | x | add blob (SAFE) |
3 | x | B1 | B1 | independently added blob (FORCEABLE-2) |
4* | x | B1 | B2/Bi/T1/Ti | add blob with content conflict (FORCEABLE-2) |
5+ | x | T1 | x | add tree (SAFE) |
6* | x | T1 | B1/Bi | add tree with blob conflict (FORCEABLE-2) |
7 | x | T1 | T1/i | independently added tree (SAFE+MISSING) |
8 | B1 | x | x | independently deleted blob (SAFE+MISSING) |
9- | B1 | x | B1 | delete blob (SAFE) |
10- | B1 | x | B2 | delete of modified blob (FORCEABLE-1) |
11 | B1 | x | T1/Ti | independently deleted blob AND untrack/ign tree (SAFE+MISSING !!!) |
| 12 | B1 | B1 | x | locally deleted blob (DIRTY || SAFE+CREATE) | | 13+ | B1 | B2 | x | update to deleted blob (SAFE+MISSING) | | 14 | B1 | B1 | B1 | unmodified file (SAFE) | | 15 | B1 | B1 | B2 | locally modified file (DIRTY) | | 16+ | B1 | B2 | B1 | update unmodified blob (SAFE) | | 17 | B1 | B2 | B2 | independently updated blob (FORCEABLE-1) | | 18+ | B1 | B2 | B3 | update to modified blob (FORCEABLE-1) | | 19 | B1 | B1 | T1/Ti | locally deleted blob AND untrack/ign tree (DIRTY) | | 20 | B1 | B2 | T1/Ti | update to deleted blob AND untrack/ign tree (F-1) | | 21+ | B1 | T1 | x | add tree with locally deleted blob (SAFE+MISSING) | | 22 | B1 | T1 | B1 | add tree AND deleted blob (SAFE) | | 23 | B1 | T1 | B2 | add tree with delete of modified blob (F-1) | | 24 | B1 | T1 | T1 | add tree with deleted blob (F-1) | | 25 | T1 | x | x | independently deleted tree (SAFE+MISSING) | | 26 | T1 | x | B1/Bi | independently deleted tree AND untrack/ign blob (F-1) | | 27- | T1 | x | T1 | deleted tree (MAYBE SAFE) | | 28+ | T1 | B1 | x | deleted tree AND added blob (SAFE+MISSING) | | 29 | T1 | B1 | B1 | independently typechanged tree -> blob (F-1) | | 30+ | T1 | B1 | B2 | typechange tree->blob with conflicting blob (F-1) | | 31 | T1 | B1 | T1/T2 | typechange tree->blob (MAYBE SAFE) | | 32+ | T1 | T1 | x | restore locally deleted tree (SAFE+MISSING) | | 33 | T1 | T1 | B1/Bi | locally typechange tree->untrack/ign blob (DIRTY) | | 34 | T1 | T1 | T1/T2 | unmodified tree (MAYBE SAFE) | | 35+ | T1 | T2 | x | update locally deleted tree (SAFE+MISSING) | | 36* | T1 | T2 | B1/Bi | update to tree with typechanged tree->blob conflict (F-1) | | 37 | T1 | T2 | T1/T2/T3 | update to existing tree (MAYBE SAFE) |
The number is followed by ‘ ‘ if no change is needed or ‘+’ if the case needs to write to disk or ‘-‘ if something must be deleted and ‘*’ if there should be a delete followed by an write.
There are four tiers of safe cases:
content, which is unknown at this point
Some slightly unusual circumstances:
Cases 3, 17, 24, 26, and 29 are all considered conflicts even though none of them will require making any updates to the working directory.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
Checkout Internals
==================
Checkout has to handle a lot of different cases. It examines the
differences between the target tree, the baseline tree and the working
directory, plus the contents of the index, and groups files into five
categories:
1. UNMODIFIED - Files that match in all places.
2. SAFE - Files where the working directory and the baseline content
match that can be safely updated to the target.
3. DIRTY/MISSING - Files where the working directory differs from the
baseline but there is no conflicting change with the target. One
example is a file that doesn't exist in the working directory - no
data would be lost as a result of writing this file. Which action
will be taken with these files depends on the options you use.
4. CONFLICTS - Files where changes in the working directory conflict
with changes to be applied by the target. If conflicts are found,
they prevent any other modifications from being made (although there
are options to override that and force the update, of course).
5. UNTRACKED/IGNORED - Files in the working directory that are untracked
or ignored (i.e. only in the working directory, not the other places).
Right now, this classification is done via 3 iterators (for the three
trees), with a final lookup in the index. At some point, this may move to
a 4 iterator version to incorporate the index better.
The actual checkout is done in five phases (at least right now).
1. The diff between the baseline and the target tree is used as a base
list of possible updates to be applied.
2. Iterate through the diff and the working directory, building a list of
actions to be taken (and sending notifications about conflicts and
dirty files).
3. Remove any files / directories as needed (because alphabetical
iteration means that an untracked directory will end up sorted *after*
a blob that should be checked out with the same name).
4. Update all blobs.
5. Update all submodules (after 4 in case a new .gitmodules blob was
checked out)
Checkout could be driven either off a target-to-workdir diff or a
baseline-to-target diff. There are pros and cons of each.
Target-to-workdir means the diff includes every file that could be
modified, which simplifies bookkeeping, but the code to constantly refer
back to the baseline gets complicated.
Baseline-to-target has simpler code because the diff defines the action to
take, but needs special handling for untracked and ignored files, if they
need to be removed.
The current checkout implementation is based on a baseline-to-target diff.
Picking Actions
===============
The most interesting aspect of this is phase 2, picking the actions that
should be taken. There are a lot of corner cases, so it may be easier to
start by looking at the rules for a simple 2-iterator diff:
Key
---
- B1,B2,B3 - blobs with different SHAs,
- Bi - ignored blob (WD only)
- T1,T2,T3 - trees with different SHAs,
- Ti - ignored tree (WD only)
- x - nothing
Diff with 2 non-workdir iterators
---------------------------------
| | Old | New | |
|----|-----|-----|------------------------------------------------------------|
| 0 | x | x | nothing |
| 1 | x | B1 | added blob |
| 2 | x | T1 | added tree |
| 3 | B1 | x | removed blob |
| 4 | B1 | B1 | unmodified blob |
| 5 | B1 | B2 | modified blob |
| 6 | B1 | T1 | typechange blob -> tree |
| 7 | T1 | x | removed tree |
| 8 | T1 | B1 | typechange tree -> blob |
| 9 | T1 | T1 | unmodified tree |
| 10 | T1 | T2 | modified tree (implies modified/added/removed blob inside) |
Now, let's make the "New" iterator into a working directory iterator, so
we replace "added" items with either untracked or ignored, like this:
Diff with non-work & workdir iterators
--------------------------------------
| | Old | New | |
|----|-----|-----|------------------------------------------------------------|
| 0 | x | x | nothing |
| 1 | x | B1 | untracked blob |
| 2 | x | Bi | ignored file |
| 3 | x | T1 | untracked tree |
| 4 | x | Ti | ignored tree |
| 5 | B1 | x | removed blob |
| 6 | B1 | B1 | unmodified blob |
| 7 | B1 | B2 | modified blob |
| 8 | B1 | T1 | typechange blob -> tree |
| 9 | B1 | Ti | removed blob AND ignored tree as separate items |
| 10 | T1 | x | removed tree |
| 11 | T1 | B1 | typechange tree -> blob |
| 12 | T1 | Bi | removed tree AND ignored blob as separate items |
| 13 | T1 | T1 | unmodified tree |
| 14 | T1 | T2 | modified tree (implies modified/added/removed blob inside) |
Note: if there is a corresponding entry in the old tree, then a working
directory item won't be ignored (i.e. no Bi or Ti for tracked items).
Now, expand this to three iterators: a baseline tree, a target tree, and
an actual working directory tree:
Checkout From 3 Iterators (2 not workdir, 1 workdir)
----------------------------------------------------
(base == old HEAD; target == what to checkout; actual == working dir)
| |base | target | actual/workdir | |
|-----|-----|------- |----------------|--------------------------------------------------------------------|
| 0 | x | x | x | nothing |
| 1 | x | x | B1/Bi/T1/Ti | untracked/ignored blob/tree (SAFE) |
| 2+ | x | B1 | x | add blob (SAFE) |
| 3 | x | B1 | B1 | independently added blob (FORCEABLE-2) |
| 4* | x | B1 | B2/Bi/T1/Ti | add blob with content conflict (FORCEABLE-2) |
| 5+ | x | T1 | x | add tree (SAFE) |
| 6* | x | T1 | B1/Bi | add tree with blob conflict (FORCEABLE-2) |
| 7 | x | T1 | T1/i | independently added tree (SAFE+MISSING) |
| 8 | B1 | x | x | independently deleted blob (SAFE+MISSING) |
| 9- | B1 | x | B1 | delete blob (SAFE) |
| 10- | B1 | x | B2 | delete of modified blob (FORCEABLE-1) |
| 11 | B1 | x | T1/Ti | independently deleted blob AND untrack/ign tree (SAFE+MISSING !!!) |
| 12 | B1 | B1 | x | locally deleted blob (DIRTY || SAFE+CREATE) |
| 13+ | B1 | B2 | x | update to deleted blob (SAFE+MISSING) |
| 14 | B1 | B1 | B1 | unmodified file (SAFE) |
| 15 | B1 | B1 | B2 | locally modified file (DIRTY) |
| 16+ | B1 | B2 | B1 | update unmodified blob (SAFE) |
| 17 | B1 | B2 | B2 | independently updated blob (FORCEABLE-1) |
| 18+ | B1 | B2 | B3 | update to modified blob (FORCEABLE-1) |
| 19 | B1 | B1 | T1/Ti | locally deleted blob AND untrack/ign tree (DIRTY) |
| 20* | B1 | B2 | T1/Ti | update to deleted blob AND untrack/ign tree (F-1) |
| 21+ | B1 | T1 | x | add tree with locally deleted blob (SAFE+MISSING) |
| 22* | B1 | T1 | B1 | add tree AND deleted blob (SAFE) |
| 23* | B1 | T1 | B2 | add tree with delete of modified blob (F-1) |
| 24 | B1 | T1 | T1 | add tree with deleted blob (F-1) |
| 25 | T1 | x | x | independently deleted tree (SAFE+MISSING) |
| 26 | T1 | x | B1/Bi | independently deleted tree AND untrack/ign blob (F-1) |
| 27- | T1 | x | T1 | deleted tree (MAYBE SAFE) |
| 28+ | T1 | B1 | x | deleted tree AND added blob (SAFE+MISSING) |
| 29 | T1 | B1 | B1 | independently typechanged tree -> blob (F-1) |
| 30+ | T1 | B1 | B2 | typechange tree->blob with conflicting blob (F-1) |
| 31* | T1 | B1 | T1/T2 | typechange tree->blob (MAYBE SAFE) |
| 32+ | T1 | T1 | x | restore locally deleted tree (SAFE+MISSING) |
| 33 | T1 | T1 | B1/Bi | locally typechange tree->untrack/ign blob (DIRTY) |
| 34 | T1 | T1 | T1/T2 | unmodified tree (MAYBE SAFE) |
| 35+ | T1 | T2 | x | update locally deleted tree (SAFE+MISSING) |
| 36* | T1 | T2 | B1/Bi | update to tree with typechanged tree->blob conflict (F-1) |
| 37 | T1 | T2 | T1/T2/T3 | update to existing tree (MAYBE SAFE) |
The number is followed by ' ' if no change is needed or '+' if the case
needs to write to disk or '-' if something must be deleted and '*' if
there should be a delete followed by an write.
There are four tiers of safe cases:
* SAFE == completely safe to update
* SAFE+MISSING == safe except the workdir is missing the expect content
* MAYBE SAFE == safe if workdir tree matches (or is missing) baseline
content, which is unknown at this point
* FORCEABLE == conflict unless FORCE is given
* DIRTY == no conflict but change is not applied unless FORCE
Some slightly unusual circumstances:
* 8 - parent dir is only deleted when file is, so parent will be left if
empty even though it would be deleted if the file were present
* 11 - core git does not consider this a conflict but attempts to delete T1
and gives "unable to unlink file" error yet does not skip the rest
of the operation
* 12 - without FORCE file is left deleted (i.e. not restored) so new wd is
dirty (and warning message "D file" is printed), with FORCE, file is
restored.
* 24 - This should be considered MAYBE SAFE since effectively it is 7 and 8
combined, but core git considers this a conflict unless forced.
* 26 - This combines two cases (1 & 25) (and also implied 8 for tree content)
which are ok on their own, but core git treat this as a conflict.
If not forced, this is a conflict. If forced, this actually doesn't
have to write anything and leaves the new blob as an untracked file.
* 32 - This is the only case where the baseline and target values match
and yet we will still write to the working directory. In all other
cases, if baseline == target, we don't touch the workdir (it is
either already right or is "dirty"). However, since this case also
implies that a ?/B1/x case will exist as well, it can be skipped.
Cases 3, 17, 24, 26, and 29 are all considered conflicts even though
none of them will require making any updates to the working directory.