Commit 77cffa31db07187c2fa65457ace1b6cb2547dc5b

Russell Belfer 2013-01-02T17:14:00

Simplify checkout documentation

This moves a lot of the detailed checkout documentation into a new file (docs/checkout-internals.md) and simplifies the public docs for the checkout API.
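
The simplified header docs in this commit summarize `GIT_CHECKOUT_SAFE` and `GIT_CHECKOUT_SAFE_CREATE` as a small decision table (workdir state against whether the target differs from the baseline). As a reading aid, that table can be modelled as a pure function. This is only a sketch: every `sketch_*` name below is hypothetical and not a libgit2 identifier, and it assumes that in the "workdir missing, target == baseline" cell plain SAFE only reports the file as dirty while SAFE_CREATE recreates it.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in libgit2. */
typedef enum {
	SKETCH_SAFE,
	SKETCH_SAFE_CREATE
} sketch_strategy;

typedef enum {
	SKETCH_WD_CLEAN,     /* workdir == baseline */
	SKETCH_WD_MODIFIED,  /* workdir exists and is != baseline */
	SKETCH_WD_MISSING    /* workdir missing, baseline present */
} sketch_workdir;

typedef enum {
	SKETCH_NO_ACTION,
	SKETCH_APPLY_CHANGE, /* create, update, or delete file */
	SKETCH_NOTIFY_DIRTY, /* leave the file alone, report it as dirty */
	SKETCH_CONFLICT,     /* notify and cancel checkout */
	SKETCH_CREATE        /* write the missing file */
} sketch_action;

/* Mirror the SAFE / SAFE_CREATE table from the header comment. */
sketch_action sketch_pick_action(
	sketch_strategy strategy,
	sketch_workdir wd,
	bool target_differs)
{
	switch (wd) {
	case SKETCH_WD_CLEAN:
		return target_differs ? SKETCH_APPLY_CHANGE : SKETCH_NO_ACTION;
	case SKETCH_WD_MODIFIED:
		return target_differs ? SKETCH_CONFLICT : SKETCH_NOTIFY_DIRTY;
	case SKETCH_WD_MISSING:
		/* Assumption: SAFE reports the deletion, SAFE_CREATE restores it. */
		if (target_differs || strategy == SKETCH_SAFE_CREATE)
			return SKETCH_CREATE;
		return SKETCH_NOTIFY_DIRTY;
	}
	return SKETCH_NO_ACTION;
}
```

In this model the two strategies differ in exactly one cell, which matches the header's statement that SAFE_CREATE only adds checkout of files that are missing from the working directory but unchanged between target and baseline.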

diff --git a/docs/checkout-internals.md b/docs/checkout-internals.md
new file mode 100644
index 0000000..cb646da
--- /dev/null
+++ b/docs/checkout-internals.md
@@ -0,0 +1,203 @@
+Checkout Internals
+==================
+
+Checkout has to handle a lot of different cases.  It examines the
+differences between the target tree, the baseline tree and the working
+directory, plus the contents of the index, and groups files into five
+categories:
+
+1. UNMODIFIED - Files that match in all places.
+2. SAFE - Files where the working directory and the baseline content
+   match that can be safely updated to the target.
+3. DIRTY/MISSING - Files where the working directory differs from the
+   baseline but there is no conflicting change with the target.  One
+   example is a file that doesn't exist in the working directory - no
+   data would be lost as a result of writing this file.  Which action
+   will be taken with these files depends on the options you use.
+4. CONFLICTS - Files where changes in the working directory conflict
+   with changes to be applied by the target.  If conflicts are found,
+   they prevent any other modifications from being made (although there
+   are options to override that and force the update, of course).
+5. UNTRACKED/IGNORED - Files in the working directory that are untracked
+   or ignored (i.e. only in the working directory, not the other places).
+
+Right now, this classification is done via 3 iterators (for the three
+trees), with a final lookup in the index.  At some point, this may move to
+a 4 iterator version to incorporate the index better.
+
+The actual checkout is done in five phases (at least right now).
+
+1. The diff between the baseline and the target tree is used as a base
+   list of possible updates to be applied.
+2. Iterate through the diff and the working directory, building a list of
+   actions to be taken (and sending notifications about conflicts and
+   dirty files).
+3. Remove any files / directories as needed (because alphabetical
+   iteration means that an untracked directory will end up sorted *after*
+   a blob that should be checked out with the same name).
+4. Update all blobs.
+5. Update all submodules (after 4 in case a new .gitmodules blob was
+   checked out)
+
+Checkout could be driven either off a target-to-workdir diff or a
+baseline-to-target diff.  There are pros and cons of each.
+
+Target-to-workdir means the diff includes every file that could be
+modified, which simplifies bookkeeping, but the code to constantly refer
+back to the baseline gets complicated.
+
+Baseline-to-target has simpler code because the diff defines the action to
+take, but needs special handling for untracked and ignored files, if they
+need to be removed.
+
+The current checkout implementation is based on a baseline-to-target diff.
+
+
+Picking Actions
+===============
+
+The most interesting aspect of this is phase 2, picking the actions that
+should be taken.  There are a lot of corner cases, so it may be easier to
+start by looking at the rules for a simple 2-iterator diff:
+
+Key
+---
+- B1,B2,B3 - blobs with different SHAs,
+- Bi       - ignored blob (WD only)
+- T1,T2,T3 - trees with different SHAs,
+- Ti       - ignored tree (WD only)
+- x        - nothing
+
+Diff with 2 non-workdir iterators
+---------------------------------
+
+    Old New
+    --- ---
+  0   x   x - nothing
+  1   x  B1 - added blob
+  2   x  T1 - added tree
+  3  B1   x - removed blob
+  4  B1  B1 - unmodified blob
+  5  B1  B2 - modified blob
+  6  B1  T1 - typechange blob -> tree
+  7  T1   x - removed tree
+  8  T1  B1 - typechange tree -> blob
+  9  T1  T1 - unmodified tree
+ 10  T1  T2 - modified tree (implies modified/added/removed blob inside)
+
+
+Now, let's make the "New" iterator into a working directory iterator, so
+we replace "added" items with either untracked or ignored, like this:
+
+Diff with non-work & workdir iterators
+--------------------------------------
+
+    Old New-WD
+    --- ------
+  0   x   x - nothing
+  1   x  B1 - untracked blob
+  2   x  Bi - ignored file
+  3   x  T1 - untracked tree
+  4   x  Ti - ignored tree
+  5  B1   x - removed blob
+  6  B1  B1 - unmodified blob
+  7  B1  B2 - modified blob
+  8  B1  T1 - typechange blob -> tree
+  9  B1  Ti - removed blob AND ignored tree as separate items
+ 10  T1   x - removed tree
+ 11  T1  B1 - typechange tree -> blob
+ 12  T1  Bi - removed tree AND ignored blob as separate items
+ 13  T1  T1 - unmodified tree
+ 14  T1  T2 - modified tree (implies modified/added/removed blob inside)
+
+Note: if there is a corresponding entry in the old tree, then a working
+directory item won't be ignored (i.e. no Bi or Ti for tracked items).
+
+
+Now, expand this to three iterators: a baseline tree, a target tree, and
+an actual working directory tree:
+
+Checkout From 3 Iterators (2 not workdir, 1 workdir)
+----------------------------------------------------
+
+(base == old HEAD; target == what to checkout; actual == working dir)
+
+      base target actual/workdir
+      ---- ------ ------
+  0      x      x      x - nothing
+  1      x      x B1/Bi/T1/Ti - untracked/ignored blob/tree (SAFE)
+  2+     x     B1      x - add blob (SAFE)
+  3      x     B1     B1 - independently added blob (FORCEABLE-2)
+  4*     x     B1 B2/Bi/T1/Ti - add blob with content conflict (FORCEABLE-2)
+  5+     x     T1      x - add tree (SAFE)
+  6*     x     T1  B1/Bi - add tree with blob conflict (FORCEABLE-2)
+  7      x     T1   T1/i - independently added tree (SAFE+MISSING)
+  8     B1      x      x - independently deleted blob (SAFE+MISSING)
+  9-    B1      x     B1 - delete blob (SAFE)
+ 10-    B1      x     B2 - delete of modified blob (FORCEABLE-1)
+ 11     B1      x  T1/Ti - independently deleted blob AND untrack/ign tree (SAFE+MISSING !!!)
+ 12     B1     B1      x - locally deleted blob (DIRTY || SAFE+CREATE)
+ 13+    B1     B2      x - update to deleted blob (SAFE+MISSING)
+ 14     B1     B1     B1 - unmodified file (SAFE)
+ 15     B1     B1     B2 - locally modified file (DIRTY)
+ 16+    B1     B2     B1 - update unmodified blob (SAFE)
+ 17     B1     B2     B2 - independently updated blob (FORCEABLE-1)
+ 18+    B1     B2     B3 - update to modified blob (FORCEABLE-1)
+ 19     B1     B1  T1/Ti - locally deleted blob AND untrack/ign tree (DIRTY)
+ 20*    B1     B2  T1/Ti - update to deleted blob AND untrack/ign tree (F-1)
+ 21+    B1     T1      x - add tree with locally deleted blob (SAFE+MISSING)
+ 22*    B1     T1     B1 - add tree AND deleted blob (SAFE)
+ 23*    B1     T1     B2 - add tree with delete of modified blob (F-1)
+ 24     B1     T1     T1 - add tree with deleted blob (F-1)
+ 25     T1      x      x - independently deleted tree (SAFE+MISSING)
+ 26     T1      x  B1/Bi - independently deleted tree AND untrack/ign blob (F-1)
+ 27-    T1      x     T1 - deleted tree (MAYBE SAFE)
+ 28+    T1     B1      x - deleted tree AND added blob (SAFE+MISSING)
+ 29     T1     B1     B1 - independently typechanged tree -> blob (F-1)
+ 30+    T1     B1     B2 - typechange tree->blob with conflicting blob (F-1)
+ 31*    T1     B1  T1/T2 - typechange tree->blob (MAYBE SAFE)
+ 32+    T1     T1      x - restore locally deleted tree (SAFE+MISSING)
+ 33     T1     T1  B1/Bi - locally typechange tree->untrack/ign blob (DIRTY)
+ 34     T1     T1  T1/T2 - unmodified tree (MAYBE SAFE)
+ 35+    T1     T2      x - update locally deleted tree (SAFE+MISSING)
+ 36*    T1     T2  B1/Bi - update to tree with typechanged tree->blob conflict (F-1)
+ 37     T1     T2 T1/T2/T3 - update to existing tree (MAYBE SAFE)
+
+The number is followed by ' ' if no change is needed or '+' if the case
+needs to write to disk or '-' if something must be deleted and '*' if
+there should be a delete followed by a write.
+
+The cases fall into five tiers:
+
+- SAFE         == completely safe to update
+- SAFE+MISSING == safe except the workdir is missing the expected content
+- MAYBE SAFE   == safe if workdir tree matches (or is missing) baseline
+                  content, which is unknown at this point
+- FORCEABLE == conflict unless FORCE is given
+- DIRTY     == no conflict but change is not applied unless FORCE
+
+Some slightly unusual circumstances:
+
+  8 - parent dir is only deleted when file is, so parent will be left if
+      empty even though it would be deleted if the file were present
+ 11 - core git does not consider this a conflict but attempts to delete T1
+      and gives "unable to unlink file" error yet does not skip the rest
+      of the operation
+ 12 - without FORCE file is left deleted (i.e. not restored) so new wd is
+      dirty (and warning message "D file" is printed), with FORCE, file is
+      restored.
+ 24 - This should be considered MAYBE SAFE since effectively it is 7 and 8
+      combined, but core git considers this a conflict unless forced.
+ 26 - This combines two cases (1 & 25) (and also implied 8 for tree content)
+      which are ok on their own, but core git treats this as a conflict.
+      If not forced, this is a conflict.  If forced, this actually doesn't
+      have to write anything and leaves the new blob as an untracked file.
+ 32 - This is the only case where the baseline and target values match
+      and yet we will still write to the working directory.  In all other
+      cases, if baseline == target, we don't touch the workdir (it is
+      either already right or is "dirty").  However, since this case also
+      implies that a ?/B1/x case will exist as well, it can be skipped.
+
+Cases 3, 17, 24, 26, and 29 are all considered conflicts even though
+none of them will require making any updates to the working directory.
+
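
The "Diff with 2 non-workdir iterators" table in docs/checkout-internals.md above boils down to comparing the kind and id of one old entry and one new entry. A minimal sketch of that comparison follows; the `sketch_*` types and function are hypothetical and stand in for whatever the real iterators yield, and ids are plain strings here rather than real object ids.

```c
#include <string.h>

/* Hypothetical types for illustration; not libgit2 API. */
typedef enum { SKETCH_NONE, SKETCH_BLOB, SKETCH_TREE } sketch_kind;

typedef struct {
	sketch_kind kind;
	const char *id;   /* stand-in for an object SHA; unused when kind is NONE */
} sketch_entry;

typedef enum {
	SKETCH_DELTA_NOTHING,
	SKETCH_DELTA_ADDED,
	SKETCH_DELTA_REMOVED,
	SKETCH_DELTA_UNMODIFIED,
	SKETCH_DELTA_MODIFIED,
	SKETCH_DELTA_TYPECHANGE
} sketch_delta;

/* Classify one old/new pair the way the two-iterator table does. */
sketch_delta sketch_diff_pair(const sketch_entry *old_e, const sketch_entry *new_e)
{
	if (old_e->kind == SKETCH_NONE && new_e->kind == SKETCH_NONE)
		return SKETCH_DELTA_NOTHING;      /* case 0 */
	if (old_e->kind == SKETCH_NONE)
		return SKETCH_DELTA_ADDED;        /* cases 1, 2 */
	if (new_e->kind == SKETCH_NONE)
		return SKETCH_DELTA_REMOVED;      /* cases 3, 7 */
	if (old_e->kind != new_e->kind)
		return SKETCH_DELTA_TYPECHANGE;   /* cases 6, 8 */
	if (strcmp(old_e->id, new_e->id) == 0)
		return SKETCH_DELTA_UNMODIFIED;   /* cases 4, 9 */
	return SKETCH_DELTA_MODIFIED;         /* cases 5, 10 */
}
```

The workdir and three-iterator tables extend the same idea with ignored entries and a third column, which is where most of the corner cases listed above come from.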
diff --git a/include/git2/checkout.h b/include/git2/checkout.h
index 884ea27..12fffeb 100644
--- a/include/git2/checkout.h
+++ b/include/git2/checkout.h
@@ -23,97 +23,82 @@ GIT_BEGIN_DECL
 /**
  * Checkout behavior flags
  *
- * In libgit2, the function of checkout is to update the working directory
- * to match a target tree.  It does not move the HEAD commit - you do that
- * separately.  To safely perform the update, checkout relies on a baseline
- * tree (generally the current HEAD) as a reference for the unmodified
- * content expected in the working directory.
+ * In libgit2, checkout is used to update the working directory and index
+ * to match a target tree.  Unlike git checkout, it does not move the HEAD
+ * commit for you - use `git_repository_set_head` or the like to do that.
  *
- * Checkout examines the differences between the target tree, the baseline
- * tree and the working directory, and groups files into five categories:
+ * Checkout looks at (up to) four things: the "target" tree you want to
+ * check out, the "baseline" tree of what was checked out previously, the
+ * working directory for actual files, and the index for staged changes.
  *
- * 1. UNMODIFIED - Files that match in all places.
- * 2. SAFE - Files where the working directory and the baseline content
- *    match that can be safely updated to the target.
- * 3. DIRTY/MISSING - Files where the working directory differs from the
- *    baseline but there is no conflicting change with the target.  One
- *    example is a file that doesn't exist in the working directory - no
- *    data would be lost as a result of writing this file.  Which action
- *    will be taken with these files depends on the options you use.
- * 4. CONFLICTS - Files where changes in the working directory conflict
- *    with changes to be applied by the target.  If conflicts are found,
- *    they prevent any other modifications from being made (although there
- *    are options to override that and force the update, of course).
- * 5. UNTRACKED/IGNORED - Files in the working directory that are untracked
- *    or ignored (i.e. only in the working directory, not the other places).
+ * You give checkout one of four strategies for update:
  *
+ * - `GIT_CHECKOUT_NONE` is a dry-run strategy that checks for conflicts,
+ *   etc., but doesn't make any actual changes.
  *
- * You control the actions checkout takes with one of four base strategies:
+ * - `GIT_CHECKOUT_FORCE` is at the opposite extreme, taking any action to
+ *   make the working directory match the target (including potentially
+ *   discarding modified files).
  *
- * - `GIT_CHECKOUT_NONE` is the default and applies no changes. It is a dry
- *   run that you can use to find conflicts, etc. if you wish.
+ * In between those are `GIT_CHECKOUT_SAFE` and `GIT_CHECKOUT_SAFE_CREATE`,
+ * both of which only make modifications that will not lose changes.
  *
- * - `GIT_CHECKOUT_SAFE` is like `git checkout` and only applies changes
- *   between the baseline and target trees to files in category 2.
+ *                      |  target == baseline   |  target != baseline  |
+ * ---------------------|-----------------------|----------------------|
+ *  workdir == baseline |       no action       |  create, update, or  |
+ *                      |                       |     delete file      |
+ * ---------------------|-----------------------|----------------------|
+ *  workdir exists and  |       no action       |   conflict (notify   |
+ *    is != baseline    | notify dirty MODIFIED | and cancel checkout) |
+ * ---------------------|-----------------------|----------------------|
+ *   workdir missing,   | create if SAFE_CREATE |     create file      |
+ *   baseline present   | notify dirty DELETED  |                      |
+ * ---------------------|-----------------------|----------------------|
  *
- * - `GIT_CHECKOUT_SAFE_CREATE` also creates files that are missing from the
- *   working directory (category 3), even if there is no change between the
- *   baseline and target trees for those files.  See notes below on
- *   emulating `git checkout-index` for some of the subtleties of this.
+ * The only difference between SAFE and SAFE_CREATE is that SAFE_CREATE
+ * will cause a file to be checked out if it is missing from the working
+ * directory even if it is not modified between the target and baseline.
  *
- * - `GIT_CHECKOUT_FORCE` is like `git checkout -f` and will update the
- *   working directory to match the target content regardless of conflicts,
- *   overwriting dirty and conflicting files.
  *
+ * To emulate `git checkout`, use `GIT_CHECKOUT_SAFE` with a checkout
+ * notification callback (see below) that displays information about dirty
+ * files.  The default behavior will cancel checkout on conflicts.
  *
- * There are some additional flags to modified the behavior of checkout:
+ * To emulate `git checkout-index`, use `GIT_CHECKOUT_SAFE_CREATE` with a
+ * notification callback that cancels the operation if a dirty-but-existing
+ * file is found in the working directory.  This core git command isn't
+ * quite "force" but is sensitive about some types of changes.
  *
- * - GIT_CHECKOUT_ALLOW_CONFLICTS can be added to apply safe file updates
- *   even if there are conflicts.  Normally, the entire checkout will be
- *   cancelled if any files are in category 4.  With this flag, conflicts
- *   will be skipped (though the notification callback will still be invoked
- *   on the conflicting files if requested).
+ * To emulate `git checkout -f`, use `GIT_CHECKOUT_FORCE`.
  *
- * - GIT_CHECKOUT_REMOVE_UNTRACKED means that files in the working directory
- *   that are untracked (but not ignored) should be deleted.  The are not
- *   considered conflicts and would normally be ignored by checkout.
+ * To emulate `git clone` use `GIT_CHECKOUT_SAFE_CREATE` in the options.
  *
- * - GIT_CHECKOUT_REMOVE_IGNORED means to remove ignored files from the
- *   working directory as well.  Obviously, these would normally be ignored.
  *
- * - GIT_CHECKOUT_UPDATE_ONLY means to only update the content of files that
- *   already exist.  Files will not be created nor deleted.  This does not
- *   make adds and deletes into conflicts - it just skips applying those
- *   changes.  This will also skip updates to typechanged files (since that
- *   would involve deleting the old and creating the new).
- *
- * - Unmerged entries in the index are also considered conflicts.  The
- *   GIT_CHECKOUT_SKIP_UNMERGED flag causes us to skip files with unmerged
- *   index entries.  You can also use GIT_CHECKOUT_USE_OURS and
- *   GIT_CHECKOUT_USE_THEIRS to proceeed with the checkout using either the
- *   stage 2 ("ours") or stage 3 ("theirs") version of files in the index.
+ * There are some additional flags to modify the behavior of checkout:
  *
+ * - GIT_CHECKOUT_ALLOW_CONFLICTS makes SAFE mode apply safe file updates
+ *   even if there are conflicts (instead of cancelling the checkout).
  *
- * To emulate `git checkout`, use `GIT_CHECKOUT_SAFE` with a checkout
- * notification callback (see below) that displays information about dirty
- * files (i.e. files that don't need an update but that no longer match the
- * baseline content).  The default behavior will cancel on conflicts.
+ * - GIT_CHECKOUT_REMOVE_UNTRACKED means remove untracked files (i.e. not
+ *   in target, baseline, or index, and not ignored) from the working dir.
  *
- * To emulate `git checkout-index`, use `GIT_CHECKOUT_SAFE_CREATE` with a
- * notification callback that cancels the operation if a dirty-but-existing
- * file is found in the working directory.  This core git command isn't
- * quite "force" but is sensitive about some types of changes.
+ * - GIT_CHECKOUT_REMOVE_IGNORED means remove ignored files (that are also
+ *   untracked) from the working directory as well.
+ *
+ * - GIT_CHECKOUT_UPDATE_ONLY means to only update the content of files that
+ *   already exist.  Files will not be created nor deleted.  This just skips
+ *   applying adds, deletes, and typechanges.
  *
- * To emulate `git checkout -f`, you use `GIT_CHECKOUT_FORCE`.
+ * - GIT_CHECKOUT_DONT_UPDATE_INDEX prevents checkout from writing the
+ *   updated files' information to the index.
  *
+ * - Normally, checkout will reload the index and git attributes from disk
+ *   before any operations.  GIT_CHECKOUT_NO_REFRESH prevents this reload.
  *
- * Checkout is "semi-atomic" as in it will go through the work to be done
- * before making any changes and if may decide to abort if there are
- * conflicts, or you can use the notification callback to explicitly abort
- * the action before any updates are made.  Despite this, if a second
- * process is modifying the filesystem while checkout is running, it can't
- * guarantee that the choices is makes while initially examining the
- * filesystem are still going to be correct as it applies them.
+ * - Unmerged index entries are conflicts.  GIT_CHECKOUT_SKIP_UNMERGED skips
+ *   files with unmerged index entries instead.  GIT_CHECKOUT_USE_OURS and
+ *   GIT_CHECKOUT_USE_THEIRS proceed with the checkout using either the
+ *   stage 2 ("ours") or stage 3 ("theirs") version of files in the index.
  */
 typedef enum {
 	GIT_CHECKOUT_NONE = 0, /** default is a dry run, no actual updates */
@@ -167,45 +152,23 @@ typedef enum {
 /**
  * Checkout notification flags
  *
- * When running a checkout, you can set a notification callback (`notify_cb`)
- * to be invoked for some or all files to be checked out.  Which files
- * receive a callback depend on the `notify_flags` value which is a
- * combination of these flags.
- *
- * - GIT_CHECKOUT_NOTIFY_CONFLICT means that conflicting files that would
- *   prevent the checkout from occurring will receive callbacks.  If you
- *   used GIT_CHECKOUT_ALLOW_CONFLICTS, the callbacks are still done, but
- *   the checkout will not be blocked.  The callback `status_flags` will
- *   have both index and work tree change bits set (see `git_status_t`).
- *
- * - GIT_CHECKOUT_NOTIFY_DIRTY means to notify about "dirty" files, i.e.
- *   those that do not need to be updated but no longer match the baseline
- *   content.  Core git displays these files when checkout runs, but does
- *   not stop the checkout.  For these,  `status_flags` will have only work
- *   tree bits set (i.e. GIT_STATUS_WT_MODIFIED, etc).
- *
- * - GIT_CHECKOUT_NOTIFY_UPDATED sends notification for any file changed by
- *   the checkout.  Callback `status_flags` will have only index bits set.
- *
- * - GIT_CHECKOUT_NOTIFY_UNTRACKED notifies for all untracked files that
- *   are not ignored.  Passing GIT_CHECKOUT_REMOVE_UNTRACKED would remove
- *   these files.  The `status_flags` will be GIT_STATUS_WT_NEW.
- *
- * - GIT_CHECKOUT_NOTIFY_IGNORED notifies for the ignored files.  Passing
- *   GIT_CHECKOUT_REMOVE_IGNORED will remove these.  The `status_flags`
- *   will be to GIT_STATUS_IGNORED.
- *
- * If you return a non-zero value from the notify callback, the checkout
- * will be canceled.  Notification callbacks are made prior to making any
- * modifications, so returning non-zero will cancel the entire checkout.
- * If you are do not use GIT_CHECKOUT_ALLOW_CONFLICTS and there are
- * conflicts, you don't need to explicitly cancel from the callback.
- * Checkout itself will abort after all files are processed.
- *
- * To emulate core git checkout output, use GIT_CHECKOUT_NOTIFY_CONFLICTS
- * and GIT_CHECKOUT_NOTIFY_DIRTY.  Conflicts will have `status_flags` with
- * changes in both the index and work tree (see the `git_status_t` values).
- * Dirty files will only have work tree flags set.
+ * Checkout will invoke an options notification callback (`notify_cb`) for
+ * certain cases - you pick which ones via `notify_flags`:
+ *
+ * - GIT_CHECKOUT_NOTIFY_CONFLICT invokes the callback on conflicting paths.
+ *
+ * - GIT_CHECKOUT_NOTIFY_DIRTY notifies about "dirty" files, i.e. those that
+ *   do not need an update but no longer match the baseline.  Core git
+ *   displays these files when checkout runs, but won't stop the checkout.
+ *
+ * - GIT_CHECKOUT_NOTIFY_UPDATED sends notification for any file changed.
+ *
+ * - GIT_CHECKOUT_NOTIFY_UNTRACKED notifies about untracked files.
+ *
+ * - GIT_CHECKOUT_NOTIFY_IGNORED notifies about ignored files.
+ *
+ * Returning a non-zero value from this callback will cancel the checkout.
+ * Notification callbacks are made prior to modifying any files on disk.
  */
 typedef enum {
 	GIT_CHECKOUT_NOTIFY_NONE      = 0,
@@ -216,13 +179,27 @@ typedef enum {
 	GIT_CHECKOUT_NOTIFY_IGNORED   = (1u << 4),
 } git_checkout_notify_t;
 
+/** Checkout notification callback function */
+typedef int (*git_checkout_notify_cb)(
+	git_checkout_notify_t why,
+	const char *path,
+	const git_diff_file *baseline,
+	const git_diff_file *target,
+	const git_diff_file *workdir,
+	void *payload);
+
+/** Checkout progress notification function */
+typedef void (*git_checkout_progress_cb)(
+	const char *path,
+	size_t completed_steps,
+	size_t total_steps,
+	void *payload);
+
 /**
  * Checkout options structure
  *
- * Use zeros to indicate default settings.
- *
- * This should be initialized with the `GIT_CHECKOUT_OPTS_INIT` macro to
- * correctly set the `version` field.
+ * Zero out for defaults.  Initialize with `GIT_CHECKOUT_OPTS_INIT` macro to
+ * correctly set the `version` field.  E.g.
  *
  *		git_checkout_opts opts = GIT_CHECKOUT_OPTS_INIT;
  */
@@ -237,21 +214,11 @@ typedef struct git_checkout_opts {
 	int file_open_flags;    /** default is O_CREAT | O_TRUNC | O_WRONLY */
 
 	unsigned int notify_flags; /** see `git_checkout_notify_t` above */
-	int (*notify_cb)(
-		git_checkout_notify_t why,
-		const char *path,
-		const git_diff_file *baseline,
-		const git_diff_file *target,
-		const git_diff_file *workdir,
-		void *payload);
+	git_checkout_notify_cb notify_cb;
 	void *notify_payload;
 
 	/* Optional callback to notify the consumer of checkout progress. */
-	void (*progress_cb)(
-		const char *path,
-		size_t completed_steps,
-		size_t total_steps,
-		void *payload);
+	git_checkout_progress_cb progress_cb;
 	void *progress_payload;
 
 	/** When not zeroed out, array of fnmatch patterns specifying which
diff --git a/src/checkout.c b/src/checkout.c
index a26f007..2e13294 100644
--- a/src/checkout.c
+++ b/src/checkout.c
@@ -24,157 +24,7 @@
 #include "diff.h"
 #include "pathspec.h"
 
-/* Key
- * ===
- * B1,B2,B3 - blobs with different SHAs,
- * Bi       - ignored blob (WD only)
- * T1,T2,T3 - trees with different SHAs,
- * Ti       - ignored tree (WD only)
- * x        - nothing
- */
-
-/* Diff with 2 non-workdir iterators
- * =================================
- *    Old New
- *    --- ---
- *  0   x   x - nothing
- *  1   x  B1 - added blob
- *  2   x  T1 - added tree
- *  3  B1   x - removed blob
- *  4  B1  B1 - unmodified blob
- *  5  B1  B2 - modified blob
- *  6  B1  T1 - typechange blob -> tree
- *  7  T1   x - removed tree
- *  8  T1  B1 - typechange tree -> blob
- *  9  T1  T1 - unmodified tree
- * 10  T1  T2 - modified tree (implies modified/added/removed blob inside)
- */
-
-/* Diff with non-work & workdir iterators
- * ======================================
- *    Old New-WD
- *    --- ------
- *  0   x   x - nothing
- *  1   x  B1 - added blob
- *  2   x  Bi - ignored file
- *  3   x  T1 - added tree
- *  4   x  Ti - ignored tree
- *  5  B1   x - removed blob
- *  6  B1  B1 - unmodified blob
- *  7  B1  B2 - modified blob
- *  8  B1  T1 - typechange blob -> tree
- *  9  B1  Ti - removed blob AND ignored tree as separate items
- * 10  T1   x - removed tree
- * 11  T1  B1 - typechange tree -> blob
- * 12  T1  Bi - removed tree AND ignored blob as separate items
- * 13  T1  T1 - unmodified tree
- * 14  T1  T2 - modified tree (implies modified/added/removed blob inside)
- *
- * If there is a corresponding blob in the old, Bi is irrelevant
- * If there is a corresponding tree in the old, Ti is irrelevant
- */
-
-/* Checkout From 3 Iterators (2 not workdir, 1 workdir)
- * ====================================================
- *
- * (Expect == Old HEAD / Desire == What To Checkout / Actual == Workdir)
- *
- *    Expect Desire Actual-WD
- *    ------ ------ ------
- *  0      x      x      x - nothing
- *  1      x      x B1/Bi/T1/Ti - untracked/ignored blob/tree (SAFE)
- *  2+     x     B1      x - add blob (SAFE)
- *  3      x     B1     B1 - independently added blob (FORCEABLE-2)
- *  4*     x     B1 B2/Bi/T1/Ti - add blob with content conflict (FORCEABLE-2)
- *  5+     x     T1      x - add tree (SAFE)
- *  6*     x     T1  B1/Bi - add tree with blob conflict (FORCEABLE-2)
- *  7      x     T1   T1/i - independently added tree (SAFE+MISSING)
- *  8     B1      x      x - independently deleted blob (SAFE+MISSING)
- *  9-    B1      x     B1 - delete blob (SAFE)
- * 10-    B1      x     B2 - delete of modified blob (FORCEABLE-1)
- * 11     B1      x  T1/Ti - independently deleted blob AND untrack/ign tree (SAFE+MISSING !!!)
- * 12     B1     B1      x - locally deleted blob (DIRTY || SAFE+CREATE)
- * 13+    B1     B2      x - update to deleted blob (SAFE+MISSING)
- * 14     B1     B1     B1 - unmodified file (SAFE)
- * 15     B1     B1     B2 - locally modified file (DIRTY)
- * 16+    B1     B2     B1 - update unmodified blob (SAFE)
- * 17     B1     B2     B2 - independently updated blob (FORCEABLE-1)
- * 18+    B1     B2     B3 - update to modified blob (FORCEABLE-1)
- * 19     B1     B1  T1/Ti - locally deleted blob AND untrack/ign tree (DIRTY)
- * 20*    B1     B2  T1/Ti - update to deleted blob AND untrack/ign tree (F-1)
- * 21+    B1     T1      x - add tree with locally deleted blob (SAFE+MISSING)
- * 22*    B1     T1     B1 - add tree AND deleted blob (SAFE)
- * 23*    B1     T1     B2 - add tree with delete of modified blob (F-1)
- * 24     B1     T1     T1 - add tree with deleted blob (F-1)
- * 25     T1      x      x - independently deleted tree (SAFE+MISSING)
- * 26     T1      x  B1/Bi - independently deleted tree AND untrack/ign blob (F-1)
- * 27-    T1      x     T1 - deleted tree (MAYBE SAFE)
- * 28+    T1     B1      x - deleted tree AND added blob (SAFE+MISSING)
- * 29     T1     B1     B1 - independently typechanged tree -> blob (F-1)
- * 30+    T1     B1     B2 - typechange tree->blob with conflicting blob (F-1)
- * 31*    T1     B1  T1/T2 - typechange tree->blob (MAYBE SAFE)
- * 32+    T1     T1      x - restore locally deleted tree (SAFE+MISSING)
- * 33     T1     T1  B1/Bi - locally typechange tree->untrack/ign blob (DIRTY)
- * 34     T1     T1  T1/T2 - unmodified tree (MAYBE SAFE)
- * 35+    T1     T2      x - update locally deleted tree (SAFE+MISSING)
- * 36*    T1     T2  B1/Bi - update to tree with typechanged tree->blob conflict (F-1)
- * 37     T1     T2 T1/T2/T3 - update to existing tree (MAYBE SAFE)
- *
- * The number will be followed by ' ' if no change is needed or '+' if the
- * case needs to write to disk or '-' if something must be deleted and '*'
- * if there should be a delete followed by an write.
- *
- * There are four tiers of safe cases:
- * - SAFE         == completely safe to update
- * - SAFE+MISSING == safe except the workdir is missing the expect content
- * - MAYBE SAFE   == safe if workdir tree matches (or is missing) baseline
- *                   content, which is unknown at this point
- * - FORCEABLE == conflict unless FORCE is given
- * - DIRTY     == no conflict but change is not applied unless FORCE
- *
- * Some slightly unusual circumstances:
- *  8 - parent dir is only deleted when file is, so parent will be left if
- *      empty even though it would be deleted if the file were present
- * 11 - core git does not consider this a conflict but attempts to delete T1
- *      and gives "unable to unlink file" error yet does not skip the rest
- *      of the operation
- * 12 - without FORCE file is left deleted (i.e. not restored) so new wd is
- *      dirty (and warning message "D file" is printed), with FORCE, file is
- *      restored.
- * 24 - This should be considered MAYBE SAFE since effectively it is 7 and 8
- *      combined, but core git considers this a conflict unless forced.
- * 26 - This combines two cases (1 & 25) (and also implied 8 for tree content)
- *      which are ok on their own, but core git treats this as a conflict.
- *      If not forced, this is a conflict.  If forced, this actually doesn't
- *      have to write anything and leaves the new blob as an untracked file.
- * 32 - This is the only case where the baseline and target values match
- *      and yet we will still write to the working directory.  In all other
- *      cases, if baseline == target, we don't touch the workdir (it is
- *      either already right or is "dirty").  However, since this case also
- *      implies that a ?/B1/x case will exist as well, it can be skipped.
- *
- * Cases 3, 17, 24, 26, and 29 are all considered conflicts even though
- * none of them will require making any updates to the working directory.
- */
-
-/*    expect desire  wd
- *  1    x      x     T -> ignored dir OR untracked dir OR parent dir
- *  2    x      x     I -> ignored file
- *  3    x      x     A -> untracked file
- *  4    x      A     x -> add from index (no conflict)
- *  5    x      A     A -> independently added file
- *  6    x      A     B -> add with conflicting file
- *  7    A      x     x -> independently deleted file
- *  8    A      x     A -> delete from index (no conflict)
- *  9    A      x     B -> delete of modified file
- * 10    A      A     x -> locally deleted file
- * 11    A      A     A -> unmodified file (no conflict)
- * 12    A      A     B -> locally modified
- * 13    A      B     x -> update of deleted file
- * 14    A      B     A -> update of unmodified file (no conflict)
- * 15    A      B     B -> independently updated file
- * 16    A      B     C -> update of modified file
- */
+/* See docs/checkout-internals.md for more information */
 
 enum {
 	CHECKOUT_ACTION__NONE = 0,
@@ -1317,34 +1167,15 @@ int git_checkout_iterator(
 			goto cleanup;
 	}
 
-	/* Checkout can be driven either off a target-to-workdir diff or a
-	 * baseline-to-target diff.  There are pros and cons of each.
-	 *
-	 * Target-to-workdir means the diff includes every file that could be
-	 * modified, which simplifies bookkeeping, but the code to constantly
-	 * refer back to the baseline gets complicated.
-	 *
-	 * Baseline-to-target has simpler code because the diff defines the
-	 * action to take, but needs special handling for untracked and ignored
-	 * files, if they need to be removed.
-	 *
-	 * I've implemented both versions and opted for the second.
+	/* Generate baseline-to-target diff which will include an entry for
+	 * every possible update that might need to be made.
 	 */
 	if ((error = git_diff__from_iterators(
 			&data.diff, data.repo, baseline, target, &diff_opts)) < 0)
 		goto cleanup;
 
-	/* In order to detect conflicts prior to performing any operations,
-	 * and in order to deal with some order dependencies, checkout is best
-	 * performed with up to four passes through the diff.
-	 *
-	 * 0. Figure out the actions to be taken,
-	 * 1. Remove any files / directories as needed (because alphabetical
-	 *    iteration means that an untracked directory will end up sorted
-	 *    *after* a blob that should be checked out with the same name),
-	 * 2. Then update all blobs,
-	 * 3. Then update all submodules in case a new .gitmodules blob was
-	 *    checked out during pass #2.
+	/* Loop through diff (and working directory iterator) building a list of
+	 * actions to be taken, plus look for conflicts and send notifications.
 	 */
 	if ((error = checkout_get_actions(&actions, &counts, &data, workdir)) < 0)
 		goto cleanup;
@@ -1355,8 +1186,9 @@ int git_checkout_iterator(
 
 	report_progress(&data, NULL); /* establish 0 baseline */
 
-	/* TODO: add ability to update index entries while checking out */
-
+	/* To deal with some order dependencies, perform remaining checkout
+	 * in three passes: removes, then update blobs, then update submodules.
+	 */
 	if (counts[CHECKOUT_ACTION__REMOVE] > 0 &&
 		(error = checkout_remove_the_old(actions, &data)) < 0)
 		goto cleanup;