    Commit

  • Hash : 19f1a8e6
    Author : Michael Haggerty
    Date : 2016-09-05T11:44:51

    diff: improve positioning of add/delete blocks in diffs
    
    Some groups of added/deleted lines in diffs can be slid up or down,
    because lines at the edges of the group are not unique. Picking good
    shifts for such groups is not a matter of correctness but definitely has
    a big effect on aesthetics. For example, consider the following two
    diffs. The first is what standard Git emits:
    
        --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
        +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
        @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
         }
    
         if (!$smtp_server) {
        +       $smtp_server = $repo->config('sendemail.smtpserver');
        +}
        +if (!$smtp_server) {
                foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                        if (-x $_) {
                                $smtp_server = $_;
    
    The following diff is equivalent, but is obviously preferable from an
    aesthetic point of view:
    
        --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
        +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
        @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
                $initial_reply_to =~ s/(^\s+|\s+$)//g;
         }
    
        +if (!$smtp_server) {
        +       $smtp_server = $repo->config('sendemail.smtpserver');
        +}
         if (!$smtp_server) {
                foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                        if (-x $_) {
    
    This patch teaches Git to pick better positions for such "diff sliders"
    using heuristics that take the positions of nearby blank lines and the
    indentation of nearby lines into account.
    
    The existing Git code basically always shifts such "sliders" as far down
    in the file as possible. The only exception is when the slider can be
    aligned with a group of changed lines in the other file, in which case
    Git favors depicting the change as one add+delete block rather than one
    add and a slightly offset delete block. This naive algorithm often
    yields ugly diffs.
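
The default behaviour described above can be sketched in a few lines of C. This is a simplified illustration, not the xdiff code: `slide_down` and its parameters are hypothetical names, and the sketch ignores the exception for aligning with a changed group in the other file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Slide the group lines[start .. start + len) down as far as it will go.
 * Moving the group down by one line is only valid when the line leaving
 * the top of the group is identical to the line entering at the bottom,
 * because only then does the shifted diff describe the same change.
 * Returns the new start index of the group.
 */
size_t slide_down(const char **lines, size_t n, size_t start, size_t len)
{
    assert(len > 0 && start + len <= n);

    while (start + len < n && strcmp(lines[start], lines[start + len]) == 0)
        start++;
    return start;
}
```

Applied to the `git-send-email.perl` example, a group that starts on the first `if (!$smtp_server) {` line slides down by one line, which produces the less readable of the two diffs.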
    
    Commit d634d61ed6 improved the situation somewhat by preferring to
    position add/delete groups to make their last line a blank line, when
    that is possible. This heuristic does more good than harm, but (1) it
    can only help if there are blank lines in the right places, and (2)
    it always picks the last blank line, even if there are others that might be
    better. The end result is that it makes perhaps 1/3 as many errors as
    the default Git algorithm, but that still leaves a lot of ugly diffs.
    
    This commit implements a new and much better heuristic for picking
    optimal "slider" positions using the following approach: First observe
    that each hypothetical positioning of a diff slider introduces two
    splits: one between the context lines preceding the group and the first
    added/deleted line, and the other between the last added/deleted line
    and the first line of context following it. The heuristic tries to find
    the positioning that creates the least bad splits.
    
    Splits are evaluated based only on the presence and locations of nearby
    blank lines, and the indentation of lines near the split. Basically, it
    prefers to introduce splits adjacent to blank lines, between lines that
    are indented less, and between lines with the same level of indentation.
    In more detail:
    
    1. It measures the following characteristics of a proposed splitting
       position in a `struct split_measurement`:
    
       * the number of blank lines above the proposed split
       * whether the line directly after the split is blank
       * the number of blank lines following that line
       * the indentation of the nearest non-blank line above the split
       * the indentation of the line directly below the split
       * the indentation of the nearest non-blank line after that line
    
    2. It combines the measured attributes using a bunch of
       empirically-optimized weighting factors to derive a `struct
       split_score` that measures the "badness" of splitting the text at
       that position.
    
    3. It combines the `split_score` for the top and the bottom of the
       slider at each of its possible positions, and selects the position
       that has the best `split_score`.
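
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: it collapses `struct split_measurement` and `struct split_score` into a single integer "badness", the two weights are hypothetical values chosen for the example, and `indent_of`, `split_badness` and `best_group_start` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL weights, picked for illustration; not the patch's values. */
#define INDENT_WEIGHT 10 /* penalty per column of indentation below the split */
#define BLANK_WEIGHT  30 /* bonus per blank line adjacent to the split */

/* Indentation of a line in columns (tabs advance to multiples of 8),
 * or -1 if the line is empty or contains only whitespace. */
int indent_of(const char *line)
{
    int col = 0;

    for (; *line; line++) {
        if (*line == ' ')
            col++;
        else if (*line == '\t')
            col += 8 - col % 8;
        else
            return col;
    }
    return -1;
}

/* Badness of splitting between lines[split - 1] and lines[split];
 * lower is better. */
int split_badness(const char **lines, size_t n, size_t split)
{
    int blanks = 0, indent = 0;
    size_t i;

    /* Count the blank lines directly above the split. */
    for (i = split; i > 0 && indent_of(lines[i - 1]) < 0; i--)
        blanks++;

    /* Count the blank lines at or below the split, then take the
     * indentation of the first non-blank line after them. */
    for (i = split; i < n && indent_of(lines[i]) < 0; i++)
        blanks++;
    if (i < n)
        indent = indent_of(lines[i]);

    return INDENT_WEIGHT * indent - BLANK_WEIGHT * blanks;
}

/* Try every valid position of the group lines[start .. end) and return
 * the start index whose top and bottom splits have the lowest combined
 * badness. Ties go to the lower position in the file. */
size_t best_group_start(const char **lines, size_t n, size_t start, size_t end)
{
    size_t len = end - start, s, best;
    int badness, best_badness;

    assert(start < end && end <= n);

    /* Slide the group up as far as it will go. */
    while (start > 0 && strcmp(lines[start - 1], lines[start + len - 1]) == 0)
        start--;

    best = start;
    best_badness = split_badness(lines, n, start) +
                   split_badness(lines, n, start + len);

    /* Score each position reachable by sliding down one line at a time. */
    for (s = start; s + len < n && strcmp(lines[s], lines[s + len]) == 0; s++) {
        badness = split_badness(lines, n, s + 1) +
                  split_badness(lines, n, s + 1 + len);
        if (badness <= best_badness) {
            best_badness = badness;
            best = s + 1;
        }
    }
    return best;
}
```

On the `git-send-email.perl` example, the upper position scores better because its top split follows a blank line and both of its splits sit next to unindented lines, so the sketch selects the second, more readable diff.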
    
    I determined the initial set of weighting factors by collecting a corpus
    of Git histories from 29 open-source software projects in various
    programming languages. I generated many diffs from this corpus, and
    determined the best positioning "by eye" for about 6600 diff sliders. I
    used about half of the repositories in the corpus (corresponding to
    about 2/3 of the sliders) as a training set, and optimized the weights
    against this corpus using a crude automated search of the parameter
    space to get the best agreement with the manually-determined values.
    Then I tested the resulting heuristic against the full corpus. The
    results are summarized in the following table, in column `indent-1`:
    
    | repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
    | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
    | afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
    | alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
    | angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
    | animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
    | ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
    | bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
    | corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
    | couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
    | cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
    | discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
    | docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
    | electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
    | git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
    | gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
    | ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
    | ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
    | junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
    | lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
    | magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
    | neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
    | nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
    | phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
    | react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
    | rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
    | spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
    | tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
    | test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
    | test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
    | xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
    | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
    | totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
    | totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
    | totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |
    
    In this table, the numbers are the count and percentage of human-rated
    sliders that the corresponding algorithm got *wrong*. The columns are:
    
    * "repository" - the name of the repository used. I used the diffs
      between successive non-merge commits on the HEAD branch of the
      corresponding repository.
    
    * "count" - the number of sliders that were human-rated. I chose most,
      but not all, sliders to rate from those among which the various
      algorithms gave different answers.
    
    * "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
    
    * "compaction" - the heuristic used by `git diff --compaction-heuristic`
      in Git 2.9.0.
    
    * "compaction-fixed" - the heuristic used by `git diff
      --compaction-heuristic` after the fixes from earlier in this patch
      series. Note that the results are not dramatically different from
      those for "compaction". Both produce non-ideal diffs only about 1/3 as
      often as the default `git diff`.
    
    * "indent-1" - the new `--indent-heuristic` algorithm, using the first
      set of weighting factors, determined as described above.
    
    * "indent-2" - the new `--indent-heuristic` algorithm, using the final
      set of weighting factors, determined as described below.
    
    * `*` - indicates that the repository was part of the training set used to
      determine the first set of weighting factors.
    
    The fact that the heuristic performed nearly as well on the test set as
    on the training set in column "indent-1" is a good indication that the
    heuristic was not over-trained. Given that fact, I ran a second round of
    optimization, using the entire corpus as the training set. The resulting
    set of weights gave the results in column "indent-2". These are the
    weights included in this patch.
    
    The final result gives consistently and significantly better results
    across the whole corpus than either `git diff` or `git diff
    --compaction-heuristic`. It makes only about 1/30 as many errors as the
    former and about 1/10 as many errors as the latter. (And a good fraction
    of the remaining errors are for diffs that involve weirdly-formatted
    code, sometimes apparently machine-generated.)
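
The "1/30" and "1/10" figures can be checked against the totals row of the table (6668 rated sliders; 4391, 1496 and 144 errors for "Git 2.9.0", "compaction" and "indent-2"). The helper names below are hypothetical and exist only to make the arithmetic explicit.

```c
#include <assert.h>

/* Error rate as a percentage of the human-rated sliders. */
double error_percent(int errors, int count)
{
    assert(count > 0);
    return 100.0 * errors / count;
}

/* How many times more errors the baseline makes than the new heuristic. */
double error_ratio(int baseline_errors, int new_errors)
{
    assert(new_errors > 0);
    return (double)baseline_errors / new_errors;
}
```

With the totals above, `error_ratio(4391, 144)` comes out near 30.5 and `error_ratio(1496, 144)` near 10.4, which matches the rounded fractions quoted in the text.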
    
    The tools that were used to do this optimization and analysis, along
    with the human-generated data values, are recorded in a separate project
    [1].
    
    [1] https://github.com/mhagger/diff-slider-tools
    
    Original Git commit: 433860f3d0beb0c6f205290bd16cda413148f098
    

  • README.md

  • libgit2 - the Git linkable library

    libgit2 is a portable, pure C implementation of the Git core methods provided as a re-entrant linkable library with a solid API, allowing you to write native speed custom Git applications in any language with bindings.

    libgit2 is licensed under a very permissive license (GPLv2 with a special Linking Exception). This basically means that you can link it (unmodified) with any kind of software without having to release its source code. Additionally, the example code has been released to the public domain (see the separate license for more information).

    Getting Help

    Join us on Slack

    Visit slack.libgit2.org to sign up, then join us in #libgit2. If you prefer IRC, you can also point your client to our Slack channel once you’ve registered.

    Getting Help

    If you have questions about the library, please be sure to check out the API documentation. If you still have questions, reach out to us on Slack or post a question on StackOverflow (with the libgit2 tag).

    Reporting Bugs

    Please open a GitHub Issue and include as much information as possible. If possible, provide sample code that illustrates the problem you’re seeing. If you’re seeing a bug only on a specific repository, please provide a link to it if possible.

    We ask that you not open a GitHub Issue for help, only for bug reports.

    What It Can Do

    libgit2 is already very usable and is used in production by many applications, including the GitHub.com site, Plastic SCM, and Microsoft’s Visual Studio tools for Git. The library provides:

    • SHA conversions, formatting and shortening
    • abstracted ODB backend system
    • commit, tag, tree and blob parsing, editing, and write-back
    • tree traversal
    • revision walking
    • index file (staging area) manipulation
    • reference management (including packed references)
    • config file management
    • high level repository management
    • thread safety and reentrancy
    • descriptive and detailed error messages
    • …and more (over 175 different API calls)

    Optional dependencies

    While the library provides git functionality without the need for dependencies, it can make use of a few optional libraries to add features:

    • pthreads (non-Windows) to enable threadsafe access as well as multi-threaded pack generation
    • OpenSSL (non-Windows) to talk over HTTPS and provide the SHA-1 functions
    • LibSSH2 to enable the SSH transport
    • iconv (OSX) to handle the HFS+ path encoding peculiarities

    Initialization

    The library needs to keep track of some global state. Call

    git_libgit2_init();

    before calling any other libgit2 functions. You can call this function many times. A matching number of calls to

    git_libgit2_shutdown();

    will free the resources. Note that if you have worker threads, you should call git_libgit2_shutdown after those threads have exited. If you require assistance coordinating this, simply have the worker threads call git_libgit2_init at startup and git_libgit2_shutdown at shutdown.
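
The "matching number of calls" rule amounts to reference counting. The toy model below illustrates that counting only; it is not libgit2's implementation, it is not thread-safe, and `model_init`, `model_shutdown` and `model_is_live` are hypothetical stand-ins for the real functions.

```c
#include <assert.h>

/* Toy model only: NOT libgit2 code. */
static int init_count;     /* number of init calls not yet shut down */
static int resources_live; /* 1 while the global state is allocated */

/* The first call allocates the global state; later calls only count.
 * Returns the number of outstanding initializations. */
int model_init(void)
{
    if (init_count++ == 0)
        resources_live = 1;
    return init_count;
}

/* Only the call that balances the first init frees the global state.
 * Returns the number of remaining initializations, or -1 if the call
 * has no matching init. */
int model_shutdown(void)
{
    if (init_count == 0)
        return -1;
    if (--init_count == 0)
        resources_live = 0;
    assert(init_count >= 0);
    return init_count;
}

/* Report whether the modelled global state currently exists. */
int model_is_live(void)
{
    return resources_live;
}
```

In this model a worker thread that calls `model_init` at startup and `model_shutdown` on exit keeps the state alive for exactly as long as it runs, which is the pattern the paragraph above recommends.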

    Threading

    See THREADING for information.

    Conventions

    See CONVENTIONS for an overview of the external and internal API/coding conventions we use.

    Building libgit2 - Using CMake

    libgit2 builds cleanly on most platforms without any external dependencies. Under Unix-like systems such as Linux, *BSD and Mac OS X, libgit2 expects pthreads to be available; they should be installed by default on all systems. Under Windows, libgit2 uses the native Windows API for threading.

    The libgit2 library is built using CMake (version 2.8 or newer) on all platforms.

    On most systems you can build the library using the following commands:

    $ mkdir build && cd build
    $ cmake ..
    $ cmake --build .

    Alternatively, you can point the CMake GUI tool to the CMakeLists.txt file and generate a platform-specific build project or IDE workspace.

    To install the library you can specify the install prefix by setting:

    $ cmake .. -DCMAKE_INSTALL_PREFIX=/install/prefix
    $ cmake --build . --target install

    For more advanced use or questions about CMake please read https://cmake.org/Wiki/CMake_FAQ.

    The following CMake variables are declared:

    • BIN_INSTALL_DIR: Where to install binaries to.
    • LIB_INSTALL_DIR: Where to install libraries to.
    • INCLUDE_INSTALL_DIR: Where to install headers to.
    • BUILD_SHARED_LIBS: Build libgit2 as a Shared Library (defaults to ON)
    • BUILD_CLAR: Build Clar-based test suite (defaults to ON)
    • THREADSAFE: Build libgit2 with threading support (defaults to ON)
    • STDCALL: Build libgit2 as stdcall. Turn off for cdecl (Windows; defaults to ON)

    Compiler and linker options

    CMake lets you specify a few variables to control the behavior of the compiler and linker. These flags are rarely used but can be useful for 64-bit to 32-bit cross-compilation.

    • CMAKE_C_FLAGS: Set your own compiler flags
    • CMAKE_FIND_ROOT_PATH: Override the search path for libraries
    • ZLIB_LIBRARY, OPENSSL_SSL_LIBRARY and OPENSSL_CRYPTO_LIBRARY: Tell CMake where to find those specific libraries

    Mac OS X

    If you want to build a universal binary for Mac OS X, CMake sets it all up for you if you use -DCMAKE_OSX_ARCHITECTURES="i386;x86_64" when configuring.

    Windows

    You need to run the CMake commands from the Visual Studio command prompt, not the regular or Windows SDK one. Select the right generator for your version with the -G "Visual Studio X" option.

    See [the website](http://libgit2.github.com/docs/guides/build-and-link/) for more detailed instructions.

    Android

    Extract the toolchain from the NDK using the make-standalone-toolchain.sh script. Optionally, cross-compile and install OpenSSL inside of it. Then create a CMake toolchain file that configures paths to your cross-compiler (substitute {PATH} with the full path to the toolchain):

    SET(CMAKE_SYSTEM_NAME Linux)
    SET(CMAKE_SYSTEM_VERSION Android)
    SET(CMAKE_C_COMPILER {PATH}/bin/arm-linux-androideabi-gcc)
    SET(CMAKE_CXX_COMPILER {PATH}/bin/arm-linux-androideabi-g++)
    SET(CMAKE_FIND_ROOT_PATH {PATH}/sysroot/)
    SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
    SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
    SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

    Add -DCMAKE_TOOLCHAIN_FILE={pathToToolchainFile} to the cmake command when configuring.

    Language Bindings

    Here are the bindings to libgit2 that are currently available:

    • C++
        • libqgit2, Qt bindings <https://projects.kde.org/projects/playground/libs/libqgit2/repository/>
    • Chicken Scheme
        • chicken-git <https://wiki.call-cc.org/egg/git>
    • D
        • dlibgit <https://github.com/s-ludwig/dlibgit>
    • Delphi
        • GitForDelphi <https://github.com/libgit2/GitForDelphi>
    • Erlang
        • Geef <https://github.com/carlosmn/geef>
    • Go
        • git2go <https://github.com/libgit2/git2go>
    • GObject
        • libgit2-glib <https://wiki.gnome.org/Projects/Libgit2-glib>
    • Haskell
        • hgit2 <https://github.com/jwiegley/gitlib>
    • Java
        • Jagged <https://github.com/ethomson/jagged>
    • Julia
        • LibGit2.jl <https://github.com/jakebolewski/LibGit2.jl>
    • Lua
        • luagit2 <https://github.com/libgit2/luagit2>
    • .NET
        • libgit2sharp <https://github.com/libgit2/libgit2sharp>
    • Node.js
        • nodegit <https://github.com/nodegit/nodegit>
    • Objective-C
        • objective-git <https://github.com/libgit2/objective-git>
    • OCaml
        • ocaml-libgit2 <https://github.com/fxfactorial/ocaml-libgit2>
    • Parrot Virtual Machine
        • parrot-libgit2 <https://github.com/letolabs/parrot-libgit2>
    • Perl
        • Git-Raw <https://github.com/jacquesg/p5-Git-Raw>
    • PHP
        • php-git <https://github.com/libgit2/php-git>
    • PowerShell
        • PSGit <https://github.com/PoshCode/PSGit>
    • Python
        • pygit2 <https://github.com/libgit2/pygit2>
    • R
        • git2r <https://github.com/ropensci/git2r>
    • Ruby
        • Rugged <https://github.com/libgit2/rugged>
    • Rust
        • git2-rs <https://github.com/alexcrichton/git2-rs>
    • Swift
        • Gift <https://github.com/modocache/Gift>
    • Vala
        • libgit2.vapi <https://github.com/apmasell/vapis/blob/master/libgit2.vapi>

    If you start another language binding to libgit2, please let us know so we can add it to the list.

    How Can I Contribute?

    We welcome new contributors! We have a number of issues marked as ["up for grabs"](https://github.com/libgit2/libgit2/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22) and ["easy fix"](https://github.com/libgit2/libgit2/issues?utf8=✓&q=is%3Aissue+is%3Aopen+label%3A%22easy+fix%22) that are good places to jump in and get started. There's much more detailed information in our list of [outstanding projects](PROJECTS.md).

    Please be sure to check the [contribution guidelines](CONTRIBUTING.md) to understand our workflow, and the libgit2 [coding conventions](CONVENTIONS.md).

    License

    libgit2 is under GPL2 with linking exception. This means you can link to and use the library from any program, proprietary or open source; paid or gratis. However, if you modify libgit2 itself, you must distribute the source to your modified version of libgit2.

    See the COPYING file for the full license text.