    Commit

  • Hash : d730d3f4
    Author : Russell Belfer
    Date : 2013-07-31T16:40:42

    Major rename detection changes
    
    After doing further profiling, I found that a lot of time was
    being spent attempting to insert hashes into the file hash
    signature when using the rolling hash, because the rolling hash
    approach generates a hash per byte of the file instead of one
    per run/line of data.
    
    To optimize this, I decided to convert back to a run-based file
    signature algorithm which would be more like core Git.
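
    As a rough illustration of why the per-byte approach produces so
    many more hashes to insert, here is a minimal sketch.  It is not
    the libgit2 code: the function names, the window size, and the
    choice of a polynomial rolling hash and CRC32 are all assumptions
    made for the example.

```python
import zlib


def rolling_hashes(data: bytes, window: int = 4) -> list[int]:
    """Hypothetical rolling scheme: one hash per sliding window position."""
    if len(data) < window:
        return []
    base, mod = 257, (1 << 31) - 1
    high = pow(base, window - 1, mod)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * base + b) % mod
    hashes = [h]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and add the newest byte.
        h = ((h - data[i - window] * high) * base + data[i]) % mod
        hashes.append(h)
    return hashes


def run_hashes(data: bytes) -> list[int]:
    """Hypothetical run-based scheme: one hash per line of input."""
    return [zlib.crc32(line) for line in data.splitlines(keepends=True)]


sample = b"alpha\nbeta\ngamma\n"
rolling_count = len(rolling_hashes(sample))  # grows with the byte count
run_count = len(run_hashes(sample))          # grows with the line count
```

    For this 17-byte, 3-line sample the rolling scheme yields one
    hash per window position while the run-based scheme yields one
    per line, so the signature receives far fewer insertions.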
    
    After changing this, a number of the existing tests started to
    fail.  In some cases, this appears to have been because the test
    was coded too specifically to the particular results of the file
    similarity metric.  In other cases, there appear to have been
    bugs in the core rename detection code, where the expected
    results were being generated only by coincidence of the file
    similarity scoring.
    
    This renames all the variables in the core rename detection code
    to be more consistent and hopefully easier to follow, which made
    it a bit easier to reason about the behavior of that code and fix
    the problems that I was seeing.  I think it's in better shape now.
    
    There are now a couple of tests that attempt to stress test the
    rename detection code, and they are quite slow.  Most of the time
    is spent setting up the test data on disk and in the index.  When
    we roll out performance improvements for index insertion, I hope
    that will speed up these tests as well.