
    Commit

  • Hash : 5e5848eb
    Author : Russell Belfer
    Date : 2013-02-14T17:25:10

    Change similarity metric to sampled hashes
    
    This moves the similarity metric code out of buf_text and into a
    new file.  Also, this implements a different approach to similarity
    measurement based on a Rabin-Karp rolling hash where we only keep
    the top 100 and bottom 100 hashes.  In theory, that should be
    sufficient samples to give a fairly accurate measurement while
    limiting the amount of data we keep for file signatures no matter
    how large the file is.
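The sampling idea above can be illustrated with a minimal C sketch. It is not the code from this commit. The window length (8 bytes), the base (257), arithmetic modulo 2^32, and every name here (`sample_sig`, `sig_build`, `sample_insert`, `sig_similarity`) are hypothetical. The scoring rule (shared samples divided by the larger sample count) is only one plausible way to compare two signatures. Each file keeps at most the 100 smallest and 100 largest distinct window hashes, so the signature has a fixed upper size no matter how long the file is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameters for this sketch, not taken from the commit. */
#define SIG_WINDOW 8     /* bytes per rolling-hash window */
#define SIG_KEEP   100   /* keep the 100 smallest and 100 largest hashes */
#define RK_BASE    257u  /* polynomial base; arithmetic wraps mod 2^32 */

typedef struct {
    uint32_t lo[SIG_KEEP]; /* smallest distinct hashes, ascending */
    uint32_t hi[SIG_KEEP]; /* largest distinct hashes, ascending */
    size_t n_lo, n_hi;
} sample_sig;

/* Index of the first element >= h in ascending a[0..n). */
static size_t lower_bound(const uint32_t *a, size_t n, uint32_t h)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < h) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Insert h into ascending, duplicate-free a[0..*n) of capacity SIG_KEEP,
 * retaining the smallest values (keep_large == 0) or the largest. */
static void sample_insert(uint32_t *a, size_t *n, uint32_t h, int keep_large)
{
    size_t pos;
    if (*n == SIG_KEEP && (keep_large ? h <= a[0] : h >= a[SIG_KEEP - 1]))
        return;                        /* not extreme enough to keep */
    pos = lower_bound(a, *n, h);
    if (pos < *n && a[pos] == h)
        return;                        /* already sampled */
    if (*n == SIG_KEEP) {
        if (keep_large) {              /* evict the smallest, slide left */
            memmove(a, a + 1, (pos - 1) * sizeof *a);
            a[pos - 1] = h;
            return;
        }
        (*n)--;                        /* evict the largest */
    }
    memmove(a + pos + 1, a + pos, (*n - pos) * sizeof *a);
    a[pos] = h;
    (*n)++;
}

/* Rabin-Karp rolling hash over every SIG_WINDOW-byte window of buf. */
static void sig_build(sample_sig *sig, const unsigned char *buf, size_t len)
{
    uint32_t h = 0, pow = 1;           /* pow = RK_BASE^(SIG_WINDOW-1) */
    size_t i;
    assert(buf != NULL || len == 0);
    memset(sig, 0, sizeof *sig);
    for (i = 1; i < SIG_WINDOW; i++)
        pow *= RK_BASE;
    for (i = 0; i < len; i++) {
        if (i >= SIG_WINDOW)           /* drop the byte leaving the window */
            h -= (uint32_t)buf[i - SIG_WINDOW] * pow;
        h = h * RK_BASE + buf[i];      /* shift in the new byte */
        if (i + 1 >= SIG_WINDOW) {     /* a full window has been hashed */
            sample_insert(sig->lo, &sig->n_lo, h, 0);
            sample_insert(sig->hi, &sig->n_hi, h, 1);
        }
    }
}

/* Number of values present in both ascending, duplicate-free arrays. */
static size_t count_common(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { common++; i++; j++; }
    }
    return common;
}

/* Hypothetical score in [0, 100]: shared samples over the larger set. */
static int sig_similarity(const sample_sig *a, const sample_sig *b)
{
    size_t common = count_common(a->lo, a->n_lo, b->lo, b->n_lo) +
                    count_common(a->hi, a->n_hi, b->hi, b->n_hi);
    size_t ta = a->n_lo + a->n_hi, tb = b->n_lo + b->n_hi;
    size_t total = ta > tb ? ta : tb;
    if (total == 0)
        return 100;  /* both inputs shorter than one window */
    return (int)(100 * common / total);
}
```

Keeping the extremes rather than a random subset makes the sampling deterministic: two files that share content tend to produce the same minimum and maximum window hashes, so their samples overlap. Inputs shorter than one window yield no samples, and this sketch scores two such inputs as identical.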