|
5e5848eb
|
2013-02-14T17:25:10
|
|
Change similarity metric to sampled hashes
This moves the similarity metric code out of buf_text and into a
new file. Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes. In theory, that should be
sufficient samples to given a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
|
|
9c454b00
|
2013-01-11T22:13:02
|
|
Initial implementation of similarity scoring algo
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
|
|
17c92bea
|
2013-01-29T12:13:24
|
|
Test buf join with NULL behavior explicitly
|
|
0d65acad
|
2013-01-11T11:24:26
|
|
Match binary file check of core git in diff
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
|
|
7bf87ab6
|
2012-11-28T09:58:48
|
|
Consolidate text buffer functions
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
|
|
2d3579be
|
2012-10-10T14:54:31
|
|
Add git_buf_put_base64 to buffer API
|
|
e9ca852e
|
2012-08-23T09:20:17
|
|
Fix warnings and merge issues on Win64
|
|
02a0d651
|
2012-07-12T16:31:59
|
|
Add git_buf_unescape and git__unescape to unescape all characters in a string (in-place)
|
|
465092ce
|
2012-07-12T11:56:50
|
|
Fix memory leak in test
|
|
039fc406
|
2012-07-10T15:10:14
|
|
Add a couple of useful git_buf utilities
* `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`)
* `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests)
to copy strings into a buffer while injecting an escape sequence
(e.g. '\') in front of particular characters.
|
|
41a82592
|
2012-05-15T14:17:39
|
|
Ranged iterators and rewritten git_status_file
The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.
This is done in 3 phases:
1. Extend iterators to allow ranged iteration with start and
end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
a common non-wildcard prefix of the pathspec, it will use
ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
with a pathspec that covers just the one file being checked.
Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient. The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.
|
|
1a6e8f8a
|
2012-04-13T10:42:00
|
|
Update clar and remove old helpers
This updates to the latest clar which includes the helpers
`cl_assert_equal_s` and `cl_assert_equal_i`. Convert the code
over to use those and remove the old libgit2-only helpers.
|
|
a4c291ef
|
2012-03-20T21:57:38
|
|
Convert reflog to new errors
Cleaned up some other issues.
|
|
13224ea4
|
2012-02-27T04:28:31
|
|
buffer: Unify `git_fbuffer` and `git_buf`
This makes so much sense that I can't believe it hasn't been done
before. Kill the old `git_fbuffer` and read files straight into
`git_buf` objects.
Also: In order to fully support 4GB files in 32-bit systems, the
`git_buf` implementation has been changed from using `ssize_t` for
storage and storing negative values on allocation failure, to using
`size_t` and changing the buffer pointer to a magical pointer on
allocation failure.
Hopefully this won't break anything.
|
|
3fd1520c
|
2012-01-24T20:35:15
|
|
Rename the Clay test suite to Clar
Clay is the name of a programming language on the makings, and we want
to avoid confusions. Sorry for the huge diff!
|