kmx git

Commit	Date	Message
5e5848eb	2013-02-14T17:25:10	Change similarity metric to sampled hashes This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.
f3327cac	2013-01-13T10:06:09	Some similarity metric adjustments This makes the text similarity metric treat \r as equivalent to \n and makes it skip whitespace immediately following a line terminator, so line indentation will have less effect on the difference measurement (and so \r\n will be treated as just a single line terminator). This also separates the text and binary hash calculators into two separate functions instead of have more if statements inside the loop. This should make it easier to have more differentiated heuristics in the future if we so wish.
9c454b00	2013-01-11T22:13:02	Initial implementation of similarity scoring algo This adds a new `git_buf_text_hashsig` type and functions to generate these hash signatures and compare them to give a similarity score. This can be plugged into diff similarity scoring.
355dddbf	2013-01-12T01:40:35	buf: Is this the function you were looking for?
0d65acad	2013-01-11T11:24:26	Match binary file check of core git in diff Core git just looks for NUL bytes in files when deciding about is-binary inside diff (although it uses a better algorithm in checkout, when deciding if CRLF conversion should be done). Libgit2 was using the better algorithm in both places, but that is causing some confusion. For now, this makes diff just look for NUL bytes to decide if a file is binary by content in diff.
359fc2d2	2013-01-08T17:07:25	update copyrights
9ff07c24	2012-11-30T15:17:05	buf test: make sure we always set the bom variable
7bf87ab6	2012-11-28T09:58:48	Consolidate text buffer functions There are many scattered functions that look into the contents of buffers to do various text manipulations (such as escaping or unescaping data, calculating text stats, guessing if content is binary, etc). This groups all those functions together into a new file and converts the code to use that. This has two enhancements to existing functionality. The old text stats function is significantly rewritten and the BOM detection code was extended (although largely we can't deal with anything other than a UTF8 BOM).

5e5848eb

2013-02-14T17:25:10

Change similarity metric to sampled hashes This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.

f3327cac

2013-01-13T10:06:09

Some similarity metric adjustments This makes the text similarity metric treat \r as equivalent to \n and makes it skip whitespace immediately following a line terminator, so line indentation will have less effect on the difference measurement (and so \r\n will be treated as just a single line terminator). This also separates the text and binary hash calculators into two separate functions instead of have more if statements inside the loop. This should make it easier to have more differentiated heuristics in the future if we so wish.

9c454b00

2013-01-11T22:13:02

Initial implementation of similarity scoring algo This adds a new `git_buf_text_hashsig` type and functions to generate these hash signatures and compare them to give a similarity score. This can be plugged into diff similarity scoring.

355dddbf

2013-01-12T01:40:35

buf: Is this the function you were looking for?

0d65acad

2013-01-11T11:24:26

Match binary file check of core git in diff Core git just looks for NUL bytes in files when deciding about is-binary inside diff (although it uses a better algorithm in checkout, when deciding if CRLF conversion should be done). Libgit2 was using the better algorithm in both places, but that is causing some confusion. For now, this makes diff just look for NUL bytes to decide if a file is binary by content in diff.

359fc2d2

2013-01-08T17:07:25

update copyrights

9ff07c24

2012-11-30T15:17:05

buf test: make sure we always set the bom variable

7bf87ab6

2012-11-28T09:58:48

Consolidate text buffer functions There are many scattered functions that look into the contents of buffers to do various text manipulations (such as escaping or unescaping data, calculating text stats, guessing if content is binary, etc). This groups all those functions together into a new file and converts the code to use that. This has two enhancements to existing functionality. The old text stats function is significantly rewritten and the BOM detection code was extended (although largely we can't deal with anything other than a UTF8 BOM).

thodg/libgit2/src/buf_text.c

src/buf_text.c

Log