kmx git

Commit	Date	Message
ba4faf6e	2018-02-08T17:15:33	buf_text: remove `offset` parameter of BOM detection function The function to detect a BOM takes an offset where it shall look for a BOM. No caller uses that, and searching for the BOM in the middle of a buffer seems to be very unlikely, as a BOM should only ever exist at file start. Remove the parameter, as it has already caused confusion due to its weirdness.
8293c8f9	2015-06-08T13:51:28	git_buf_text_lf_to_crlf: allow mixed line endings Allow files to have mixed line endings instead of skipping processing on them.
f1453c59	2015-02-12T12:19:37	Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.
4aa664ae	2015-02-10T23:55:07	git_buf_grow_by: increase buf asize incrementally Introduce `git_buf_grow_by` to incrementally increase the size of a `git_buf`, performing an overflow calculation on the growth.
392702ee	2015-02-09T23:41:13	allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
0161e096	2014-11-13T19:30:47	Make binary detection work similar to vanilla git Main change: Don't treat chars > 128 as non-printable (common in UTF-8 files) Signed-off-by: Sven Strickroth <email@cs-ware.de>
24cce239	2014-11-21T18:09:57	text: Null-terminate a string if we've been gouging it
b3af2d80	2014-07-16T13:34:25	Just put it all in buffer.
df4cba0f	2014-07-15T17:27:58	Export git_buf_text_is_binary and git_buf_text_contains_nul. So that users don’t need to implement binary detection themselves.
5a76ad35	2014-06-19T11:45:46	crlf: pass-through mixed EOL buffers from LF->CRLF When checking out files, we're performing conversion into the user's native line endings, but we only want to do it for files which have consistent line endings. Refuse to perform the conversion for mixed-EOL files. The CRLF->LF filter is left as-is, as that conversion is considered to be normalization by git and should force a conversion of the line endings.
e7d0ced2	2013-09-11T12:38:06	Fix longstanding valgrind warning There was a possible circumstance that could result in reading past the end of a buffer. This check fixes that.
0cf77103	2013-08-26T23:17:07	Start of filter API + git_blob_filtered_content This begins the process of exposing git_filter objects to the public API. This includes: * new public type and API for `git_buffer` through which an allocated buffer can be passed to the user * new API `git_blob_filtered_content` * make the git_filter type and GIT_FILTER_TO_... constants public
c0b01b75	2013-08-19T18:46:26	Skip UTF-8 BOM in binary detection When a git_buf contains a UTF-8 BOM, the three bytes comprising that BOM are treated as unprintable characters. For a small git_buf, the three BOM characters overwhelm the printable characters. This is problematic when trying to check out a small file as the CR/LF filtering will not apply.
b74d4478	2013-07-15T07:41:39	Fix the initial line
6550565a	2013-07-13T03:02:00	Fix gather_stats
3658e81e	2013-03-25T14:20:07	Move crlf conversion into buf_text This adds crlf/lf conversion functions into buf_text with more efficient implementations that bypass the high level buffer functions. They attempt to minimize the number of reallocations done and they directly write the buffer data as needed if they know that there is enough memory allocated to memcpy data. Tests are added for these new functions. The crlf.c code is updated to use the new functions. Removed the include of buf_text.h from filter.h and just include it more narrowly in the places that need it.
5e5848eb	2013-02-14T17:25:10	Change similarity metric to sampled hashes This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.
f3327cac	2013-01-13T10:06:09	Some similarity metric adjustments This makes the text similarity metric treat \r as equivalent to \n and makes it skip whitespace immediately following a line terminator, so line indentation will have less effect on the difference measurement (and so \r\n will be treated as just a single line terminator). This also separates the text and binary hash calculators into two separate functions instead of have more if statements inside the loop. This should make it easier to have more differentiated heuristics in the future if we so wish.
9c454b00	2013-01-11T22:13:02	Initial implementation of similarity scoring algo This adds a new `git_buf_text_hashsig` type and functions to generate these hash signatures and compare them to give a similarity score. This can be plugged into diff similarity scoring.
355dddbf	2013-01-12T01:40:35	buf: Is this the function you were looking for?
0d65acad	2013-01-11T11:24:26	Match binary file check of core git in diff Core git just looks for NUL bytes in files when deciding about is-binary inside diff (although it uses a better algorithm in checkout, when deciding if CRLF conversion should be done). Libgit2 was using the better algorithm in both places, but that is causing some confusion. For now, this makes diff just look for NUL bytes to decide if a file is binary by content in diff.
359fc2d2	2013-01-08T17:07:25	update copyrights
9ff07c24	2012-11-30T15:17:05	buf test: make sure we always set the bom variable
7bf87ab6	2012-11-28T09:58:48	Consolidate text buffer functions There are many scattered functions that look into the contents of buffers to do various text manipulations (such as escaping or unescaping data, calculating text stats, guessing if content is binary, etc). This groups all those functions together into a new file and converts the code to use that. This has two enhancements to existing functionality. The old text stats function is significantly rewritten and the BOM detection code was extended (although largely we can't deal with anything other than a UTF8 BOM).

ba4faf6e

2018-02-08T17:15:33

buf_text: remove `offset` parameter of BOM detection function The function to detect a BOM takes an offset where it shall look for a BOM. No caller uses that, and searching for the BOM in the middle of a buffer seems to be very unlikely, as a BOM should only ever exist at file start. Remove the parameter, as it has already caused confusion due to its weirdness.

8293c8f9

2015-06-08T13:51:28

git_buf_text_lf_to_crlf: allow mixed line endings Allow files to have mixed line endings instead of skipping processing on them.

f1453c59

2015-02-12T12:19:37

Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler instrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros also get updated to reflect this as well.

4aa664ae

2015-02-10T23:55:07

git_buf_grow_by: increase buf asize incrementally Introduce `git_buf_grow_by` to incrementally increase the size of a `git_buf`, performing an overflow calculation on the growth.

392702ee

2015-02-09T23:41:13

allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.

0161e096

2014-11-13T19:30:47

Make binary detection work similar to vanilla git Main change: Don't treat chars > 128 as non-printable (common in UTF-8 files) Signed-off-by: Sven Strickroth <email@cs-ware.de>

24cce239

2014-11-21T18:09:57

text: Null-terminate a string if we've been gouging it

b3af2d80

2014-07-16T13:34:25

Just put it all in buffer.

df4cba0f

2014-07-15T17:27:58

Export git_buf_text_is_binary and git_buf_text_contains_nul. So that users don’t need to implement binary detection themselves.

5a76ad35

2014-06-19T11:45:46

crlf: pass-through mixed EOL buffers from LF->CRLF When checking out files, we're performing conversion into the user's native line endings, but we only want to do it for files which have consistent line endings. Refuse to perform the conversion for mixed-EOL files. The CRLF->LF filter is left as-is, as that conversion is considered to be normalization by git and should force a conversion of the line endings.

e7d0ced2

2013-09-11T12:38:06

Fix longstanding valgrind warning There was a possible circumstance that could result in reading past the end of a buffer. This check fixes that.

0cf77103

2013-08-26T23:17:07

Start of filter API + git_blob_filtered_content This begins the process of exposing git_filter objects to the public API. This includes: * new public type and API for `git_buffer` through which an allocated buffer can be passed to the user * new API `git_blob_filtered_content` * make the git_filter type and GIT_FILTER_TO_... constants public

c0b01b75

2013-08-19T18:46:26

Skip UTF-8 BOM in binary detection When a git_buf contains a UTF-8 BOM, the three bytes comprising that BOM are treated as unprintable characters. For a small git_buf, the three BOM characters overwhelm the printable characters. This is problematic when trying to check out a small file as the CR/LF filtering will not apply.

b74d4478

2013-07-15T07:41:39

Fix the initial line

6550565a

2013-07-13T03:02:00

Fix gather_stats

3658e81e

2013-03-25T14:20:07

Move crlf conversion into buf_text This adds crlf/lf conversion functions into buf_text with more efficient implementations that bypass the high level buffer functions. They attempt to minimize the number of reallocations done and they directly write the buffer data as needed if they know that there is enough memory allocated to memcpy data. Tests are added for these new functions. The crlf.c code is updated to use the new functions. Removed the include of buf_text.h from filter.h and just include it more narrowly in the places that need it.

5e5848eb

2013-02-14T17:25:10

Change similarity metric to sampled hashes This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.

f3327cac

2013-01-13T10:06:09

Some similarity metric adjustments This makes the text similarity metric treat \r as equivalent to \n and makes it skip whitespace immediately following a line terminator, so line indentation will have less effect on the difference measurement (and so \r\n will be treated as just a single line terminator). This also separates the text and binary hash calculators into two separate functions instead of have more if statements inside the loop. This should make it easier to have more differentiated heuristics in the future if we so wish.

9c454b00

2013-01-11T22:13:02

Initial implementation of similarity scoring algo This adds a new `git_buf_text_hashsig` type and functions to generate these hash signatures and compare them to give a similarity score. This can be plugged into diff similarity scoring.

355dddbf

2013-01-12T01:40:35

buf: Is this the function you were looking for?

0d65acad

2013-01-11T11:24:26

Match binary file check of core git in diff Core git just looks for NUL bytes in files when deciding about is-binary inside diff (although it uses a better algorithm in checkout, when deciding if CRLF conversion should be done). Libgit2 was using the better algorithm in both places, but that is causing some confusion. For now, this makes diff just look for NUL bytes to decide if a file is binary by content in diff.

359fc2d2

2013-01-08T17:07:25

update copyrights

9ff07c24

2012-11-30T15:17:05

buf test: make sure we always set the bom variable

7bf87ab6

2012-11-28T09:58:48

Consolidate text buffer functions There are many scattered functions that look into the contents of buffers to do various text manipulations (such as escaping or unescaping data, calculating text stats, guessing if content is binary, etc). This groups all those functions together into a new file and converts the code to use that. This has two enhancements to existing functionality. The old text stats function is significantly rewritten and the BOM detection code was extended (although largely we can't deal with anything other than a UTF8 BOM).

thodg/libgit2/src/buf_text.c

src/buf_text.c

Log