|
ba4faf6e
|
2018-02-08T17:15:33
|
|
buf_text: remove `offset` parameter of BOM detection function
The function to detect a BOM takes an offset where it shall look for a
BOM. No caller uses that, and searching for the BOM in the middle of a
buffer seems to be very unlikely, as a BOM should only ever exist at
file start.
Remove the parameter, as it has already caused confusion due to its
weirdness.
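
For illustration only, a minimal sketch of a start-of-buffer UTF-8 BOM check
with no offset argument. The name `utf8_bom_length` is hypothetical and is not
the function this commit changes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not libgit2's API): return the length in bytes of a
 * UTF-8 BOM found at the very start of the buffer, or 0 if there is none.
 * There is deliberately no offset parameter; only the start is examined.
 */
static size_t utf8_bom_length(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    /* The UTF-8 encoding of U+FEFF is the byte sequence EF BB BF. */
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return 3;

    return 0;
}
```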
|
|
8293c8f9
|
2015-06-08T13:51:28
|
|
git_buf_text_lf_to_crlf: allow mixed line endings
Allow files to have mixed line endings instead of skipping processing
on them.
|
|
f1453c59
|
2015-02-12T12:19:37
|
|
Make our overflow check look more like gcc/clang's
Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler intrinsics on platforms
that support it. This means dropping the ability to pass `NULL` as
an out parameter.
As a result, the macros are updated to match.
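
A rough sketch of the shape described above, assuming a checked add on
`size_t`. The name `add_sizet_overflow` and the feature-test macro are
hypothetical; only `__builtin_add_overflow` is the real gcc/clang intrinsic.
The out parameter must not be NULL, matching the intrinsic's contract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Detect the gcc/clang intrinsic where the compiler lets us ask for it. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_add_overflow)
#  define SKETCH_HAVE_ADD_OVERFLOW 1
# endif
#elif defined(__GNUC__) && __GNUC__ >= 5
# define SKETCH_HAVE_ADD_OVERFLOW 1
#endif

/*
 * Hypothetical helper: store a + b in *out and return false, or return
 * true if the addition would overflow. `out` is required.
 */
static bool add_sizet_overflow(size_t *out, size_t a, size_t b)
{
    assert(out != NULL);
#ifdef SKETCH_HAVE_ADD_OVERFLOW
    return __builtin_add_overflow(a, b, out);
#else
    /* Portable fallback with the same calling convention. */
    if (SIZE_MAX - a < b)
        return true;
    *out = a + b;
    return false;
#endif
}
```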
|
|
4aa664ae
|
2015-02-10T23:55:07
|
|
git_buf_grow_by: increase buf asize incrementally
Introduce `git_buf_grow_by` to incrementally increase the size of a
`git_buf`, performing an overflow calculation on the growth.
|
|
392702ee
|
2015-02-09T23:41:13
|
|
allocations: test for overflow of requested size
Introduce some helper macros to test integer overflow from arithmetic
and set error message appropriately.
|
|
0161e096
|
2014-11-13T19:30:47
|
|
Make binary detection work similar to vanilla git
Main change: Don't treat chars > 128 as non-printable (common in UTF-8 files)
Signed-off-by: Sven Strickroth <email@cs-ware.de>
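
An illustrative sketch of this kind of heuristic, not the code from this
commit: bytes with the high bit set count as printable, and a buffer is
considered binary if it contains a NUL or if control characters are too
frequent. The name `looks_binary`, the exact set of tolerated control
characters and the 1:128 ratio are assumptions modelled on git's convert
logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: guess whether a buffer holds binary content. */
static bool looks_binary(const unsigned char *data, size_t len)
{
    size_t printable = 0, nonprintable = 0, i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = data[i];

        if (c == '\0')
            return true;            /* a NUL byte is decisive */

        if (c >= 0x80)
            printable++;            /* high bytes are common in UTF-8 text */
        else if (c == 0x7F)
            nonprintable++;         /* DEL */
        else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' &&
                 c != '\b' && c != 0x1B && c != '\f')
            nonprintable++;         /* other control characters */
        else
            printable++;
    }

    /* Assumed threshold: more than one control byte per 128 printable. */
    return (printable >> 7) < nonprintable;
}
```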
|
|
24cce239
|
2014-11-21T18:09:57
|
|
text: Null-terminate a string if we've been gouging it
|
|
b3af2d80
|
2014-07-16T13:34:25
|
|
Just put it all in buffer.
|
|
df4cba0f
|
2014-07-15T17:27:58
|
|
Export git_buf_text_is_binary and git_buf_text_contains_nul.
So that users don’t need to implement binary detection themselves.
|
|
5a76ad35
|
2014-06-19T11:45:46
|
|
crlf: pass-through mixed EOL buffers from LF->CRLF
When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.
The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.
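
A minimal sketch of how a buffer's line endings might be classified before
deciding whether to convert. The `eol_kind` enum and `classify_eol` are
hypothetical names; under the policy above, an LF->CRLF filter would pass
`EOL_MIXED` buffers through unchanged. Lone CR bytes are ignored here for
brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EOL_NONE, EOL_LF, EOL_CRLF, EOL_MIXED } eol_kind;

/* Hypothetical helper: report which line-ending style a buffer uses. */
static eol_kind classify_eol(const char *data, size_t len)
{
    size_t lf = 0, crlf = 0, i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if (data[i] != '\n')
            continue;
        if (i > 0 && data[i - 1] == '\r')
            crlf++;     /* LF preceded by CR */
        else
            lf++;       /* bare LF */
    }

    if (lf && crlf)
        return EOL_MIXED;
    if (crlf)
        return EOL_CRLF;
    if (lf)
        return EOL_LF;
    return EOL_NONE;
}
```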
|
|
e7d0ced2
|
2013-09-11T12:38:06
|
|
Fix longstanding valgrind warning
There was a possible circumstance that could result in reading
past the end of a buffer. This check fixes that.
|
|
0cf77103
|
2013-08-26T23:17:07
|
|
Start of filter API + git_blob_filtered_content
This begins the process of exposing git_filter objects to the
public API. This includes:
* new public type and API for `git_buffer` through which an
allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public
|
|
c0b01b75
|
2013-08-19T18:46:26
|
|
Skip UTF-8 BOM in binary detection
When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters. For a small git_buf,
the three BOM characters overwhelm the printable characters. This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.
|
|
b74d4478
|
2013-07-15T07:41:39
|
|
Fix the initial line
|
|
6550565a
|
2013-07-13T03:02:00
|
|
Fix gather_stats
|
|
3658e81e
|
2013-03-25T14:20:07
|
|
Move crlf conversion into buf_text
This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions. They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.
Tests are added for these new functions. The crlf.c code is
updated to use the new functions.
Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
|
|
5e5848eb
|
2013-02-14T17:25:10
|
|
Change similarity metric to sampled hashes
This moves the similarity metric code out of buf_text and into a
new file. Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes. In theory, that should be
sufficient samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
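
A simplified sketch of the sampling idea, assuming a polynomial rolling hash
over a fixed byte window. All names and constants here (`hash_samples`,
`HASH_WINDOW`, `HASH_BASE`) are hypothetical, and only the lowest hashes are
kept to stay short; the approach described above also keeps the highest ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_WINDOW 8     /* bytes per window (assumed) */
#define HASH_KEEP   100   /* samples retained */
#define HASH_BASE   31u   /* polynomial base (assumed) */

typedef struct {
    uint32_t lowest[HASH_KEEP];   /* sorted ascending */
    size_t count;
} hash_samples;

/* Insert h, retaining only the HASH_KEEP smallest values seen so far. */
static void keep_lowest(hash_samples *s, uint32_t h)
{
    size_t i;

    if (s->count == HASH_KEEP) {
        if (h >= s->lowest[HASH_KEEP - 1])
            return;
        s->count--;               /* evict the current largest sample */
    }
    for (i = s->count; i > 0 && s->lowest[i - 1] > h; i--)
        s->lowest[i] = s->lowest[i - 1];
    s->lowest[i] = h;
    s->count++;
}

/* Hash every HASH_WINDOW-byte window of data, rolling one byte at a time. */
static void sample_hashes(hash_samples *s, const unsigned char *data, size_t len)
{
    uint32_t h = 0, top = 1;
    size_t i;

    assert(s != NULL && (data != NULL || len == 0));
    s->count = 0;
    if (len < HASH_WINDOW)
        return;

    for (i = 0; i < HASH_WINDOW - 1; i++)
        top *= HASH_BASE;                    /* HASH_BASE^(HASH_WINDOW-1) */
    for (i = 0; i < HASH_WINDOW; i++)
        h = h * HASH_BASE + data[i];         /* hash of the first window */
    keep_lowest(s, h);

    for (i = HASH_WINDOW; i < len; i++) {
        /* Remove the outgoing byte, shift, then add the incoming byte. */
        h = (h - data[i - HASH_WINDOW] * top) * HASH_BASE + data[i];
        keep_lowest(s, h);
    }
}
```

The storage is bounded by `HASH_KEEP` regardless of input length, which is
the property the commit message is after.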
|
|
f3327cac
|
2013-01-13T10:06:09
|
|
Some similarity metric adjustments
This makes the text similarity metric treat \r as equivalent
to \n and makes it skip whitespace immediately following a line
terminator, so line indentation will have less effect on the
difference measurement (and so \r\n will be treated as just a
single line terminator).
This also separates the text and binary hash calculators into
two separate functions instead of having more if statements inside
the loop. This should make it easier to have more differentiated
heuristics in the future if we so wish.
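
A small sketch of the text normalization described above, written as a
standalone pass for clarity. The name `normalize_for_hash` is hypothetical,
and the real code may fold this into the hashing loop rather than copying.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy `in` to `out`, mapping \r to \n and dropping
 * any whitespace that immediately follows a line terminator. `out` must
 * have room for at least `len` bytes. Returns the number of bytes written.
 */
static size_t normalize_for_hash(char *out, const char *in, size_t len)
{
    size_t i, n = 0;
    int after_eol = 0;

    assert(out != NULL && (in != NULL || len == 0));

    for (i = 0; i < len; i++) {
        char c = in[i];

        /* Skips indentation, and the \n of a \r\n pair. */
        if (after_eol && isspace((unsigned char)c))
            continue;

        if (c == '\r')
            c = '\n';
        after_eol = (c == '\n');
        out[n++] = c;
    }

    return n;
}
```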
|
|
9c454b00
|
2013-01-11T22:13:02
|
|
Initial implementation of similarity scoring algo
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
|
|
355dddbf
|
2013-01-12T01:40:35
|
|
buf: Is this the function you were looking for?
|
|
0d65acad
|
2013-01-11T11:24:26
|
|
Match binary file check of core git in diff
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
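
For illustration, the NUL-only check amounts to something like the sketch
below. `contains_nul` is a hypothetical name, not necessarily the function
used by diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if any byte in the buffer is NUL. */
static bool contains_nul(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    return len > 0 && memchr(data, '\0', len) != NULL;
}
```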
|
|
359fc2d2
|
2013-01-08T17:07:25
|
|
update copyrights
|
|
9ff07c24
|
2012-11-30T15:17:05
|
|
buf test: make sure we always set the bom variable
|
|
7bf87ab6
|
2012-11-28T09:58:48
|
|
Consolidate text buffer functions
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
|