kmx git

Commit	Date	Message
036be150	2021-08-30T08:47:04	hashsig: close fd on error
52505ab5	2021-07-07T19:12:02	Convert long long constant specifiers to stdint macros
01c64945	2020-04-05T16:43:55	hashsig: use GIT_ASSERT
e54343a4	2019-06-29T09:17:32	fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
f673e232	2018-12-27T13:47:34	git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
0c7f49dd	2017-06-30T13:39:01	Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.
909d5494	2016-12-29T12:25:15	giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
f78d9b6c	2015-03-03T23:56:54	diff_tform: account for whitespace options When comparing seemingly blank files, take whitespace options into account.
a212716f	2015-03-03T18:19:42	diff_tform: don't compare empty hashsig_heaps Don't try to compare two empty hashsig_heaps.
36fc5497	2014-12-02T05:11:12	Added GIT_HASHSIG_ALLOW_SMALL_FILES to allow computing signatures for small files The implementation of the hashsig API disallows computing a signature on small files containing only a few lines. This new flag disables this behavior. git_diff_find_similar() sets this flag by default which means that rename / copy detection of small files will now work. This in turn affects the behavior of the git_status and git_blame APIs which will now detect rename of small files assuming the right options are passed.
737b5051	2014-10-01T12:03:24	hashsig: Export as a `sys` header
d730d3f4	2013-07-31T16:40:42	Major rename detection changes After doing further profiling, I found that a lot of time was being spent attempting to insert hashes into the file hash signature when using the rolling hash because the rolling hash approach generates a hash per byte of the file instead of one per run/line of data. To optimize this, I decided to convert back to a run-based file signature algorithm which would be more like core Git. After changing this, a number of the existing tests started to fail. In some cases, this appears to have been because the test was coded to be too specific to the particular results of the file similarity metric and in some cases there appear to have been bugs in the core rename detection code where only by the coincidence of the file similarity scoring were the expected results being generated. This renames all the variables in the core rename detection code to be more consistent and hopefully easier to follow which made it a bit easier to reason about the behavior of that code and fix the problems that I was seeing. I think it's in better shape now. There are a couple of tests now that attempt to stress test the rename detection code and they are quite slow. Most of the time is spent setting up the test data on disk and in the index. When we roll out performance improvements for index insertion, it should also speed up these tests I hope.
427cc255	2013-07-24T13:11:11	Use local variables in hash calc to avoid aliasing
1fed6b07	2013-05-13T21:57:37	Fix trailing whitespaces
e40f1c2d	2013-03-08T16:39:57	Make tree iterator handle icase equivalence There is a serious bug in the previous tree iterator implementation. If case insensitivity resulted in member elements being equivalent to one another, and those member elements were trees, then the children of the colliding elements would be processed in sequence instead of in a single flattened list. This meant that the tree iterator was not truly acting like a case-insensitive list. This completely reworks the tree iterator to manage lists with case insensitive equivalence classes and advance through the items in a unified manner in a single sorted frame. It is possible that at a future date we might want to update this to separate the case insensitive and case sensitive tree iterators so that the case sensitive one could be a minimal amount of code and the insensitive one would always know what it needed to do without checking flags. But there would be so much shared code between the two, that I'm not sure it that's a win. For now, this gets what we need. More tests are needed, though.
97b71374	2013-02-28T14:14:45	Add GIT_STDLIB_CALL This removes the one-off GIT_CDECL and adds a new standard way of doing this named GIT_STDLIB_CALL with a src/win32 specific def when on the Windows platform.
f708c89f	2013-02-27T15:15:39	fixing some warnings on Windows
11b5beb7	2013-02-27T15:07:28	use cdecl for hashsig sorting functions on Windows
9bc8be3d	2013-02-19T10:25:41	Refine pluggable similarity API This plugs in the three basic similarity strategies for handling whitespace via internal use of the pluggable API. In so doing, I realized that the use of git_buf in the hashsig API was not needed and actually just made it harder to use, so I tweaked that API as well. Note that the similarity metric is still not hooked up in the find_similarity code - this is just setting out the function that will be used.
5e5848eb	2013-02-14T17:25:10	Change similarity metric to sampled hashes This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.

036be150

2021-08-30T08:47:04

hashsig: close fd on error

52505ab5

2021-07-07T19:12:02

Convert long long constant specifiers to stdint macros

01c64945

2020-04-05T16:43:55

hashsig: use GIT_ASSERT

e54343a4

2019-06-29T09:17:32

fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.

f673e232

2018-12-27T13:47:34

git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.

0c7f49dd

2017-06-30T13:39:01

Make sure to always include "common.h" first Next to including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as its first other file, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of the "src/" from now on.

909d5494

2016-12-29T12:25:15

giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one

f78d9b6c

2015-03-03T23:56:54

diff_tform: account for whitespace options When comparing seemingly blank files, take whitespace options into account.

a212716f

2015-03-03T18:19:42

diff_tform: don't compare empty hashsig_heaps Don't try to compare two empty hashsig_heaps.

36fc5497

2014-12-02T05:11:12

Added GIT_HASHSIG_ALLOW_SMALL_FILES to allow computing signatures for small files The implementation of the hashsig API disallows computing a signature on small files containing only a few lines. This new flag disables this behavior. git_diff_find_similar() sets this flag by default which means that rename / copy detection of small files will now work. This in turn affects the behavior of the git_status and git_blame APIs which will now detect rename of small files assuming the right options are passed.

737b5051

2014-10-01T12:03:24

hashsig: Export as a `sys` header

d730d3f4

2013-07-31T16:40:42

Major rename detection changes After doing further profiling, I found that a lot of time was being spent attempting to insert hashes into the file hash signature when using the rolling hash because the rolling hash approach generates a hash per byte of the file instead of one per run/line of data. To optimize this, I decided to convert back to a run-based file signature algorithm which would be more like core Git. After changing this, a number of the existing tests started to fail. In some cases, this appears to have been because the test was coded to be too specific to the particular results of the file similarity metric and in some cases there appear to have been bugs in the core rename detection code where only by the coincidence of the file similarity scoring were the expected results being generated. This renames all the variables in the core rename detection code to be more consistent and hopefully easier to follow which made it a bit easier to reason about the behavior of that code and fix the problems that I was seeing. I think it's in better shape now. There are a couple of tests now that attempt to stress test the rename detection code and they are quite slow. Most of the time is spent setting up the test data on disk and in the index. When we roll out performance improvements for index insertion, it should also speed up these tests I hope.

427cc255

2013-07-24T13:11:11

Use local variables in hash calc to avoid aliasing

1fed6b07

2013-05-13T21:57:37

Fix trailing whitespaces

e40f1c2d

2013-03-08T16:39:57

Make tree iterator handle icase equivalence There is a serious bug in the previous tree iterator implementation. If case insensitivity resulted in member elements being equivalent to one another, and those member elements were trees, then the children of the colliding elements would be processed in sequence instead of in a single flattened list. This meant that the tree iterator was not truly acting like a case-insensitive list. This completely reworks the tree iterator to manage lists with case insensitive equivalence classes and advance through the items in a unified manner in a single sorted frame. It is possible that at a future date we might want to update this to separate the case insensitive and case sensitive tree iterators so that the case sensitive one could be a minimal amount of code and the insensitive one would always know what it needed to do without checking flags. But there would be so much shared code between the two, that I'm not sure it that's a win. For now, this gets what we need. More tests are needed, though.

97b71374

2013-02-28T14:14:45

Add GIT_STDLIB_CALL This removes the one-off GIT_CDECL and adds a new standard way of doing this named GIT_STDLIB_CALL with a src/win32 specific def when on the Windows platform.

f708c89f

2013-02-27T15:15:39

fixing some warnings on Windows

11b5beb7

2013-02-27T15:07:28

use cdecl for hashsig sorting functions on Windows

9bc8be3d

2013-02-19T10:25:41

Refine pluggable similarity API This plugs in the three basic similarity strategies for handling whitespace via internal use of the pluggable API. In so doing, I realized that the use of git_buf in the hashsig API was not needed and actually just made it harder to use, so I tweaked that API as well. Note that the similarity metric is still not hooked up in the find_similarity code - this is just setting out the function that will be used.

5e5848eb

2013-02-14T17:25:10

Change similarity metric to sampled hashes This moves the similarity metric code out of buf_text and into a new file. Also, this implements a different approach to similarity measurement based on a Rabin-Karp rolling hash where we only keep the top 100 and bottom 100 hashes. In theory, that should be sufficient samples to given a fairly accurate measurement while limiting the amount of data we keep for file signatures no matter how large the file is.

thodg/libgit2/src/hashsig.c

src/hashsig.c

Log