src/filter.c


Log

Author Commit Date Message
Edward Thomson 31d9c24b 2021-05-06T16:32:14 filter: internal git_buf filter handling function Introduce `git_filter_list__convert_buf` which behaves like the old implementation of `git_filter_list__apply_data`, where it might move the input data buffer over into the output data buffer space for efficiency. This new implementation will do so in a more predictable way, always freeing the given input buffer (either moving it to the output buffer or filtering it into the output buffer first). Convert internal users to it.
Edward Thomson 68b9605a 2021-05-06T15:37:31 filter: deprecate git_filter_list_apply_to_data Deprecate `git_filter_list_apply_to_data` as it takes user input as a `git_buf`. Users should use `git_filter_list_apply_to_buffer` instead.
Edward Thomson 5309b465 2021-05-06T15:24:30 filter: introduce git_filter_list_apply_to_buffer Provide a filter application mechanism that takes a user-provided string and length, instead of a `git_buf`.
Edward Thomson 26846f4c 2021-05-06T15:19:58 filter: remove git_buf sharing in `git_filter_list_apply_to_data` The API `git_filter_list_apply_to_data` shares data between its out and in parameters to avoid unnecessarily copying it when there are no filters to apply. However, it does so in a manner that is potentially confusing, leaving both `git_buf`s populated with data. This is risky for end-users who have to know how to deal with this. Instead, we remove this optimization - users who want to avoid unnecessary copies can use the longstanding streaming API or check the filter status before invoking the filters.
Edward Thomson 9869f1e5 2021-05-06T02:19:49 filter: deprecate git_filter_list_stream_data `git_filter_list_stream_data` takes user input in a `git_buf`. `git_buf` should only be used when libgit2 itself needs to allocate data and return it to a user, who can free it when they wish. Replace it with `git_filter_list_stream_buffer` that takes a data buffer and length.
Edward Thomson 4470e48a 2021-04-04T14:24:35 workdir: validate working directory entry path length
Edward Thomson bc54898f 2020-04-05T16:27:30 filter: use GIT_ASSERT
Edward Thomson cb4bfbc9 2020-04-05T11:07:54 buffer: git_buf_sanitize should return a value `git_buf_sanitize` is called with user-input, and wants to sanity-check that input. Allow it to return a value if the input was malformed in a way that we cannot cope with.
Edward Thomson e316b0d3 2020-05-15T11:47:09 runtime: move init/shutdown into the "runtime" Provide a mechanism for system components to register for initialization and shutdown of the libgit2 runtime.
Patrick Steinhardt a6c9e0b3 2020-06-08T12:40:47 tree-wide: mark local functions as static We've accumulated quite a few functions which are never used outside of their respective code unit, but which are lacking the `static` keyword. Add it to reduce their linkage scope and allow the compiler to optimize better.
Edward Thomson 4334b177 2019-06-23T15:43:38 blob: use `git_object_size_t` for object size Instead of using a signed type (`off_t`) use a new `git_object_size_t` for the sizes of objects.
Edward Thomson f0f27c1c 2019-07-21T14:13:25 filter: optionally read attributes from repository When `GIT_FILTER_ATTRIBUTES_FROM_HEAD` is specified, configure the filter to read filter attributes from `gitattributes` files that are checked in to the repository at the HEAD revision. This passes the flag `GIT_ATTR_CHECK_INCLUDE_HEAD` to the attribute reading functions.
Edward Thomson 22eb12af 2019-07-21T12:12:05 filter: add GIT_FILTER_NO_SYSTEM_ATTRIBUTES option Allow system-wide attributes (the ones specified in `/etc/gitattributes`) to be ignored if the flag `GIT_FILTER_NO_SYSTEM_ATTRIBUTES` is specified.
Patrick Steinhardt e54343a4 2019-06-29T09:17:32 fileops: rename to "futils.h" to match function signatures Our file utils functions all have a "futils" prefix, e.g. `git_futils_touch`. One would thus naturally guess that their definitions and implementation would live in files "futils.h" and "futils.c", respectively, but in fact they live in "fileops.h". Rename the files to match expectations.
Edward Thomson 91a300b7 2019-06-16T00:46:30 attr: rename constants and macros for consistency Our enumeration values are not generally suffixed with `T`. Further, our enumeration names are generally more descriptive.
Edward Thomson 5d92e547 2019-06-08T17:28:35 oid: `is_zero` instead of `iszero` The only function that is named `issomething` (without underscore) was `git_oid_iszero`. Rename it to `git_oid_is_zero` for consistency with the rest of the library.
Edward Thomson fac08837 2019-01-21T11:38:46 filter: return an int Validate that the return value of the read is not greater than INT_MAX, then cast.
Edward Thomson f673e232 2018-12-27T13:47:34 git_error: use new names in internal APIs and usage Move to the `git_error` name in the internal API for error-related functions.
Anders Borum f4835e44 2018-12-04T21:48:12 make proxy_stream_close close target stream even on errors When the git_filter_apply_fn callback returns an error while smudging, proxy_stream_close ends up returning without closing the stream. This in turn makes blob_content_to_file crash, as it asserts that the stream is closed whether there are errors or not. Closing the target stream on error fixes this problem.
Patrick Steinhardt ecf4f33a 2018-02-08T11:14:48 Convert usage of `git_buf_free` to new `git_buf_dispose`
Patrick Steinhardt 0c7f49dd 2017-06-30T13:39:01 Make sure to always include "common.h" first Besides including several files, our "common.h" header also declares various macros which are then used throughout the project. As such, we have to make sure to always include this file first in all implementation files. Otherwise, we might encounter problems or even silent behavioural differences due to macros or defines not being defined as they should be. So in fact, our header and implementation files should make sure to always include "common.h" first. This commit does so by establishing a common include pattern. Header files inside of "src" will now always include "common.h" as their first include, separated by a newline from all the other includes to make it stand out as special. There are two cases for the implementation files. If they do have a matching header file, they will always include this one first, leading to "common.h" being transitively included as first file. If they do not have a matching header file, they instead include "common.h" as first file themselves. This fixes the outlined problems and will become our standard practice for header and source files inside of "src/" from now on.
Mohseen Mukaddam a78441bc 2017-06-13T11:05:40 Adding git_filter_init for initializing `git_filter` struct + unit test
Patrick Steinhardt cf07db2f 2017-04-07T16:05:10 filter: only close filter if it's been initialized correctly In the function `git_filter_list_stream_data`, we initialize, write and subsequently close the stream which should receive content processed by the filter. While we skip writing to the stream if its initialization failed, we still try to close it unconditionally -- even if the initialization failed, where the stream might not be set at all, leading us to segfault. The semantics of this code are not really clear. The function handling the same logic for files instead of data seems to do the right thing here in only closing the stream when initialization succeeded. When stepping back a bit, this is only reasonable: if a stream cannot be initialized, the caller would not expect it to be closed again. So actually, both callers of `stream_list_init` get this wrong. The data streaming function will always close the stream and the file streaming function will not close the stream if writing to it has failed. The fix is thus two-fold: - callers of `stream_list_init` now close the stream iff it has been initialized - `stream_list_init` now closes the most recently initialized stream if the current stream in the chain failed to initialize Add a test which segfaulted prior to these changes.
Edward Thomson 909d5494 2016-12-29T12:25:15 giterr_set: consistent error messages Error messages should be sentence fragments, and therefore: 1. Should not begin with a capital letter, 2. Should not conclude with punctuation, and 3. Should not end a sentence and begin a new one
Edward Thomson 2ed855a9 2016-02-07T13:16:30 filter: avoid races during filter registration Previously we would set the global filter registry structure before adding filters to the structure, without a lock, which is quite racy. Now, register default filters during global registration and use an rwlock to read and write the filter registry (as appropriate).
Carlos Martín Nieto 4de7f3bf 2015-07-12T13:28:03 filter: make sure to close the stream even on error When the stream list init or write fail, we must also make sure to close the stream, as that's the function contract.
Edward Thomson 63924435 2015-07-01T09:40:11 filters: custom filters with wildcard attributes Allow custom filters with wildcard attributes, so that clients can support some random `filter=foo` in a .gitattributes and look up the corresponding smudge/clean commands in the configuration file.
Edward Thomson 2eecc288 2015-06-10T14:43:49 Introduce `git_filter_list_contains` `git_filter_list_contains` can be used to query a filter list to determine if a given filter will be run.
Carlos Martín Nieto 0137aba5 2015-06-10T11:08:05 filter: close the descriptor in case of error When we hit an error writing to the next stream from a file, we jump to 'done' which currently skips over closing the file descriptor. Make sure to close the descriptor if it has been set to a valid value.
J Wyman 7dd22538 2015-05-11T10:19:25 centralizing all IO buffer size values
Carlos Martín Nieto 9cdd6578 2015-05-09T13:11:46 Merge pull request #3104 from whoisj/optimal-buffer-size Adjusting stream buffer size to 64KB
J Wyman 7eb7e03d 2015-05-07T08:50:12 Adjusting stream buffer size to 64KB 64K is optimal buffer size per https://technet.microsoft.com/en-us/library/cc938632.aspx
Leo Yang 69f0032b 2015-04-28T12:40:20 Fix some build warnings In checkout.c and filter.c we were casting a sub struct to a parent struct which breaks the strict aliasing rules in C. However we can use .parent or .base to access the parent struct to avoid the build warnings. In remote.c the local variable error was not initialized or updated in some cases. An uninitialized error generates a build warning, so always keep the error variable up-to-date.
Edward Thomson 669ae274 2015-03-23T13:12:55 filter: clear the temp_buf if we're using one If we are using a temporary buffer for filtering, be sure to clear it before using it, in case the file that we are filtering is empty.
Edward Thomson 6a2edc5a 2015-03-06T15:16:40 filter: accept relative paths in apply_to_file
Edward Thomson 9a823bad 2015-03-06T14:37:34 filter: drop old TODO
Edward Thomson 9c9aa1ba 2015-02-19T11:32:55 filter: take `temp_buf` in `git_filter_options`
Edward Thomson d05218b0 2015-02-19T11:25:26 filter: add `git_filter_list__load_ext` Refactor `git_filter_list__load_with_attr_reader` into `git_filter_list__load_ext`, which takes a `git_filter_options`.
Edward Thomson 795eaccd 2015-02-19T11:09:54 git_filter_opt_t -> git_filter_flag_t For consistency with the rest of the library, where an opt is an options *structure*.
Edward Thomson d4cf1675 2015-02-19T10:05:33 buffer: introduce git_buf_attach_notowned Provide a convenience function that creates a buffer that can be provided to callers but will not be freed via `git_buf_free`, so the buffer creator maintains the allocation lifecycle of the buffer's contents.
Edward Thomson f7c0125f 2015-02-18T09:28:07 filter streams: base -> parent
Edward Thomson b75f15aa 2015-02-18T09:25:32 git_writestream: from git_filter_stream
Edward Thomson 646364e7 2015-02-17T20:25:31 checkout: maintain temporary buffer for filters Let the filters use the checkout data's temporary buffer, instead of having to allocate new buffers each time.
Edward Thomson e78f5c9f 2015-01-22T16:11:36 checkout: stream the blob into the filters Use the new streaming filter API during checkout.
Edward Thomson 5555696f 2015-01-22T16:11:22 filters: stream internally Migrate the `git_filter_list_apply_*` functions over to using the new filter streams.
Edward Thomson fbdc9db3 2015-01-22T16:10:06 filters: introduce streaming filters Add structures and preliminary functions to take a buffer, file or blob and write the contents in chunks through an arbitrary number of chained filters, finally writing into a user-provided function that accepts the contents.
Edward Thomson f1453c59 2015-02-12T12:19:37 Make our overflow check look more like gcc/clang's Make our overflow checking look more like gcc and clang's, so that we can substitute it out with the compiler intrinsics on platforms that support it. This means dropping the ability to pass `NULL` as an out parameter. As a result, the macros are updated to reflect this as well.
Edward Thomson 392702ee 2015-02-09T23:41:13 allocations: test for overflow of requested size Introduce some helper macros to test integer overflow from arithmetic and set error message appropriately.
Edward Thomson 9f779aac 2015-01-29T14:40:55 attrcache: don't re-read attrs during checkout During checkout, assume that the .gitattributes files aren't modified during the checkout. Instead, create an "attribute session" during checkout. Assume that attribute data read in the same checkout "session" hasn't been modified since the checkout started. (But allow subsequent checkouts to invalidate the cache.) Further, cache nonexistent git_attr_file data even when .gitattributes files are not found to prevent re-scanning for nonexistent files.
Anurag Gupta (OSG) 5623e627 2014-10-09T11:44:05 git_filter: dup the filter name
Russell Belfer af567e88 2014-05-12T10:44:13 Merge pull request #2334 from libgit2/rb/fix-2333 Be more careful with user-supplied buffers
Russell Belfer 45c53eb6 2014-05-08T10:46:04 Use unsigned type for APIs with opt flag mask
Russell Belfer 1e4976cb 2014-05-08T10:17:14 Be more careful with user-supplied buffers This adds in missing calls to `git_buf_sanitize` and fixes a number of places where `git_buf` APIs could inadvertently write NUL terminator bytes into invalid buffers. This also changes the behavior of `git_buf_sanitize` to NUL terminate a buffer if it can and of `git_buf_shorten` to do nothing if it can. Adds tests of filtering code with zeroed (i.e. unsanitized) buffer which was previously triggering a segfault.
Russell Belfer 5269008c 2014-05-06T16:01:49 Add filter options and ALLOW_UNSAFE Diff and status do not want core.safecrlf to actually raise an error regardless of the setting, so this extends the filter API with an additional options flags parameter and adds a flag so that filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating that unsafe filter application should be downgraded from a failure to a warning.
Jiri Pospisil 424222f4 2014-04-25T15:49:26 Filter: Make sure to release local on error
Russell Belfer 9cfce273 2013-12-12T12:11:38 Cleanups, renames, and leak fixes This renames git_vector_free_all to the better git_vector_free_deep and also contains a couple of memory leak fixes based on valgrind checks. The fixes are specifically: failure to free global dir path variables when not compiled with threading on and failure to free filters from the filter registry that had not been initialized fully.
Russell Belfer 71379313 2013-09-23T13:40:23 Fix warnings on Windows 64-bit build
Russell Belfer eefc32d5 2013-09-16T12:54:40 Bug fixes and cleanups This contains a few bug fixes and some header and API cleanups. The main API change is that filters should now use GIT_PASSTHROUGH to indicate that they wish to skip processing a file instead of GIT_ENOTFOUND. The bug fixes include a possible out-of-range buffer access in the ident filter, a filter ordering problem I introduced into the custom filter tests on Windows, and a filter buf NUL termination issue that was coming up on Linux.
Russell Belfer e399c7ee 2013-09-13T09:50:05 Fix win32 warnings I wish MSVC understood that "const char **" is not a const ptr, but is a non-const pointer to an array of const ptrs. Does that seem like too much to ask?
Russell Belfer 37f9e409 2013-09-13T21:43:00 Some tests with ident and crlf filters Fixed the filter order to match core Git, too. This test demonstrates an interesting behavior of core Git (which is totally reasonable and which libgit2 matches, although mostly by coincidence). If you use the ident filter and commit a file with a garbage ident in it, like '$Id: this is just garbage$' and then immediately do a 'git checkout-index' with the new file, Git will not consider the file out of date and will not overwrite the file with an updated $Id$. Libgit2 has the same behavior. If you remove the file and then do a checkout-index, it will be replaced with a filtered version that has injected the OID correctly.
Russell Belfer a9f51e43 2013-09-11T22:00:36 Merge git_buf and git_buffer This makes the git_buf struct that was used internally into an externally available structure and eliminates the git_buffer. As part of that, some of the special cases that arose with the externally used git_buffer were blended into the git_buf, such as being careful about git_buf objects that may have a NULL ptr and allowing for bufs with a valid ptr and size but zero asize as a way of referring to externally owned data.
Russell Belfer 4b11f25a 2013-09-11T16:38:33 Add ident filter This adds the ident filter (that knows how to replace $Id$) and tweaks the filter APIs and code so that git_filter_source objects actually have the updated OID of the object being filtered when it is a known value.
Russell Belfer 40cb40fa 2013-09-11T14:23:39 Add functions to manipulate filter lists Extend the git2/sys/filter API with functions to look up a filter and add it manually to a filter list. This requires some trickery because the regular attribute lookups and checks are bypassed when this happens, but in the right hands, it will allow a user to have granular control over applying filters.
Russell Belfer 0646634e 2013-09-11T12:45:37 Update filter registry code This updates the git filter registry to be a little cleaner and plugs some memory leaks.
Russell Belfer b47349b8 2013-09-12T14:48:24 Port tests from PR 1683 This ports over some of the tests from https://github.com/libgit2/libgit2/pull/1683 by @yorah and @ethomson
Russell Belfer 2a7d224f 2013-09-10T16:33:32 Extend public filter api with filter lists This moves the git_filter_list into the public API so that users can create, apply, and dispose of filter lists. This allows more granular application of filters to user data outside of libgit2 internals. This also converts all the internal usage of filters to the public APIs along with a few small tweaks to make it easier to use the public git_buffer stuff alongside the internal git_buf.
Russell Belfer 974774c7 2013-09-09T16:57:34 Add attributes to filters and fix registry The filter registry as implemented was too primitive to actually work once multiple filters were coming into play. This expands the implementation of the registry to handle multiple prioritized filters correctly. Additionally, this adds an "attributes" field to a filter that makes it really really easy to implement filters that are based on one or more attribute values. The lookup and even simple value checking can all happen automatically without custom filter code. Lastly, with the registry improvements, this fills out the filter lifecycle callbacks, with initialize and shutdown callbacks that will be called before the filter is first used and after it is last invoked. This allows for system-wide initialization and cleanup by the filter.
Russell Belfer 29e92d38 2013-09-10T16:53:09 Hook up filter initialize callback I knew I forgot something
Russell Belfer 570ba25c 2013-08-30T16:02:07 Make git_filter_source opaque
Russell Belfer 85d54812 2013-08-28T16:44:04 Create public filter object and use it This creates include/sys/filter.h with a basic definition of a git_filter and then converts the internal code to use it. There are related internal objects (git_filter_list) that we will want to publish at some point, but this is a first step.
Arkadiy Shapkin 10c06114 2013-03-17T04:46:46 Several warnings detected by static code analyzer fixed Implicit type conversion argument of function to size_t type Suspicious sequence of types castings: size_t -> int -> size_t Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)' Unsigned type is never < 0
Edward Thomson 359fc2d2 2013-01-08T17:07:25 update copyrights
Russell Belfer 7bf87ab6 2012-11-28T09:58:48 Consolidate text buffer functions There are many scattered functions that look into the contents of buffers to do various text manipulations (such as escaping or unescaping data, calculating text stats, guessing if content is binary, etc). This groups all those functions together into a new file and converts the code to use that. This has two enhancements to existing functionality. The old text stats function is significantly rewritten and the BOM detection code was extended (although largely we can't deal with anything other than a UTF8 BOM).
Jameson Miller c902f5a0 2012-11-01T12:11:24 Update of text stats calculation Do not interpret 0x85 as Next Line (NEL) char when gathering statistics for a text file.
nulltoken 5e4cb4f4 2012-09-17T10:38:57 checkout : reduce memory usage when not filtering
nulltoken 3aa443a9 2012-08-20T16:56:45 checkout: introduce git_checkout_tree()
Ben Straub 7cae2bcd 2012-07-21T20:11:37 filter: fix memory leak
Ben Straub 9587895f 2012-07-16T12:06:23 Migrate code to git_filter_blob_contents. Also removes the unnecessary check for filter length, since git_filters_apply does the right thing when there are none, and it's more efficient than this.
Ben Straub f2d42eea 2012-07-09T20:21:22 Checkout: add structure for CRLF.
Vicent Martí e172cf08 2012-05-18T01:21:06 errors: Rename the generic return codes
Vicent Martí 3fbcac89 2012-05-02T19:56:38 Remove old and unused error codes
Vicent Martí b8802146 2012-05-01T19:16:14 Merge remote-tracking branch 'carlosmn/remaining-errors' into new-error-handling Conflicts: src/refspec.c
nulltoken fa6420f7 2012-04-29T21:46:33 buf: deploy git_buf_len()
Carlos Martín Nieto 3aa351ea 2012-04-26T15:05:07 error handling: move the missing parts over to the new error handling
Russell Belfer 2bc8fa02 2012-04-17T10:14:24 Implement git_pool paged memory allocator This adds a `git_pool` object that can do simple paged memory allocation with free for the entire pool at once. Using this, you can replace many small allocations with large blocks that can then cheaply be doled out in small pieces. This is best used when you plan to free the small blocks all at once - for example, if they represent the parsed state from a file or data stream that are either all kept or all discarded. There are two real patterns of usage for `git_pools`: either for "string" allocation, where the item size is a single byte and you end up just packing the allocations in together, or for "fixed size" allocation where you are allocating a large object (e.g. a `git_oid`) and you generally just allocate single objects that can be tightly packed. Of course, you can use it for other things, but those two cases are the easiest.
Russell Belfer ce49c7a8 2012-03-02T15:09:40 Add filter tests and fix some bugs This adds some initial unit tests for file filtering and fixes some simple bugs in filter application.
Vicent Martí f2c25d18 2012-03-02T20:08:00 config: Implement a proper cvar cache
Vicent Martí 47a899ff 2012-03-01T21:19:51 filter: Beautiful refactoring Comments soothe my soul.
Vicent Martí 788430c8 2012-03-01T05:06:47 filter: Properly cache filter settings
Vicent Martí c5266eba 2012-03-01T01:16:25 filter: Precache the filter config options on load
Vicent Martí 27950fa3 2012-02-29T01:26:03 filter: Add write-to CRLF filter
Vicent Martí 450b40ca 2012-02-28T01:13:32 filter: Load attributes for file
Vicent Martí 44b1ff4c 2012-02-27T04:31:05 filter: Apply filters before writing a file to the ODB Initial implementation. The relevant code is in `blob.c`: the blob write function has been split into smaller functions. - Directly write a file to the ODB in streaming mode - Directly write a symlink to the ODB in direct mode - Apply a filter, and write a file to the ODB in direct mode When trying to write a file, we first call `git_filter__load_for_file`, which populates a filters array with the required filters based on the filename. If no filters are resolved to the filename, we can write to the ODB in streaming mode straight from disk. Otherwise, we load the whole file in memory and use double-buffering to apply the filter chain. We finish by writing the file as a whole to the ODB.
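Commits 392702ee and f1453c59 above describe helper macros that check allocation-size arithmetic for overflow, reshaped to follow the gcc/clang intrinsics so the intrinsics can be swapped in, at the cost of no longer accepting a `NULL` out parameter. The following is a minimal standalone sketch of that calling convention, not libgit2's actual macros: the names `SIZE_ADD_OVERFLOW`, `SIZE_MUL_OVERFLOW`, `add_overflow_fallback`, `multiply_overflow_fallback` and `array_alloc_size` are hypothetical, and the sketch assumes `__builtin_add_overflow` and `__builtin_mul_overflow` are usable whenever the compiler reports them through `__has_builtin`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the intrinsic-style convention:
 * the result is always written through a non-NULL out pointer, and the
 * return value is true when the mathematical result did not fit. */
static inline bool add_overflow_fallback(size_t *out, size_t one, size_t two)
{
	assert(out != NULL); /* NULL out parameters are not supported */
	*out = one + two;    /* unsigned arithmetic wraps, which is well defined */
	return *out < one;   /* a wrapped sum is smaller than either operand */
}

static inline bool multiply_overflow_fallback(size_t *out, size_t one, size_t two)
{
	bool overflow = one != 0 && two > SIZE_MAX / one;

	assert(out != NULL);
	*out = one * two;    /* store the wrapped product, as the intrinsic does */
	return overflow;
}

/* Prefer the compiler intrinsics when the compiler advertises them. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_add_overflow) && __has_builtin(__builtin_mul_overflow)
#  define USE_OVERFLOW_BUILTINS 1
# endif
#endif

#ifdef USE_OVERFLOW_BUILTINS
# define SIZE_ADD_OVERFLOW(out, one, two) __builtin_add_overflow((one), (two), (out))
# define SIZE_MUL_OVERFLOW(out, one, two) __builtin_mul_overflow((one), (two), (out))
#else
# define SIZE_ADD_OVERFLOW(out, one, two) add_overflow_fallback((out), (one), (two))
# define SIZE_MUL_OVERFLOW(out, one, two) multiply_overflow_fallback((out), (one), (two))
#endif

/* Hypothetical caller: bytes needed for `count` items of `size` bytes plus
 * a trailing NUL; returns -1 instead of handing a wrapped size to malloc. */
static int array_alloc_size(size_t *out, size_t count, size_t size)
{
	size_t bytes;

	if (SIZE_MUL_OVERFLOW(&bytes, count, size) ||
	    SIZE_ADD_OVERFLOW(out, bytes, 1))
		return -1;
	return 0;
}
```

For example, `array_alloc_size(&n, 10, 4)` sets `n` to 41 and returns 0, while `array_alloc_size(&n, SIZE_MAX / 2, 3)` returns -1 because the multiplication wraps. Because both the fallbacks and the intrinsics require a real out pointer and return true on overflow, the macros can switch between them without changing any call site.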