|
d77378eb
|
2019-09-13T08:54:26
|
|
regexp: implement new regular expression API
We currently support a set of different regular expression backends with
PCRE, PCRE2, regcomp(3P) and regcomp_l(3). The current implementation of
this is done via a simple POSIX wrapper that either directly uses
supplied functions or that is a very small wrapper.
To support PCRE and PCRE2, we use their provided <pcreposix.h> and
<pcre2posix.h> wrappers. These wrappers are implemented in such a way
that the accompanying libraries pcre-posix and pcre2-posix provide the
same symbols as the libc ones, namely regcomp(3P) et al. This works out
on some systems just fine, most importantly on glibc-based ones, where
the regular expression functions are implemented as weak aliases and
thus get overridden by linking in the pcre{,2}-posix library. On other
systems we depend on the linking order of libc and pcre library, and as
libc always comes first we will end up with the functions of the libc
implementation. As a result, we may use the structures `regex_t` and
`regmatch_t` declared by <pcre{,2}posix.h>, but use functions defined by
the libc, leading to segfaults.
The issue is not easily solvable. Somed distributions like Debian have
resolved this by patching PCRE and PCRE2 to carry custom prefixes to all
the POSIX function wrappers. But this is not supported by upstream and
thus inherently unportable between distributions. We could instead try
to modify linking order, but this starts becoming fragile and will not
work e.g. when libgit2 is loaded via dlopen(3P) or similar ways. In the
end, this means that we simply cannot use the POSIX wrappers provided by
the PCRE libraries at all.
Thus, this commit introduces a new regular expression API. The new API
is on a tad higher level than the previous POSIX abstraction layer, as
it tries to abstract away any non-portable flags like e.g. REG_EXTENDED,
which has no equivalents in all of our supported backends. As there are
no users of POSIX regular expressions that do _not_ reguest REG_EXTENDED
this is fine to be abstracted away, though. Due to the API being
higher-level than before, it should generally be a tad easier to use
than the previous one.
Note: ideally, the new API would've been called `git_regex_foobar` with
a file "regex.h" and "regex.c". Unfortunately, this is currently
impossible to implement due to naming clashes between the then-existing
"regex.h" and <regex.h> provided by the libc. As we add the source
directory of libgit2 to the header search path, an include of <regex.h>
would always find our own "regex.h". Thus, we have to take the bitter
pill of adding one more character to all the functions to disambiguate
the includes.
To improve guarantees around cross-backend compatibility, this commit
also brings along an improved regular expression test suite
core::regexp.
|