    Commit

  • Hash : d77378eb
    Author : Patrick Steinhardt
    Date : 2019-09-13T08:54:26

    regexp: implement new regular expression API
    
    We currently support several regular expression backends: PCRE, PCRE2,
    regcomp(3P) and regcomp_l(3). They are exposed through a simple POSIX
    abstraction layer that either uses the functions supplied by the
    backend directly or wraps them very thinly.
    
    To support PCRE and PCRE2, we use their provided <pcreposix.h> and
    <pcre2posix.h> wrappers. These wrappers are implemented in such a way
    that the accompanying libraries pcre-posix and pcre2-posix provide the
    same symbols as the libc ones, namely regcomp(3P) et al. This works out
    just fine on some systems, most importantly on glibc-based ones, where
    the regular expression functions are implemented as weak aliases and
    thus get overridden by linking in the pcre{,2}-posix library. On other
    systems we depend on the linking order of libc and the PCRE library,
    and as libc always comes first we end up with the functions of the
    libc implementation. As a result, we may use the structures `regex_t`
    and `regmatch_t` declared by <pcre{,2}posix.h> together with functions
    defined by libc, which leads to segfaults.
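
    To make the failure mode concrete, here is a minimal sketch of why
    mixing one library's declarations with another library's functions is
    dangerous. Both struct layouts and the helper name are hypothetical
    and only illustrate a size mismatch; they are not copied from any real
    header.

```c
#include <stddef.h>

/* Hypothetical layout the caller sees through the wrapper header. */
struct wrapper_regex {
	void *re_pcre;
	size_t re_nsub;
	size_t re_erroffset;
};

/* Hypothetical, larger layout that the libc implementation expects. */
struct libc_regex {
	void *buffer;
	unsigned long allocated;
	unsigned long used;
	unsigned long syntax;
	char *fastmap;
	unsigned char *translate;
	size_t re_nsub;
	unsigned int flags;
};

/*
 * Number of bytes a function compiled against `struct libc_regex` could
 * touch beyond an object the caller allocated as `struct wrapper_regex`.
 * Any non-zero result means the callee may read or write out of bounds.
 */
static size_t bytes_past_caller_object(void)
{
	if (sizeof(struct libc_regex) <= sizeof(struct wrapper_regex))
		return 0;
	return sizeof(struct libc_regex) - sizeof(struct wrapper_regex);
}
```

    Even where the sizes happen to agree, the field meanings would still
    differ, so the callee would interpret the caller's data incorrectly.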
    
    The issue is not easily solvable. Some distributions like Debian have
    resolved this by patching PCRE and PCRE2 to carry custom prefixes to all
    the POSIX function wrappers. But this is not supported by upstream and
    thus inherently unportable between distributions. We could instead try
    to modify linking order, but this starts becoming fragile and will not
    work e.g. when libgit2 is loaded via dlopen(3P) or similar ways. In the
    end, this means that we simply cannot use the POSIX wrappers provided by
    the PCRE libraries at all.
    
    Thus, this commit introduces a new regular expression API. The new API
    sits at a slightly higher level than the previous POSIX abstraction
    layer, as it tries to abstract away non-portable flags such as
    REG_EXTENDED, which does not have an equivalent in every supported
    backend. As there are no users of POSIX regular expressions that do
    _not_ request REG_EXTENDED, it is fine to abstract that flag away. Due
    to the API being higher-level than before, it should generally be a
    tad easier to use than the previous one.
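
    As a rough illustration of what such a higher-level layer could look
    like on top of the regcomp(3P) backend, consider the sketch below. All
    `example_` and `EXAMPLE_` names, the match cap and the return value
    conventions are assumptions made for this example; they are not the
    declarations added by this commit.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the API from the commit. */
#define EXAMPLE_MAX_MATCHES 8

enum { EXAMPLE_REGEXP_ICASE = 1 << 0 };

typedef struct {
	regex_t re;
} example_regexp;

typedef struct {
	ptrdiff_t start; /* -1 if the group did not participate */
	ptrdiff_t end;
} example_regmatch;

/* Compile `pattern`; REG_EXTENDED is always set and never exposed. */
int example_regexp_compile(example_regexp *r, const char *pattern, int flags)
{
	int cflags = REG_EXTENDED;

	if (flags & EXAMPLE_REGEXP_ICASE)
		cflags |= REG_ICASE;

	return regcomp(&r->re, pattern, cflags) == 0 ? 0 : -1;
}

void example_regexp_dispose(example_regexp *r)
{
	regfree(&r->re);
}

/* Returns 0 on match, 1 on no match and -1 on error. */
int example_regexp_search(const example_regexp *r, const char *subject,
			  size_t nmatches, example_regmatch *matches)
{
	regmatch_t pm[EXAMPLE_MAX_MATCHES];
	size_t i;
	int rc;

	if (nmatches > EXAMPLE_MAX_MATCHES)
		return -1;

	rc = regexec(&r->re, subject, nmatches, pm, 0);
	if (rc == REG_NOMATCH)
		return 1;
	if (rc != 0)
		return -1;

	/* Translate backend offsets into the backend-neutral type. */
	for (i = 0; i < nmatches; i++) {
		matches[i].start = (ptrdiff_t)pm[i].rm_so;
		matches[i].end = (ptrdiff_t)pm[i].rm_eo;
	}

	return 0;
}
```

    Callers in this sketch only ever see the `example_` types, so a PCRE
    or PCRE2 backend could provide the same three functions without
    exposing its own structures or flags.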
    
    Note: ideally, the new API would've been called `git_regex_foobar` with
    the files "regex.h" and "regex.c". Unfortunately, this is currently
    impossible to implement due to naming clashes between the then-existing
    "regex.h" and <regex.h> provided by the libc. As we add the source
    directory of libgit2 to the header search path, an include of <regex.h>
    would always find our own "regex.h". Thus, we have to take the bitter
    pill of adding one more character to all the functions to disambiguate
    the includes.
    
    To improve guarantees around cross-backend compatibility, this commit
    also brings along an improved regular expression test suite,
    core::regexp.