Hash: 9a26e704
Author:
Date: 2025-01-02T13:54:54
string-desc, xstring-desc, string-desc-quotearg: Rename functions.
* lib/string-desc.h (sd_equals): Renamed from string_desc_equals.
(sd_startswith): Renamed from string_desc_startswith.
(sd_endswith): Renamed from string_desc_endswith.
(sd_cmp): Renamed from string_desc_cmp.
(sd_c_casecmp): Renamed from string_desc_c_casecmp.
(sd_index): Renamed from string_desc_index.
(sd_last_index): Renamed from string_desc_last_index.
(sd_contains): Renamed from string_desc_contains.
(sd_new_empty): Renamed from string_desc_new_empty.
(sd_new_addr): Renamed from string_desc_new_addr.
(sd_from_c): Renamed from string_desc_from_c.
(sd_substring): Renamed from string_desc_substring.
(sd_write): Renamed from string_desc_write.
(sd_fwrite): Renamed from string_desc_fwrite.
(sd_new): Renamed from string_desc_new.
(sd_new_filled): Renamed from string_desc_new_filled.
(sd_copy): Renamed from string_desc_copy.
(sd_concat): Renamed from string_desc_concat.
(sd_c): Renamed from string_desc_c.
(sd_set_char_at): Renamed from string_desc_set_char_at.
(sd_fill): Renamed from string_desc_fill.
(sd_overwrite): Renamed from string_desc_overwrite.
(sd_free): Renamed from string_desc_free.
(sd_length): Renamed from string_desc_length.
(sd_char_at): Renamed from string_desc_char_at.
(sd_data): Renamed from string_desc_data.
(sd_is_empty): Renamed from string_desc_is_empty.
* lib/string-desc.c (sd_equals): Renamed from string_desc_equals.
(sd_startswith): Renamed from string_desc_startswith.
(sd_endswith): Renamed from string_desc_endswith.
(sd_cmp): Renamed from string_desc_cmp.
(sd_c_casecmp): Renamed from string_desc_c_casecmp.
(sd_index): Renamed from string_desc_index.
(sd_last_index): Renamed from string_desc_last_index.
(sd_new_empty): Renamed from string_desc_new_empty.
(sd_new_addr): Renamed from string_desc_new_addr.
(sd_from_c): Renamed from string_desc_from_c.
(sd_substring): Renamed from string_desc_substring.
(sd_write): Renamed from string_desc_write.
(sd_fwrite): Renamed from string_desc_fwrite.
(sd_new): Renamed from string_desc_new.
(sd_new_filled): Renamed from string_desc_new_filled.
(sd_copy): Renamed from string_desc_copy.
(sd_concat): Renamed from string_desc_concat.
(sd_c): Renamed from string_desc_c.
(sd_set_char_at): Renamed from string_desc_set_char_at.
(sd_fill): Renamed from string_desc_fill.
(sd_overwrite): Renamed from string_desc_overwrite.
(sd_free): Renamed from string_desc_free.
* lib/xstring-desc.h (xsd_concat): Renamed from xstring_desc_concat.
(xsd_new): Renamed from xstring_desc_new.
(xsd_new_filled): Renamed from xstring_desc_new_filled.
(xsd_copy): Renamed from xstring_desc_copy.
(xsd_c): Renamed from xstring_desc_c.
* lib/xstring-desc.c (xsd_concat): Renamed from xstring_desc_concat.
* lib/string-desc-quotearg.h (sd_quotearg_buffer): Renamed from string_desc_quotearg_buffer.
(sd_quotearg_alloc): Renamed from string_desc_quotearg_alloc.
(sd_quotearg_n): Renamed from string_desc_quotearg_n.
(sd_quotearg): Renamed from string_desc_quotearg.
(sd_quotearg_n_style): Renamed from string_desc_quotearg_n_style.
(sd_quotearg_style): Renamed from string_desc_quotearg_style.
(sd_quotearg_char): Renamed from string_desc_quotearg_char.
(sd_quotearg_colon): Renamed from string_desc_quotearg_colon.
(sd_quotearg_n_custom): Renamed from string_desc_quotearg_n_custom.
(sd_quotearg_custom): Renamed from sd_quotearg_n_custom.
* lib/string-desc-contains.c (sd_contains): Renamed from string_desc_contains.
* lib/string-buffer.h: Update.
* lib/string-buffer.c (sb_append_desc, sb_contents, sb_dupfree): Update.
* lib/xstring-buffer.c (sb_xdupfree): Update.
* lib/sf-istream.c (sf_istream_init_from_string_desc): Update.
* tests/test-string-desc.c (main): Update.
* tests/test-string-desc.sh: Update.
* tests/test-xstring-desc.c (main): Update.
* tests/test-string-desc-quotearg.c (main): Update.
* tests/test-string-buffer.c (main): Update.
* tests/test-sf-istream.c (main): Update.
* tests/test-sfl-istream.c (main): Update.
* doc/string-desc.texi: Update.
* doc/strings.texi: Update.
* NEWS: Mention the change.
@node Handling strings with NUL characters
@section Handling strings with NUL characters
@c Copyright (C) 2023--2025 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3 or
@c any later version published by the Free Software Foundation; with no
@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
@c copy of the license is at <https://www.gnu.org/licenses/fdl-1.3.en.html>.
@c Written by Bruno Haible.
Strings in C are usually represented by a character sequence with a
terminating NUL character. A @code{char *}, a pointer to the first byte
of this character sequence, is what gets passed around as a function
argument or return value.

The major restriction of this string representation is that it cannot
handle strings that contain NUL characters: such strings will appear
shorter than they were meant to be. In most application areas, this is
not a problem, and the @code{char *} type is perfectly usable.

A second problem of this string representation is that
taking a substring is not cheap:
it requires either a memory allocation
or a destructive modification of the string.
The former has a runtime cost;
the latter complicates the logic of the program.
This matters for application areas that analyze text, such as parsers.

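The two ways of taking a substring from a NUL-terminated string can be
sketched as follows. The helper names @code{substring_by_copy} and
@code{substring_in_place} are hypothetical and are not part of Gnulib;
they only illustrate the cost of each approach.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Approach 1: allocate a copy of the bytes s[start..end) and
   NUL-terminate it.  The caller must free() the result.
   Returns NULL if the allocation fails.  */
static char *
substring_by_copy (const char *s, size_t start, size_t end)
{
  assert (start <= end && end <= strlen (s));
  size_t n = end - start;
  char *result = malloc (n + 1);
  if (result == NULL)
    return NULL;
  memcpy (result, s + start, n);
  result[n] = '\0';
  return result;
}

/* Approach 2: overwrite s[end] with a NUL byte and return a pointer
   into s.  No allocation is needed, but the original string is cut
   short.  The overwritten byte is stored in *saved so that the caller
   can restore it later.  */
static char *
substring_in_place (char *s, size_t start, size_t end, char *saved)
{
  assert (start <= end && end <= strlen (s));
  *saved = s[end];
  s[end] = '\0';
  return s + start;
}
```

With the first helper the caller has to track ownership of every
substring; with the second, the caller has to remember to write the
saved byte back before using the original string again.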
In areas where strings with embedded NUL characters need to be handled
or where taking substrings is a recurrent operation,
the common approach is to use a @code{char *ptr} pointer variable
together with a @code{size_t nbytes} variable (or an @code{idx_t nbytes}
variable, if you want to avoid problems due to integer overflow). This
works fine in code that constructs or manipulates strings with embedded
NUL characters. But when it comes to @emph{storing} them, for example
in an array or as a key or value of a hash table, one needs a type that
combines these two fields.

@mindex string-desc
@mindex xstring-desc
@mindex string-desc-quotearg
The Gnulib modules @code{string-desc}, @code{xstring-desc}, and
@code{string-desc-quotearg} provide such a type. We call it a
``string descriptor'' and name it @code{string_desc_t}.
The type @code{string_desc_t} is a struct that contains a pointer to the
first byte and the number of bytes of the memory region that makes up the
string. An additional terminating NUL byte that may be present in
memory is not included in this byte count. This type implements the
same concept as @code{std::string_view} in C++, or the @code{String}
type in Java.

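To illustrate the concept, here is a minimal stand-alone sketch of such
a descriptor. The type @code{my_desc_t} and the @code{my_desc_*}
functions are hypothetical names made up for this example; they are
not Gnulib's definitions, and the sketch assumes @code{size_t} for the
byte count where Gnulib code may prefer @code{idx_t}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string descriptor; NOT Gnulib's type.  */
typedef struct
{
  const char *data;   /* first byte; need not be NUL-terminated */
  size_t nbytes;      /* number of bytes, excluding any trailing NUL */
} my_desc_t;

/* Describe an existing NUL-terminated C string without copying it.  */
static my_desc_t
my_desc_from_c (const char *s)
{
  my_desc_t d = { s, strlen (s) };
  return d;
}

/* Describe an arbitrary memory region, which may contain NUL bytes.  */
static my_desc_t
my_desc_from_mem (const char *p, size_t n)
{
  my_desc_t d = { p, n };
  return d;
}

/* Return the bytes [start, end) of d.
   Neither allocates memory nor modifies the underlying bytes.  */
static my_desc_t
my_desc_substring (my_desc_t d, size_t start, size_t end)
{
  assert (start <= end && end <= d.nbytes);
  my_desc_t sub = { d.data + start, end - start };
  return sub;
}

/* Compare two descriptors byte by byte, embedded NUL bytes included.  */
static bool
my_desc_equals (my_desc_t a, my_desc_t b)
{
  return a.nbytes == b.nbytes
         && (a.nbytes == 0 || memcmp (a.data, b.data, a.nbytes) == 0);
}
```

Because the substring is just another pointer and length pair, taking
it costs neither an allocation nor a change to the original bytes.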
A @code{string_desc_t} can be passed to a function as an argument, or
can be the return value of a function. This is type-safe: If, by
mistake, a programmer passes a @code{string_desc_t} to a function that
expects a @code{char *} argument, or vice versa, or assigns a
@code{string_desc_t} value to a variable of type @code{char *}, or
vice versa, the compiler will report an error.
Functions related to string descriptors are provided:
@itemize
@item
Side-effect-free operations in @code{"string-desc.h"},
@item
Memory-allocating operations in @code{"string-desc.h"},
@item
Memory-allocating operations with out-of-memory checking in
@code{"xstring-desc.h"},
@item
Operations with side effects in @code{"string-desc.h"}.
@end itemize
For outputting a string descriptor, the @code{*printf} family of
functions cannot be used directly. A format string directive such as
@code{"%.*s"} would not work:
@itemize
@item
it would stop the output at the first encountered NUL character,
@item
it would require casting the number of bytes to @code{int}, and thus
would not work for strings longer than @code{INT_MAX} bytes.
@end itemize
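The first limitation can be observed with standard C alone. The
following sketch formats a five-byte region that contains an embedded
NUL; the wrapper name @code{format_with_precision} is made up for this
example.

```c
#include <assert.h>
#include <stdio.h>

/* Format the first nbytes bytes at ptr with "%.*s" into out (of
   capacity outsize) and return the number of characters produced.  */
static int
format_with_precision (char *out, size_t outsize,
                       const char *ptr, size_t nbytes)
{
  /* The precision argument must be an int, hence the cast.
     The cast gives a wrong value when nbytes > INT_MAX.  */
  return snprintf (out, outsize, "%.*s", (int) nbytes, ptr);
}
```

When called on the five bytes @code{"ab\0cd"} with @code{nbytes} equal
to 5, the function returns 2: the precision is only an upper bound, and
the conversion still stops at the first NUL byte.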
@c @noindent Other format string directives don't work either, because
@c the only way to produce a NUL character in @code{*printf}'s output
@c is through a dedicated @code{%c} or @code{%lc} directive.
Therefore Gnulib offers
@itemize
@item
a function @code{sd_fwrite} that outputs a string descriptor to
a @code{FILE} stream,
@item
a function @code{sd_write} that outputs a string descriptor to
a file descriptor,
@item
and for those applications where the NUL characters should become
visible as @samp{\0}, a family of @code{quotearg}-based functions that
allow specifying the escaping rules in detail.
@end itemize
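The difference between writing the bytes unchanged and making NUL
characters visible can be sketched with plain standard C. The helpers
@code{write_bytes} and @code{write_bytes_nul_escaped} below are
hypothetical; they are not the implementation of @code{sd_fwrite} or of
the @code{quotearg}-based functions, and the second one escapes nothing
but NUL bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Write all nbytes bytes at ptr to fp, embedded NUL bytes included.
   Returns 0 on success, -1 on failure.  */
static int
write_bytes (FILE *fp, const char *ptr, size_t nbytes)
{
  return fwrite (ptr, 1, nbytes, fp) == nbytes ? 0 : -1;
}

/* Like write_bytes, but emit each NUL byte as the two characters
   backslash and zero.  Simplistic: no other byte is escaped.
   Returns 0 on success, -1 on failure.  */
static int
write_bytes_nul_escaped (FILE *fp, const char *ptr, size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    {
      int failed = (ptr[i] == '\0'
                    ? fputs ("\\0", fp) == EOF
                    : putc ((unsigned char) ptr[i], fp) == EOF);
      if (failed)
        return -1;
    }
  return 0;
}
```

For the five bytes @code{"ab\0cd"}, the first helper writes five bytes
and the second writes six characters, since the single NUL byte becomes
two visible characters.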
The functionality is thus split across three modules as follows:
@itemize
@item
The module @code{string-desc}, under LGPL, defines the type and
elementary functions.
@item
The module @code{xstring-desc}, under GPL, defines the memory-allocating
functions with out-of-memory checking.
@item
The module @code{string-desc-quotearg}, under GPL, defines the
@code{quotearg}-based functions.
@end itemize