String
UCX strings store character arrays together with a length and come in two variants: immutable (cxstring) and mutable (cxmutstr).
In general, UCX strings are not necessarily zero-terminated. If a function guarantees to return a zero-terminated string, this is explicitly mentioned in its documentation. As a rule of thumb, you should not pass the character array of a UCX string structure to another API without explicitly ensuring that the string is zero-terminated.
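Because a UCX string is only a pointer plus a length, handing its character array to a function that expects a C string can read past the end. The sketch below does not use UCX. The type sview and both functions are hypothetical stand-ins, and sview is assumed to mirror the pointer-and-length layout of cxstring. The sketch shows two ways to pass such a string on safely: bounding printf-style output with the %.*s precision, or copying into a zero-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length layout).
 * Not the UCX API; no terminator is implied. */
typedef struct {
    const char *ptr;
    size_t length;
} sview;

/* Format the view into buf without reading past s.length:
 * the precision in "%.*s" caps how many bytes are read from s.ptr.
 * Assumes s.length fits into an int. */
int sview_format(char *buf, size_t bufsize, sview s) {
    return snprintf(buf, bufsize, "%.*s", (int) s.length, s.ptr);
}

/* Copy the view into a zero-terminated buffer, truncating if it does not fit.
 * Returns the number of bytes copied, excluding the terminator. */
size_t sview_copy_cstr(char *buf, size_t bufsize, sview s) {
    if (bufsize == 0) return 0;
    size_t n = s.length < bufsize - 1 ? s.length : bufsize - 1;
    memcpy(buf, s.ptr, n);
    buf[n] = '\0';
    return n;
}
```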
Basics
The functions cx_str() and cx_mutstr() create a UCX string from a const char* or a char* and compute the length with a call to stdlib strlen(). In case you already know the length, or the string is not zero-terminated, you can use cx_strn() or cx_mutstrn().
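The difference between the strlen()-based and the length-based constructors can be illustrated without UCX. The types sview and mview and the three functions below are hypothetical stand-ins, assumed to mirror cxstring and cxmutstr as a pointer plus a length. They are not the UCX API.

```c
#include <string.h>

/* Hypothetical stand-ins for cxstring / cxmutstr (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;
typedef struct { char *ptr; size_t length; } mview;

/* Analogous to cx_str(): the input must be zero-terminated, length via strlen(). */
sview sview_from_cstr(const char *cstr) {
    return (sview){ cstr, strlen(cstr) };
}

/* Analogous to cx_strn(): the caller supplies the length, no terminator needed. */
sview sview_from_n(const char *ptr, size_t length) {
    return (sview){ ptr, length };
}

/* Mutable counterpart, analogous to cx_mutstr(). */
mview mview_from_cstr(char *cstr) {
    return (mview){ cstr, strlen(cstr) };
}
```

The length-based form is what makes it possible to refer to a slice in the middle of a larger buffer without copying it.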
The function cx_strdup_a() allocates new memory with the given allocator, copies the given string into it, and guarantees that the resulting string is zero-terminated. The function cx_strdup() is equivalent to cx_strdup_a(), except that it uses the default stdlib allocator.
Allocated strings are always of type cxmutstr and can be deallocated by a call to cx_strfree() or cx_strfree_a(). The caller must make sure to deallocate a string with the same allocator that was used to allocate it. It is safe to call these functions multiple times on a given string, because the pointer is set to NULL and the length to zero. It is also safe to call them with a NULL pointer, just like any other free()-like function.
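The duplicate-then-free life cycle described above can be sketched with the stdlib allocator only. The types and the functions mstr_dup() and mstr_free() are hypothetical and not part of UCX. They show why freeing twice is harmless: the free function resets the pointer and the length after releasing the memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for cxstring / cxmutstr (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;
typedef struct { char *ptr; size_t length; } mstr;

/* Copy a (possibly unterminated) view into new memory and append '\0'.
 * On allocation failure the result is { NULL, 0 }. */
mstr mstr_dup(sview s) {
    mstr result = { malloc(s.length + 1), 0 };
    if (result.ptr == NULL) return result;
    memcpy(result.ptr, s.ptr, s.length);
    result.ptr[s.length] = '\0';
    result.length = s.length;
    return result;
}

/* Release the memory and reset the string, so that calling this again,
 * or calling it with NULL, does nothing harmful. */
void mstr_free(mstr *str) {
    if (str == NULL) return;
    free(str->ptr);
    str->ptr = NULL;
    str->length = 0;
}
```

This sketch hard-codes malloc()/free(). With UCX's allocator-aware variants, the matching allocator must be passed when freeing.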
Comparison
The cx_strcmp() function compares two UCX strings lexicographically and returns an integer greater than, equal to, or less than 0 if s1 is greater than, equal to, or less than s2, respectively. The cx_strcmp_p() function is equivalent, except that it takes pointers to the UCX strings and its signature is compatible with cx_compare_func.
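A lexicographic comparison of length-delimited strings compares the common prefix byte by byte and, on a tie, sorts the shorter string first. The sketch below uses hypothetical names and does not call UCX. It shows that logic, plus a pointer-taking wrapper with a qsort()-style signature as a stand-in for the role the text assigns to cx_strcmp_p().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

/* Compare the common prefix with memcmp(), then let the shorter string sort first. */
int sview_cmp(sview s1, sview s2) {
    size_t n = s1.length < s2.length ? s1.length : s2.length;
    int r = n > 0 ? memcmp(s1.ptr, s2.ptr, n) : 0;
    if (r != 0) return r;
    if (s1.length < s2.length) return -1;
    if (s1.length > s2.length) return 1;
    return 0;
}

/* Pointer-taking wrapper usable as a generic comparator. */
int sview_cmp_p(const void *s1, const void *s2) {
    return sview_cmp(*(const sview *) s1, *(const sview *) s2);
}

/* Example use of the wrapper: sort an array of views in place. */
void sview_sort(sview *arr, size_t n) {
    qsort(arr, n, sizeof *arr, sview_cmp_p);
}
```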
The functions cx_strprefix() and cx_strsuffix() check whether string starts with prefix or ends with suffix, respectively.
The functions cx_strcasecmp(), cx_strcasecmp_p(), cx_strcaseprefix(), and cx_strcasesuffix() are equivalent, except that they compare the strings case-insensitively.
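Prefix and suffix checks on pointer-and-length strings reduce to a length check followed by a byte comparison at the start or the end. The sketch below uses hypothetical names and is not the UCX implementation. Its case-insensitive variant folds bytes with tolower(), which is an assumption about how case-insensitivity could be done and only handles ASCII letters in the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

bool sview_prefix(sview string, sview prefix) {
    return prefix.length <= string.length &&
           memcmp(string.ptr, prefix.ptr, prefix.length) == 0;
}

bool sview_suffix(sview string, sview suffix) {
    return suffix.length <= string.length &&
           memcmp(string.ptr + string.length - suffix.length,
                  suffix.ptr, suffix.length) == 0;
}

/* Case-insensitive prefix check: compares tolower() of each byte. */
bool sview_caseprefix(sview string, sview prefix) {
    if (prefix.length > string.length) return false;
    for (size_t i = 0; i < prefix.length; i++) {
        if (tolower((unsigned char) string.ptr[i]) !=
            tolower((unsigned char) prefix.ptr[i]))
            return false;
    }
    return true;
}
```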
Concatenation
The cx_strcat_a() function takes count UCX strings, allocates memory for their concatenation with a single allocation, and copies the contents of the strings to the new memory. cx_strcat() is equivalent, except that it uses the default stdlib allocator.
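The single-allocation strategy needs two passes over the variadic arguments: one to sum the lengths and one to copy. The function sview_cat() below is a hypothetical stdlib-only sketch, not UCX code. Like the text, it takes count followed by that many strings. Zero-terminating the result is a choice made in this sketch, and overflow of the summed length is not checked.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for cxstring / cxmutstr (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;
typedef struct { char *ptr; size_t length; } mstr;

/* Concatenate count views with exactly one malloc().
 * Returns { NULL, 0 } if the allocation fails. */
mstr sview_cat(size_t count, ...) {
    va_list ap;
    size_t total = 0;

    /* Pass 1: sum the lengths to size the allocation. */
    va_start(ap, count);
    for (size_t i = 0; i < count; i++) total += va_arg(ap, sview).length;
    va_end(ap);

    mstr result = { malloc(total + 1), 0 };
    if (result.ptr == NULL) return result;

    /* Pass 2: copy each part behind the previous one. */
    va_start(ap, count);
    for (size_t i = 0; i < count; i++) {
        sview part = va_arg(ap, sview);
        memcpy(result.ptr + result.length, part.ptr, part.length);
        result.length += part.length;
    }
    va_end(ap);
    result.ptr[result.length] = '\0';
    return result;
}
```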
The functions cx_strcat_ma() and cx_strcat_m() append the count strings to the specified string str and, instead of allocating new memory, reallocate the existing memory in str. If the pointer in str is NULL, there is no difference from cx_strcat_a(). Note that count always denotes the number of variadic arguments in both variants.
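Appending in place can be sketched with realloc(). realloc(NULL, n) behaves like malloc(n), so a string whose pointer is NULL simply becomes a fresh concatenation. The function mstr_append() and its error convention are hypothetical and not part of UCX. The sketch assumes that str->ptr was obtained from malloc()/realloc() and that its length is 0 when the pointer is NULL.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for cxstring / cxmutstr (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;
typedef struct { char *ptr; size_t length; } mstr;

/* Append count views to *str by growing its buffer once with realloc().
 * Returns 0 on success, -1 on failure (then *str is left unchanged). */
int mstr_append(mstr *str, size_t count, ...) {
    va_list ap;
    size_t total = str->length;
    va_start(ap, count);
    for (size_t i = 0; i < count; i++) total += va_arg(ap, sview).length;
    va_end(ap);

    char *grown = realloc(str->ptr, total + 1);
    if (grown == NULL) return -1;
    str->ptr = grown;

    va_start(ap, count);
    for (size_t i = 0; i < count; i++) {
        sview part = va_arg(ap, sview);
        memcpy(str->ptr + str->length, part.ptr, part.length);
        str->length += part.length;
    }
    va_end(ap);
    str->ptr[str->length] = '\0';
    return 0;
}
```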
The function cx_strlen() returns the sum of the lengths of the specified strings.
Find Characters and Substrings
The functions cx_strchr(), cx_strrchr(), and cx_strstr() behave like their stdlib counterparts.
The function cx_strsubs() returns the substring starting at the specified start index, and cx_strsubsl() returns a substring with at most length bytes.
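Searching and slicing a pointer-and-length string never needs to copy anything: the result is just a new pointer and length into the same memory. The sketch below uses hypothetical names and does not use UCX. Two behaviors are choices made only in this sketch: a failed search returns an empty view, and an out-of-range start index is clamped to the end.

```c
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

/* View starting at the first occurrence of c; empty view if c is absent. */
sview sview_chr(sview s, int c) {
    const char *hit = s.length > 0 ? memchr(s.ptr, c, s.length) : NULL;
    if (hit == NULL) return (sview){ NULL, 0 };
    return (sview){ hit, s.length - (size_t) (hit - s.ptr) };
}

/* View from start to the end (start clamped to the length). */
sview sview_subs(sview s, size_t start) {
    if (start > s.length) start = s.length;
    return (sview){ s.ptr + start, s.length - start };
}

/* View from start with at most length bytes. */
sview sview_subsl(sview s, size_t start, size_t length) {
    sview tail = sview_subs(s, start);
    if (length < tail.length) tail.length = length;
    return tail;
}
```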
The function cx_strtrim() returns the substring that results from removing all leading and trailing whitespace characters (a whitespace character is any of the characters in " \t\r\n\v\f").
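Trimming works the same way as slicing: it advances the start pointer past leading whitespace and shortens the length past trailing whitespace, so the result still points into the original memory. The names below are hypothetical and this is not the UCX source. The whitespace set is the one listed in the text.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

/* Whitespace set from the text: " \t\r\n\v\f".
 * The '\0' check keeps strchr() from matching the set's terminator. */
static bool is_ws(char c) {
    return c != '\0' && strchr(" \t\r\n\v\f", c) != NULL;
}

/* Return a view without leading and trailing whitespace; nothing is copied. */
sview sview_trim(sview s) {
    while (s.length > 0 && is_ws(s.ptr[0])) { s.ptr++; s.length--; }
    while (s.length > 0 && is_ws(s.ptr[s.length - 1])) s.length--;
    return s;
}
```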
All functions with the _m suffix behave exactly like their counterparts without the _m suffix, except that they operate on a cxmutstr. In both variants, the functions return a view into the given string, and thus the returned strings must never be passed to cx_strfree().
Replace Substrings
The function cx_strreplace() allocates a new string which will contain a copy of str where every occurrence of search is replaced with replacement. The new string is guaranteed to be zero-terminated even if str is not.
The function cx_strreplace_a() uses the specified allocator to allocate the new string.
The functions cx_strreplacen() and cx_strreplacen_a() are equivalent, except that they stop after replmax replacements.
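A replacement with a bounded number of substitutions can be done with a single allocation. The first pass counts the matches, up to replmax, to compute the exact result length, and the second pass copies the unchanged gaps and the replacements. The function sview_replacen() and its helper are hypothetical stdlib-only sketches. Scanning left to right for non-overlapping matches, and treating an empty search string as matching nothing, are assumptions of this sketch and may differ from UCX.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for cxstring / cxmutstr (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;
typedef struct { char *ptr; size_t length; } mstr;

/* Index of the first occurrence of needle at or after from, or (size_t)-1. */
static size_t find_from(sview hay, sview needle, size_t from) {
    if (needle.length == 0 || needle.length > hay.length) return (size_t) -1;
    for (size_t i = from; i + needle.length <= hay.length; i++) {
        if (memcmp(hay.ptr + i, needle.ptr, needle.length) == 0) return i;
    }
    return (size_t) -1;
}

/* Copy str, replacing at most replmax non-overlapping occurrences of search.
 * The result is zero-terminated; ptr is NULL if the allocation fails. */
mstr sview_replacen(sview str, sview search, sview replacement, size_t replmax) {
    /* Pass 1: count matches to size the allocation exactly. */
    size_t count = 0;
    for (size_t pos = find_from(str, search, 0);
         pos != (size_t) -1 && count < replmax;
         pos = find_from(str, search, pos + search.length)) {
        count++;
    }

    size_t newlen = str.length - count * search.length + count * replacement.length;
    mstr result = { malloc(newlen + 1), 0 };
    if (result.ptr == NULL) return result;

    /* Pass 2: copy each unchanged gap, then the replacement. */
    size_t src = 0;
    for (size_t done = 0; done < count; done++) {
        size_t pos = find_from(str, search, src);
        memcpy(result.ptr + result.length, str.ptr + src, pos - src);
        result.length += pos - src;
        memcpy(result.ptr + result.length, replacement.ptr, replacement.length);
        result.length += replacement.length;
        src = pos + search.length;
    }
    memcpy(result.ptr + result.length, str.ptr + src, str.length - src);
    result.length += str.length - src;
    result.ptr[result.length] = '\0';
    return result;
}
```

Passing (size_t) -1 as replmax replaces every occurrence, which corresponds to the unbounded behavior described for cx_strreplace().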
Basic Splitting
The cx_strsplit() function splits the input string using the specified delimiter delim and writes the substrings into the pre-allocated output array. The maximum number of resulting strings can be specified with limit, which means that at most limit-1 splits are performed. The function returns the actual number of items written to output.
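The limit semantics (at most limit-1 splits, with the unsplit remainder as the last item) can be illustrated with a small stdlib-only sketch. The function sview_split() is hypothetical and not part of UCX. It keeps empty items between consecutive delimiters and returns the whole input as one item for an empty delimiter. Both are choices made in this sketch, not documented UCX behavior.

```c
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

/* Split string at delim into the caller-provided output array.
 * Writes at most limit items; the last item holds the unsplit remainder.
 * The items are views into string. Returns the number of items written. */
size_t sview_split(sview string, sview delim, size_t limit, sview *output) {
    if (limit == 0) return 0;
    size_t n = 0;
    size_t start = 0;
    if (delim.length > 0) {
        for (size_t i = 0; n + 1 < limit && i + delim.length <= string.length; ) {
            if (memcmp(string.ptr + i, delim.ptr, delim.length) == 0) {
                output[n++] = (sview){ string.ptr + start, i - start };
                i += delim.length;
                start = i;
            } else {
                i++;
            }
        }
    }
    output[n++] = (sview){ string.ptr + start, string.length - start };
    return n;
}
```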
In contrast, cx_strsplit_a() uses the specified allocator to allocate the output array and writes the pointer to the allocated memory to output.
The functions cx_strsplit_m() and cx_strsplit_ma() are equivalent to cx_strsplit() and cx_strsplit_a(), except that they work on cxmutstr instead of cxstring.
Complex Tokenization
You can tokenize a string by creating a tokenization context with cx_strtok() and calling cx_strtok_next() or cx_strtok_next_m() as long as they return true.
The tokenization context is initialized with the string str to tokenize, one delimiter delim, and a limit for the maximum number of tokens. When limit is reached, the remaining part of str is returned as a single token.
You can add more delimiters to the context by calling cx_strtok_delim() with an array of delimiters to use.
Example
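The following is a standalone sketch of the context-based tokenization pattern described above, not UCX code. The type tokctx and the functions tok_init(), tok_delims(), and tok_next() are hypothetical stand-ins, loosely modeled on the roles of cx_strtok(), cx_strtok_delim(), and cx_strtok_next(). The context holds one primary delimiter, an optional array of extra delimiters, and a token limit. Once limit-1 tokens have been returned, the rest of the input is the final token. Details such as empty tokens after a trailing delimiter are choices made in this sketch only.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

/* Hypothetical tokenizer context. */
typedef struct {
    sview str;
    sview delim;          /* primary delimiter */
    const sview *extra;   /* additional delimiters, may be NULL */
    size_t extra_count;
    size_t limit;         /* maximum number of tokens */
    size_t pos;           /* offset of the next unread byte */
    size_t found;         /* tokens returned so far */
    bool done;
} tokctx;

tokctx tok_init(sview str, sview delim, size_t limit) {
    return (tokctx){ str, delim, NULL, 0, limit, 0, 0, false };
}

void tok_delims(tokctx *ctx, const sview *delims, size_t count) {
    ctx->extra = delims;
    ctx->extra_count = count;
}

/* Length of the delimiter that matches at offset i, or 0 if none does. */
static size_t delim_at(const tokctx *ctx, size_t i) {
    for (size_t d = 0; d <= ctx->extra_count; d++) {
        sview dl = d == 0 ? ctx->delim : ctx->extra[d - 1];
        if (dl.length > 0 && i + dl.length <= ctx->str.length &&
            memcmp(ctx->str.ptr + i, dl.ptr, dl.length) == 0)
            return dl.length;
    }
    return 0;
}

/* Store the next token in *token; returns false when no tokens are left. */
bool tok_next(tokctx *ctx, sview *token) {
    if (ctx->done || ctx->found >= ctx->limit) return false;
    size_t start = ctx->pos;
    if (ctx->found + 1 < ctx->limit) {
        for (size_t i = start; i < ctx->str.length; i++) {
            size_t dl = delim_at(ctx, i);
            if (dl > 0) {
                *token = (sview){ ctx->str.ptr + start, i - start };
                ctx->pos = i + dl;
                ctx->found++;
                return true;
            }
        }
    }
    /* Last token: everything that is left. */
    *token = (sview){ ctx->str.ptr + start, ctx->str.length - start };
    ctx->done = true;
    ctx->found++;
    return true;
}
```

A typical loop is `while (tok_next(&ctx, &token)) { ... }`, which mirrors the pattern of calling the next-function as long as it returns true.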
Conversion to Numbers
For each integer type, as well as float and double, there are functions to convert a UCX string to a number of that type.
Integer conversion comes in two flavours:
The basic variant takes a string of any UCX string type, a pointer to the output integer, and the base (one of 2, 8, 10, or 16). Conversion is performed in the specified base and honors the special notations for that base: hexadecimal numbers may be prefixed with 0x, x, or #, and binary numbers may be prefixed with 0b or b.
The _lc versions of the integer conversion functions are equivalent, except that they allow specifying an array of group separator characters, each of which is simply ignored during conversion. The basic version uses a comma (,) as the group separator.
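The integer rules above (restricted bases, base prefixes, ignored group separators) can be illustrated with a hand-written stdlib-only parser. The function sview_to_long() and its return convention (0 on success, -1 on error) are hypothetical and not the UCX signature. The following are also assumptions of this sketch: an optional sign may precede the prefix, only the lowercase prefixes listed in the text are accepted, and overflow is reported as an error.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static bool starts_with(sview s, size_t at, const char *p) {
    size_t n = strlen(p);
    return at + n <= s.length && memcmp(s.ptr + at, p, n) == 0;
}

/* Parse s as a long in base 2, 8, 10, or 16, skipping any byte in groupsep.
 * Returns 0 on success, -1 on invalid input or overflow. */
int sview_to_long(sview s, long *output, int base, const char *groupsep) {
    if (base != 2 && base != 8 && base != 10 && base != 16) return -1;
    size_t i = 0;
    bool neg = false;
    if (i < s.length && (s.ptr[i] == '+' || s.ptr[i] == '-')) neg = s.ptr[i++] == '-';

    /* Optional prefixes as listed in the text. */
    if (base == 16) {
        if (starts_with(s, i, "0x")) i += 2;
        else if (starts_with(s, i, "x") || starts_with(s, i, "#")) i += 1;
    } else if (base == 2) {
        if (starts_with(s, i, "0b")) i += 2;
        else if (starts_with(s, i, "b")) i += 1;
    }

    /* Accumulate the magnitude; a negative value may reach LONG_MAX + 1. */
    unsigned long limit = neg ? (unsigned long) LONG_MAX + 1 : (unsigned long) LONG_MAX;
    unsigned long value = 0;
    bool any_digit = false;
    for (; i < s.length; i++) {
        char c = s.ptr[i];
        if (groupsep != NULL && c != '\0' && strchr(groupsep, c) != NULL) continue;
        int d = digit_value(c);
        if (d < 0 || d >= base) return -1;  /* not a digit of this base */
        if (value > (limit - (unsigned long) d) / (unsigned long) base) return -1;  /* overflow */
        value = value * (unsigned long) base + (unsigned long) d;
        any_digit = true;
    }
    if (!any_digit) return -1;
    if (!neg) *output = (long) value;
    else if (value == (unsigned long) LONG_MAX + 1) *output = LONG_MIN;
    else *output = -(long) value;
    return 0;
}
```

Passing "," as groupsep corresponds to the default described for the basic variant. Any other separator set corresponds to the _lc flavour.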
The signature for the floating point conversions is quite similar:
The two differences are that the floating point versions do not support different bases, and that the _lc variant allows specifying not only an array of group separators but also the character used as the decimal separator.
In the basic variant, the group separator is again a comma (,), and the decimal separator is a dot (.).
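One simple way to honor configurable separators is to normalize the input into a temporary buffer and then let strtod() do the actual parsing. The normalization drops the group separators and maps the decimal separator to '.'. This is a stdlib-only sketch, not how UCX is known to implement it. The function sview_to_double() is hypothetical. It assumes the "C" locale (where strtod() expects '.') and inputs shorter than its fixed buffer, and it rejects trailing garbage by requiring that strtod() consume the whole normalized copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cxstring (assumed pointer + length). */
typedef struct { const char *ptr; size_t length; } sview;

/* Parse s as a double using decsep as the decimal separator and ignoring
 * every byte in groupsep. Returns 0 on success, -1 otherwise. */
int sview_to_double(sview s, double *output, char decsep, const char *groupsep) {
    char buf[128];
    size_t n = 0;
    for (size_t i = 0; i < s.length; i++) {
        char c = s.ptr[i];
        if (c == decsep) c = '.';  /* normalize for strtod() in the "C" locale */
        else if (groupsep != NULL && c != '\0' && strchr(groupsep, c) != NULL) continue;
        if (n + 1 >= sizeof buf) return -1;  /* input too long for this sketch */
        buf[n++] = c;
    }
    if (n == 0) return -1;
    buf[n] = '\0';

    char *end;
    double value = strtod(buf, &end);
    if (*end != '\0') return -1;  /* reject trailing characters */
    *output = value;
    return 0;
}
```

Calling it with '.' and "," reproduces the defaults of the basic variant described above, while passing ',' and "." handles the common European notation.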