String | UAP Common Extensions

Basics

#include <cx/string.h>

struct cx_string_s {const char *ptr; size_t length;};

struct cx_mutstr_s {char *ptr; size_t length;};

typedef struct cx_string_s cxstring;

typedef struct cx_mutstr_s cxmutstr;

cxstring cx_str(const char *cstring);

cxstring cx_strn(const char *cstring, size_t length);

cxmutstr cx_mutstr(char *cstring);

cxmutstr cx_mutstrn(char *cstring, size_t length);

cxstring cx_strcast(AnyStr str);

cxmutstr cx_strdup(AnyStr string);

cxmutstr cx_strdup_a(const CxAllocator *allocator, AnyStr string);

void cx_strfree(cxmutstr *str);

void cx_strfree_a(const CxAllocator *alloc, cxmutstr *str);

The functions cx_str() and cx_mutstr() create a UCX string from a const char* or a char* and compute the length with a call to stdlib strlen(). In case you already know the length, or the string is not zero-terminated, you can use cx_strn() or cx_mutstrn().
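The difference between the two constructor flavours can be illustrated with a small self-contained sketch. The type view_t and the functions view_from_cstr(), view_from_buf(), and demo_unterminated() are hypothetical stand-ins, not UCX code; they only mirror the behaviour described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a read-only, length-carrying string. */
typedef struct {
    const char *ptr;
    size_t length;
} view_t;

/* Like cx_str(): the length is computed with strlen(). */
view_t view_from_cstr(const char *cstring) {
    view_t v = {cstring, strlen(cstring)};
    return v;
}

/* Like cx_strn(): the caller supplies the length, so the data
 * does not need to be zero-terminated. */
view_t view_from_buf(const char *data, size_t length) {
    view_t v = {data, length};
    return v;
}

/* Returns the length of a view onto the first three bytes of a
 * buffer that contains no terminator at all. */
size_t demo_unterminated(void) {
    static const char raw[4] = {'a', 'b', 'c', 'd'};
    return view_from_buf(raw, 3).length;
}
```

Because the length travels with the pointer, code that consumes such a string must never rely on a terminating zero byte.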

The function cx_strdup_a() allocates new memory with the given allocator, copies the given string into it, and guarantees that the resulting string is zero-terminated. The function cx_strdup() is equivalent to cx_strdup_a(), except that it uses the default stdlib allocator.

Allocated strings are always of type cxmutstr and can be deallocated by a call to cx_strfree() or cx_strfree_a(). The caller must make sure to use the correct allocator for deallocating a string. It is safe to call these functions multiple times on a given string, as the pointer will be nulled and the length set to zero. It is also safe to call the functions with a NULL-pointer, just like any other free()-like function.
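The ownership rules above can be sketched without UCX as follows. The names mstr_t, mstr_dup(), and mstr_free() are hypothetical, the sketch always uses malloc() and free(), and returning an empty string on allocation failure is an assumption of this sketch rather than documented UCX behaviour.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a mutable, length-carrying string. */
typedef struct {
    char *ptr;
    size_t length;
} mstr_t;

/* Copies length bytes and appends a terminator, so the copy is
 * zero-terminated even if the source was not. */
mstr_t mstr_dup(const char *data, size_t length) {
    mstr_t result = {NULL, 0};
    char *copy = malloc(length + 1);
    if (copy == NULL) {
        return result; /* sketch-only choice: empty string on failure */
    }
    if (length > 0) {
        memcpy(copy, data, length);
    }
    copy[length] = '\0';
    result.ptr = copy;
    result.length = length;
    return result;
}

/* Frees the memory and resets the string, which makes repeated
 * calls and calls with a NULL pointer harmless. */
void mstr_free(mstr_t *str) {
    if (str == NULL) {
        return;
    }
    free(str->ptr); /* free(NULL) is a no-op */
    str->ptr = NULL;
    str->length = 0;
}
```

Resetting the pointer inside the free function is what turns a second call into a no-op instead of a double free.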

Comparison

#include <cx/string.h>

int cx_strcmp(cxstring s1, cxstring s2);

int cx_strcmp_p(const void *s1, const void *s2);

bool cx_strprefix(cxstring string, cxstring prefix);

bool cx_strsuffix(cxstring string, cxstring suffix);

int cx_strcasecmp(cxstring s1, cxstring s2);

int cx_strcasecmp_p(const void *s1, const void *s2);

bool cx_strcaseprefix(cxstring string, cxstring prefix);

bool cx_strcasesuffix(cxstring string, cxstring suffix);

The cx_strcmp() function compares two UCX strings lexicographically and returns an integer greater than, equal to, or less than 0, if s1 is greater than, equal to, or less than s2, respectively. The cx_strcmp_p() function is equivalent, except that it takes pointers to the UCX strings and the signature is compatible with cx_compare_func.
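As an illustration of why the pointer variant exists, here is a minimal sketch of a lexicographic comparison on length-delimited strings together with a qsort()-compatible wrapper. The names view_t, view_cmp(), view_cmp_p(), and view_sort() are hypothetical, and the rule that a proper prefix sorts before the longer string is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a read-only, length-carrying string. */
typedef struct {
    const char *ptr;
    size_t length;
} view_t;

/* Compares the common part bytewise; if it is equal, the shorter
 * string is considered smaller. */
int view_cmp(view_t a, view_t b) {
    size_t n = a.length < b.length ? a.length : b.length;
    int r = n > 0 ? memcmp(a.ptr, b.ptr, n) : 0;
    if (r != 0) {
        return r;
    }
    if (a.length == b.length) {
        return 0;
    }
    return a.length < b.length ? -1 : 1;
}

/* Pointer variant whose signature matches what qsort() expects. */
int view_cmp_p(const void *a, const void *b) {
    return view_cmp(*(const view_t *)a, *(const view_t *)b);
}

/* Sorts an array of views in place. */
void view_sort(view_t *items, size_t count) {
    qsort(items, count, sizeof(view_t), view_cmp_p);
}
```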

The functions cx_strprefix() and cx_strsuffix() check if string starts with prefix or ends with suffix, respectively.

The functions cx_strcasecmp(), cx_strcasecmp_p(), cx_strcaseprefix(), and cx_strcasesuffix() are equivalent to their counterparts above, except that they compare the strings case-insensitively.
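The prefix and suffix checks, including a case-insensitive prefix check, can be sketched as follows. All names (view_t, view_prefix(), view_suffix(), view_caseprefix()) are hypothetical, and the case folding via tolower() is an assumption; the source does not say how UCX folds case.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a read-only, length-carrying string. */
typedef struct {
    const char *ptr;
    size_t length;
} view_t;

/* True if s starts with prefix. */
bool view_prefix(view_t s, view_t prefix) {
    return prefix.length <= s.length
        && memcmp(s.ptr, prefix.ptr, prefix.length) == 0;
}

/* True if s ends with suffix. */
bool view_suffix(view_t s, view_t suffix) {
    return suffix.length <= s.length
        && memcmp(s.ptr + (s.length - suffix.length),
                  suffix.ptr, suffix.length) == 0;
}

/* Like view_prefix(), but ignores the case of ASCII letters. */
bool view_caseprefix(view_t s, view_t prefix) {
    if (prefix.length > s.length) {
        return false;
    }
    for (size_t i = 0; i < prefix.length; i++) {
        int a = tolower((unsigned char)s.ptr[i]);
        int b = tolower((unsigned char)prefix.ptr[i]);
        if (a != b) {
            return false;
        }
    }
    return true;
}
```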

Concatenation

#include <cx/string.h>

cxmutstr cx_strcat(size_t count, ... );

cxmutstr cx_strcat_a(const CxAllocator *alloc, size_t count, ... );

cxmutstr cx_strcat_m(cxmutstr str, size_t count, ... );

cxmutstr cx_strcat_ma(const CxAllocator *alloc,
        cxmutstr str, size_t count, ... );

size_t cx_strlen(size_t count, ...);

The cx_strcat_a() function takes count UCX strings, allocates memory for a concatenation of those strings with a single allocation, and copies the contents of the strings to the new memory. cx_strcat() is equivalent, except that it uses the default stdlib allocator.

The functions cx_strcat_ma() and cx_strcat_m() append the count strings to the specified string str and, instead of allocating new memory, reallocate the existing memory in str. If the pointer in str is NULL, there is no difference to cx_strcat_a(). Note that count always denotes the number of variadic arguments in both variants.

The function cx_strlen() sums the length of the specified strings.
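The single-allocation strategy can be sketched in two passes over the variadic arguments: one to sum the lengths and one to copy. The names view_t, mstr_t, view_total_length(), and view_concat() are hypothetical, the sketch always uses malloc(), and the added terminator is a convenience of this sketch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the read-only and mutable string types. */
typedef struct { const char *ptr; size_t length; } view_t;
typedef struct { char *ptr; size_t length; } mstr_t;

/* Sums the lengths of count views. */
size_t view_total_length(size_t count, ...) {
    va_list ap;
    size_t total = 0;
    va_start(ap, count);
    for (size_t i = 0; i < count; i++) {
        total += va_arg(ap, view_t).length;
    }
    va_end(ap);
    return total;
}

/* Concatenates count views into one freshly allocated string. */
mstr_t view_concat(size_t count, ...) {
    mstr_t result = {NULL, 0};
    va_list ap, ap2;
    size_t total = 0;

    /* first pass: determine the size of the single allocation */
    va_start(ap, count);
    va_copy(ap2, ap);
    for (size_t i = 0; i < count; i++) {
        total += va_arg(ap, view_t).length;
    }
    va_end(ap);

    char *buf = malloc(total + 1);
    if (buf == NULL) {
        va_end(ap2);
        return result;
    }

    /* second pass: copy the contents back to back */
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        view_t s = va_arg(ap2, view_t);
        if (s.length > 0) {
            memcpy(buf + pos, s.ptr, s.length);
        }
        pos += s.length;
    }
    va_end(ap2);

    buf[pos] = '\0'; /* sketch-only convenience */
    result.ptr = buf;
    result.length = total;
    return result;
}
```

Passing a count that does not match the number of variadic arguments is undefined behaviour in this sketch, which is why the documentation stresses what count denotes.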

Find Characters and Substrings

#include <cx/string.h>

cxstring cx_strchr(cxstring string, int chr);

cxstring cx_strrchr(cxstring string, int chr);

cxstring cx_strstr(cxstring string, cxstring search);

cxstring cx_strsubs(cxstring string, size_t start);

cxstring cx_strsubsl(cxstring string, size_t start, size_t length);

cxstring cx_strtrim(cxstring string);

cxmutstr cx_strchr_m(cxmutstr string, int chr);

cxmutstr cx_strrchr_m(cxmutstr string, int chr);

cxmutstr cx_strstr_m(cxmutstr string, cxstring search);

cxmutstr cx_strsubs_m(cxmutstr string, size_t start);

cxmutstr cx_strsubsl_m(cxmutstr string, size_t start, size_t length);

cxmutstr cx_strtrim_m(cxmutstr string);

The functions cx_strchr(), cx_strrchr(), and cx_strstr() behave like their stdlib counterparts.

The function cx_strsubs() returns the substring starting at the specified start index, and cx_strsubsl() returns the substring starting at start with at most length bytes.

The function cx_strtrim() returns the substring that results when removing all leading and trailing whitespace characters (a whitespace character is any character contained in the following string: " \t\r\n\v\f").

All functions with the _m suffix behave exactly the same as their counterparts without the _m suffix, except that they operate on a cxmutstr. In both variants, the functions return a view into the given string, and thus the returned strings must never be passed to cx_strfree().
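The view semantics can be made concrete with a short sketch: every result points into the memory of the input and owns nothing. The names view_t, view_subs(), view_subsl(), and view_trim() are hypothetical, and clamping an out-of-range start index to the end of the string is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a read-only, length-carrying string. */
typedef struct {
    const char *ptr;
    size_t length;
} view_t;

/* Substring from start to the end of s. */
view_t view_subs(view_t s, size_t start) {
    if (start > s.length) {
        start = s.length; /* sketch-only choice: clamp to the end */
    }
    view_t r = {s.ptr + start, s.length - start};
    return r;
}

/* Substring from start with at most length bytes. */
view_t view_subsl(view_t s, size_t start, size_t length) {
    view_t r = view_subs(s, start);
    if (r.length > length) {
        r.length = length;
    }
    return r;
}

/* Strips leading and trailing whitespace by shrinking the view. */
view_t view_trim(view_t s) {
    static const char ws[] = " \t\r\n\v\f";
    const size_t nws = sizeof ws - 1; /* exclude the terminator */
    while (s.length > 0 && memchr(ws, s.ptr[0], nws) != NULL) {
        s.ptr++;
        s.length--;
    }
    while (s.length > 0 && memchr(ws, s.ptr[s.length - 1], nws) != NULL) {
        s.length--;
    }
    return s;
}
```

None of these functions copies a single byte, which is exactly why the results must not be freed and are generally not zero-terminated.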

Replace Substrings

#include <cx/string.h>

cxmutstr cx_strreplace(cxstring str,
        cxstring search, cxstring replacement);

cxmutstr cx_strreplace_a(const CxAllocator *allocator, cxstring str,
        cxstring search, cxstring replacement);

cxmutstr cx_strreplacen(cxstring str,
        cxstring search, cxstring replacement, size_t replmax);

cxmutstr cx_strreplacen_a(const CxAllocator *allocator, cxstring str,
        cxstring search, cxstring replacement, size_t replmax);

The function cx_strreplace() allocates a new string which will contain a copy of str where every occurrence of search is replaced with replacement. The new string is guaranteed to be zero-terminated even if str is not.

The function cx_strreplace_a() uses the specified allocator to allocate the new string.

The functions cx_strreplacen() and cx_strreplacen_a() are equivalent, except that they perform at most replmax replacements.
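A bounded replacement can be sketched as a counting pass followed by a copying pass. The names view_t, mstr_t, view_find(), and view_replace_n() are hypothetical, the sketch always uses malloc(), and how it treats an empty search string (no replacement at all) is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the read-only and mutable string types. */
typedef struct { const char *ptr; size_t length; } view_t;
typedef struct { char *ptr; size_t length; } mstr_t;

/* Index of the first occurrence of needle in hay at or after from,
 * or hay.length if there is none. */
size_t view_find(view_t hay, view_t needle, size_t from) {
    if (needle.length == 0 || needle.length > hay.length) {
        return hay.length;
    }
    for (size_t i = from; i + needle.length <= hay.length; i++) {
        if (memcmp(hay.ptr + i, needle.ptr, needle.length) == 0) {
            return i;
        }
    }
    return hay.length;
}

/* Returns a zero-terminated copy of str in which at most replmax
 * occurrences of search are replaced with repl. */
mstr_t view_replace_n(view_t str, view_t search, view_t repl,
                      size_t replmax) {
    mstr_t result = {NULL, 0};

    /* pass 1: count the matches that will be replaced */
    size_t matches = 0;
    size_t pos = 0;
    while (matches < replmax) {
        size_t hit = view_find(str, search, pos);
        if (hit == str.length) {
            break;
        }
        matches++;
        pos = hit + search.length;
    }

    size_t newlen = str.length - matches * search.length
                  + matches * repl.length;
    char *buf = malloc(newlen + 1);
    if (buf == NULL) {
        return result;
    }

    /* pass 2: copy the text between matches and the replacements */
    size_t in = 0;
    size_t out = 0;
    for (size_t i = 0; i < matches; i++) {
        size_t hit = view_find(str, search, in);
        memcpy(buf + out, str.ptr + in, hit - in);
        out += hit - in;
        memcpy(buf + out, repl.ptr, repl.length);
        out += repl.length;
        in = hit + search.length;
    }
    memcpy(buf + out, str.ptr + in, str.length - in);
    out += str.length - in;

    buf[out] = '\0';
    result.ptr = buf;
    result.length = out;
    return result;
}
```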

Basic Splitting

#include <cx/string.h>

size_t cx_strsplit(cxstring string, cxstring delim,
        size_t limit, cxstring *output);

size_t cx_strsplit_a(const CxAllocator *allocator,
        cxstring string, cxstring delim,
        size_t limit, cxstring **output);

size_t cx_strsplit_m(cxmutstr string, cxstring delim,
        size_t limit, cxmutstr *output);

size_t cx_strsplit_ma(const CxAllocator *allocator,
        cxmutstr string, cxstring delim,
        size_t limit, cxmutstr **output);

The cx_strsplit() function splits the input string using the specified delimiter delim and writes the substrings into the pre-allocated output array. The maximum number of resulting strings can be specified with limit. That means, at most limit-1 splits are performed. The function returns the actual number of items written to output.

On the other hand, cx_strsplit_a() uses the specified allocator to allocate the output array, and writes the pointer to the allocated memory to output.

The functions cx_strsplit_m() and cx_strsplit_ma() are equivalent to cx_strsplit() and cx_strsplit_a(), except that they work on cxmutstr instead of cxstring.
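The effect of limit on the variant with a pre-allocated array is easiest to see in a sketch. The names view_t and view_split() are hypothetical; the handling of edge cases such as an empty input or an empty delimiter is an assumption of this sketch and may differ from UCX.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a read-only, length-carrying string. */
typedef struct {
    const char *ptr;
    size_t length;
} view_t;

/* Splits s at delim and writes at most limit views to output.
 * Returns the number of views written. */
size_t view_split(view_t s, view_t delim, size_t limit, view_t *output) {
    if (limit == 0 || delim.length == 0) {
        return 0;
    }
    size_t n = 0;
    size_t start = 0;
    /* perform at most limit-1 splits */
    while (n + 1 < limit) {
        size_t hit = start;
        while (hit + delim.length <= s.length
               && memcmp(s.ptr + hit, delim.ptr, delim.length) != 0) {
            hit++;
        }
        if (hit + delim.length > s.length) {
            break; /* no further delimiter */
        }
        output[n].ptr = s.ptr + start;
        output[n].length = hit - start;
        n++;
        start = hit + delim.length;
    }
    /* whatever is left becomes the last item, unsplit */
    output[n].ptr = s.ptr + start;
    output[n].length = s.length - start;
    return n + 1;
}
```

With the input "a,b,c" and a limit of 2, the sketch writes "a" and "b,c", which corresponds to the single split that limit-1 allows.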

Complex Tokenization

#include <cx/string.h>

CxStrtokCtx cx_strtok(AnyStr str, AnyStr delim, size_t limit);

void cx_strtok_delim(CxStrtokCtx *ctx,
        const cxstring *delim, size_t count);

bool cx_strtok_next(CxStrtokCtx *ctx, cxstring *token);

bool cx_strtok_next_m(CxStrtokCtx *ctx, cxmutstr *token);

You can tokenize a string by creating a tokenization context with cx_strtok(), and calling cx_strtok_next() or cx_strtok_next_m() as long as they return true.

The tokenization context is initialized with the string str to tokenize, one delimiter delim, and a limit for the maximum number of tokens. When limit is reached, the remaining part of str is returned as one single token.

You can add additional delimiters to the context by calling cx_strtok_delim(), and specifying an array of delimiters to use.

Example

#include <cx/string.h>

cxstring str = cx_str("an,arbitrarily;||separated;string");

// create the context
CxStrtokCtx ctx = cx_strtok(str, CX_STR(","), 10);

// add two more delimiters
cxstring delim_more[2] = {CX_STR("||"), CX_STR(";")};
cx_strtok_delim(&ctx, delim_more, 2);

// iterate over the tokens
cxstring tok;
while(cx_strtok_next(&ctx, &tok)) {
    // do something with the tokens
    // be aware that tok is NOT zero-terminated!
}
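To show what an iteration with several delimiters involves, here is a self-contained sketch of a context-based tokenizer. The names view_t, tok_ctx_t, tok_init(), tok_next(), and tok_count() are hypothetical and this is not the UCX implementation. The sketch omits the token limit, ends a token at the earliest delimiter match, and yields empty tokens between adjacent delimiters; all three points are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a read-only, length-carrying string. */
typedef struct { const char *ptr; size_t length; } view_t;

/* Hypothetical tokenization context. */
typedef struct {
    view_t rest;          /* part of the input not yet consumed */
    const view_t *delims; /* delimiters to split on */
    size_t ndelims;
    bool done;            /* true once the final token was returned */
} tok_ctx_t;

tok_ctx_t tok_init(view_t str, const view_t *delims, size_t ndelims) {
    tok_ctx_t ctx = {str, delims, ndelims, false};
    return ctx;
}

/* Stores the next token in *token and returns true, or returns
 * false when all tokens have been returned. */
bool tok_next(tok_ctx_t *ctx, view_t *token) {
    if (ctx->done) {
        return false;
    }
    for (size_t i = 0; i < ctx->rest.length; i++) {
        for (size_t d = 0; d < ctx->ndelims; d++) {
            view_t del = ctx->delims[d];
            if (del.length > 0
                && i + del.length <= ctx->rest.length
                && memcmp(ctx->rest.ptr + i, del.ptr, del.length) == 0) {
                token->ptr = ctx->rest.ptr;
                token->length = i;
                ctx->rest.ptr += i + del.length;
                ctx->rest.length -= i + del.length;
                return true;
            }
        }
    }
    /* no delimiter left: the remainder is the final token */
    *token = ctx->rest;
    ctx->done = true;
    return true;
}

/* Counts the tokens produced for the given input. */
size_t tok_count(view_t str, const view_t *delims, size_t ndelims) {
    tok_ctx_t ctx = tok_init(str, delims, ndelims);
    view_t tok;
    size_t n = 0;
    while (tok_next(&ctx, &tok)) {
        n++;
    }
    return n;
}
```

Under these assumptions, the input from the example above yields an empty token between ";" and "||".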

Conversion to Numbers

For each integer type, as well as float and double, there are functions to convert a UCX string to a number of that type.

Integer conversion comes in two flavours:

int cx_strtoi(AnyStr str, int *output, int base);

int cx_strtoi_lc(AnyStr str, int *output, int base,
        const char *groupsep);

The basic variant takes a string of any UCX string type, a pointer to the output integer, and the base (one of 2, 8, 10, or 16). Conversion is attempted with respect to the specified base and respects possible special notations for that base. Hexadecimal numbers may be prefixed with 0x, x, or #, and binary numbers may be prefixed with 0b or b.

The _lc versions of the integer conversion functions are equivalent, except that they allow the specification of an array of group separator chars, each of which is simply ignored during conversion. The default group separator for the basic version is a comma (,).
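The interplay of base prefixes and group separators can be sketched as follows. The function parse_int() is hypothetical and not the UCX implementation. Returning 0 on success and -1 on failure, accepting a leading sign, and rejecting values that do not fit into an int are assumptions of this sketch.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Parses str[0..len) as an int in the given base (2, 8, 10, or 16).
 * Characters contained in groupsep are skipped.
 * Returns 0 on success and -1 on failure (sketch-only convention). */
int parse_int(const char *str, size_t len, int *output,
              int base, const char *groupsep) {
    size_t i = 0;
    int negative = 0;
    if (base != 2 && base != 8 && base != 10 && base != 16) {
        return -1;
    }
    if (i < len && (str[i] == '+' || str[i] == '-')) {
        negative = (str[i] == '-');
        i++;
    }

    /* optional prefixes: 0x, x, # for base 16 and 0b, b for base 2 */
    if (base == 16) {
        if (i + 1 < len && str[i] == '0' && str[i + 1] == 'x') {
            i += 2;
        } else if (i < len && (str[i] == 'x' || str[i] == '#')) {
            i++;
        }
    } else if (base == 2) {
        if (i + 1 < len && str[i] == '0' && str[i + 1] == 'b') {
            i += 2;
        } else if (i < len && str[i] == 'b') {
            i++;
        }
    }

    long long value = 0;
    size_t digits = 0;
    for (; i < len; i++) {
        char c = str[i];
        int d;
        if (c != '\0' && strchr(groupsep, c) != NULL) {
            continue; /* group separators are ignored */
        }
        if (c >= '0' && c <= '9') {
            d = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            d = c - 'a' + 10;
        } else if (c >= 'A' && c <= 'F') {
            d = c - 'A' + 10;
        } else {
            return -1;
        }
        if (d >= base) {
            return -1;
        }
        value = value * base + d;
        if (value > (long long)INT_MAX + 1) {
            return -1; /* cannot fit into an int anymore */
        }
        digits++;
    }
    if (digits == 0) {
        return -1;
    }
    if (negative) {
        value = -value;
    }
    if (value > INT_MAX || value < INT_MIN) {
        return -1;
    }
    *output = (int)value;
    return 0;
}
```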

The signature for the floating point conversions is quite similar:

int cx_strtof(AnyStr str, float *output);

int cx_strtof_lc(AnyStr str, float *output,
        char decsep, const char *groupsep);

The two differences are that the floating point versions do not support different bases, and the _lc variant allows specifying not only an array of group separators, but also the character used for the decimal separator.

In the basic variant, the group separator is again a comma (,), and the decimal separator is a dot (.).
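One way to picture a configurable decimal separator is to normalise the input and hand it to strtod(). The function parse_double() is hypothetical and not how UCX implements the conversion. It works on double instead of float, relies on the default "C" locale, returns 0 on success and -1 on failure, and inherits everything else strtod() accepts; all of these are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parses str[0..len) as a double. The character decsep is treated
 * as the decimal separator; characters in groupsep are skipped.
 * Returns 0 on success and -1 on failure (sketch-only convention). */
int parse_double(const char *str, size_t len, double *output,
                 char decsep, const char *groupsep) {
    char *buf = malloc(len + 1);
    if (buf == NULL) {
        return -1;
    }

    /* build a normalised, zero-terminated copy for strtod() */
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = str[i];
        if (c == '\0') {
            free(buf);
            return -1;
        }
        if (c == decsep) {
            buf[n++] = '.';
        } else if (strchr(groupsep, c) != NULL) {
            continue; /* group separators are ignored */
        } else if (c == '.') {
            free(buf); /* a dot that is neither separator is invalid */
            return -1;
        } else {
            buf[n++] = c;
        }
    }
    buf[n] = '\0';

    char *end;
    double value = strtod(buf, &end);
    int ok = (n > 0 && *end == '\0'); /* whole input must be consumed */
    free(buf);
    if (!ok) {
        return -1;
    }
    *output = value;
    return 0;
}
```

With a comma as decimal separator and a dot as group separator, the input "1.234,5" is normalised to "1234.5" before conversion.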