The missing C++ string class

cstr is the string class you expected to find in the STL but didn't. Its design goal is to make all the string handling functions of the standard C library available within a single C++ class.

The terse names of the original C functions have been preserved to ease the transition for C programmers, but the "str" prefix has been dropped. For example, strlen(mystring) becomes mystring.len(). New functions have been given short but less cryptic names in the same style (one word, all lower case).
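
The naming scheme can be illustrated with a minimal sketch. The mini_str class below is a hypothetical stand-in written for this example, not the real cstr; it only shows how a C library call maps onto a member function once the "str" prefix is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for illustration only; NOT the real cstr class.
// It wraps a borrowed C string and forwards to the C library.
class mini_str {
public:
    explicit mini_str(const char* p) : p_(p) {}

    // strlen(s)        becomes  s.len()
    std::size_t len() const { return std::strlen(p_); }

    // strcmp(s, t)     becomes  s.cmp(t)
    int cmp(const char* t) const { return std::strcmp(p_, t); }

    // strncmp(s, t, n) becomes  s.ncmp(t, n)
    int ncmp(const char* t, std::size_t n) const {
        return std::strncmp(p_, t, n);
    }

private:
    const char* p_;  // not owned; the real class manages its own buffer
};

// Usage: each call mirrors its C library counterpart.
inline void mini_str_example() {
    mini_str s("hello");
    assert(s.len() == 5);
    assert(s.cmp("hello") == 0);
    assert(s.ncmp("help", 3) == 0);  // first three characters match
}
```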

Most cstr functions work like their counterparts in the standard C library, with some improvements:

  1. Automatic buffer allocation and deallocation.
  2. The resulting string is always zero terminated.
  3. All functions are thread safe in the sense that multiple threads can use the class concurrently, as long as they do not share an individual object.
  4. Pre-conditions are tested using assert(), which makes it possible to trap "out of bounds" errors in debug builds, with no speed penalty in release builds.
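
As a sketch of the assert() technique in point 4, the hypothetical tiny_str class below checks its pre-condition in debug builds only. It is not the real cstr, and the fixed 32-byte buffer is an assumption made to keep the example short. Defining NDEBUG, as release builds typically do, removes the check entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-buffer string for illustration; not the real cstr.
class tiny_str {
public:
    explicit tiny_str(const char* p) {
        std::strncpy(buf_, p, sizeof(buf_) - 1);
        buf_[sizeof(buf_) - 1] = '\0';   // result is always zero terminated
    }

    std::size_t len() const { return std::strlen(buf_); }

    // Pre-condition: pos must refer to an existing character.
    // With NDEBUG defined the assert expands to nothing, so the
    // check costs nothing at run time.
    void set(std::size_t pos, char c) {
        assert(pos < len());
        buf_[pos] = c;
    }

    const char* c_str() const { return buf_; }

private:
    char buf_[32];   // assumed size, for this sketch only
};
```

In a debug build, a call such as set(10, 'x') on a five-character string stops at the assert instead of silently writing outside the string.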

Download

Click here to download the source code. It is free to use and distribute. The ZIP archive is only 8kB and includes a full set of unit tests. The last update was October 2008.

Memory Allocation

Memory is allocated on the heap when a string object is created, and released when the object is destroyed. Memory is allocated in blocks of a fixed size. More memory is automatically allocated if a string grows out of its current allocation. Memory is not released when a string shrinks unless garb() is called. The block size is controlled by MEM_ALLOC_SIZE and can be changed to find the right balance between memory consumption and memory fragmentation.
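
The allocation policy described above can be sketched as follows. Only MEM_ALLOC_SIZE and garb() are names from cstr; the value 16, the class block_str, and the helpers blocks_for, cat and capacity are assumptions made for this example, and the real implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

const std::size_t MEM_ALLOC_SIZE = 16;  // assumed value for this sketch

// Smallest multiple of MEM_ALLOC_SIZE that holds len characters plus '\0'.
inline std::size_t blocks_for(std::size_t len) {
    return ((len + 1 + MEM_ALLOC_SIZE - 1) / MEM_ALLOC_SIZE) * MEM_ALLOC_SIZE;
}

// Hypothetical string with block-wise allocation; not the real cstr.
class block_str {
public:
    block_str() : cap_(blocks_for(0)), len_(0),
                  buf_(static_cast<char*>(std::malloc(cap_))) {
        if (buf_ == nullptr) std::abort();
        buf_[0] = '\0';
    }
    ~block_str() { std::free(buf_); }

    block_str(const block_str&) = delete;             // copying omitted here
    block_str& operator=(const block_str&) = delete;

    void cat(const char* p) {
        assert(p != nullptr);
        std::size_t add = std::strlen(p);
        std::size_t need = blocks_for(len_ + add);
        if (need > cap_) resize(need);                // grow automatically
        std::memcpy(buf_ + len_, p, add + 1);         // copies the '\0' too
        len_ += add;
    }
    void clear() { len_ = 0; buf_[0] = '\0'; }        // keeps the allocation
    void garb() { resize(blocks_for(len_)); }         // releases unused blocks

    std::size_t len() const { return len_; }
    std::size_t capacity() const { return cap_; }
    const char* c_str() const { return buf_; }

private:
    void resize(std::size_t cap) {
        char* p = static_cast<char*>(std::realloc(buf_, cap));
        if (p == nullptr) std::abort();
        buf_ = p;
        cap_ = cap;
    }

    std::size_t cap_;
    std::size_t len_;
    char* buf_;
};
```

In this sketch a 16-character string needs 17 bytes, so it occupies two blocks. Clearing it leaves both blocks allocated until garb() is called.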

Performance

The performance of cstr depends on the performance of the string and memory handling functions in the standard C library on the target system. The cstr class does not use any copy-on-write optimization, since such optimizations have been shown not to work in a multi-threaded environment.

If you have a performance problem related to string creation, you can try to re-use string objects instead of destroying old objects and creating new ones. If you are building long strings in small increments, a bigger MEM_ALLOC_SIZE will improve performance by reducing the number of re-allocations.
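
The effect of the block size on re-allocations can be estimated with a small model. The function count_reallocs below is hypothetical and performs no real allocation; it assumes the buffer starts with one block and grows by one block whenever the next character (plus the terminating zero) no longer fits.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many times a buffer must grow when `total` characters are
// appended one at a time and the capacity grows in steps of `block` bytes.
// Hypothetical model only; the real cstr may allocate differently.
inline std::size_t count_reallocs(std::size_t total, std::size_t block) {
    assert(block > 0);
    std::size_t cap = block;          // initial allocation: one block
    std::size_t reallocs = 0;
    for (std::size_t len = 1; len <= total; ++len) {
        if (len + 1 > cap) {          // +1 for the terminating zero
            cap += block;
            ++reallocs;
        }
    }
    return reallocs;
}
```

In this model, appending 1000 characters one by one takes 62 growth steps with a 16-byte block and 3 with a 256-byte block.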

Localization

The cstr class uses the char data type to represent individual characters. This means that cstr is limited to character sets (code pages) that contain no more characters than a single char variable can represent. The cstr class relies on the current locale setting to identify the current character set. The "C" locale is usually the default locale. Its character set is 7-bit ASCII, which is only suitable for English text. The locale setting can be changed with the setlocale() function in the C library. This is usually done only once, at the beginning of a program.
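
A minimal sketch of the usual start-up call follows. The helper names init_locale and to_upper_char are hypothetical; only setlocale() and toupper() come from the C library. Passing an empty string to setlocale() selects the locale configured in the user's environment.

```cpp
#include <cctype>
#include <clocale>

// Selects the user's environment locale; call once at program start.
// Returns true on success. On failure the previous locale stays in effect.
inline bool init_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Upper-cases one character according to the current locale.
// The cast to unsigned char avoids undefined behaviour for chars with
// the high bit set, which 8-bit code pages use for non-ASCII letters.
inline char to_upper_char(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

Which characters above 127 are treated as letters depends on the locale that was selected, so the same byte can convert differently on differently configured systems.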

Pitfalls

Certain tradeoffs have been made that have to do with complex C++ issues such as operator overloading. These tradeoffs create a couple of pitfalls that are best described by example:

// The following code will compile:
cstr s("hello");
if(s[0] == 'h') s.set(0,'H');

// The following code will NOT compile:
cstr s("hello");
if(s[0] == 'h') s[0] = 'H'; // Error: s[0] is read-only, use set() instead

// The following code will compile:
cstr s1("Hello");
cstr s2 = s1 + " world!";

// The following code will NOT compile:
cstr s2 = "Hello" + " world!"; // Error: two string literals cannot be added
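
The reason behind both pitfalls can be sketched with a stand-in class. The demo_str class below is hypothetical and is not the real cstr; it assumes, as the Function Summary suggests, that read access with [] goes through the const char * type cast and that operator+ takes a string object on the left-hand side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in; uses std::string internally only for brevity.
class demo_str {
public:
    demo_str(const char* p) : s_(p) {}

    // s[0] is evaluated as static_cast<const char*>(s)[0], which yields
    // a const char. Reading compiles; assigning to it does not.
    operator const char*() const { return s_.c_str(); }

    // Writing therefore needs an explicit member function.
    void set(std::size_t pos, char c) {
        assert(pos < s_.size());
        s_[pos] = c;
    }

    // The left operand must already be a demo_str. For two literals,
    // both operands are plain pointers and C++ never considers this
    // overload.
    friend demo_str operator+(const demo_str& a, const char* b) {
        return demo_str((a.s_ + b).c_str());
    }

private:
    std::string s_;
};

inline void demo_str_example() {
    demo_str s("hello");
    if (s[0] == 'h') s.set(0, 'H');
    assert(std::strcmp(s, "Hello") == 0);

    // Workaround for the second pitfall: make one operand an object.
    demo_str s2 = demo_str("Hello") + " world!";
    assert(std::strcmp(s2, "Hello world!") == 0);
}
```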

Feedback

Feedback about cstr can be sent to the following email address:

Function Summary

cstr()                                                          Constructor
~cstr()                                                         Destructor
cstr(const char * pstr)                                         Constructor
cstr(const char * pstr, size_t len)                             Constructor
cstr(const cstr & str)                                          Copy Constructor
operator const char * () const                                  Type cast
const char * c_str() const                                      Explicit version of type cast
size_t len() const                                              Get number of characters
bool isempty() const                                            Check if the string is empty
void set(size_t pos, char c)                                    Set a character in the string
void set(size_t pos, char c, size_t count)                      Set multiple characters in the string
void fill(char c, size_t count)                                 Fill a string with a single character
void ncpy(const char * pstr, size_t len)                        Copy a limited string
void ncat(const char * pstr, size_t len)                        Append a limited string
const cstr & operator=(const char * pstr)                       Assign string
const cstr & operator=(const cstr & str)                        Assign string
const cstr & operator+=(const cstr & str)                       Append string
const cstr & operator+=(char c)                                 Append character
void clear()                                                    Clear the string
void garb()                                                     Free unused memory
bool operator<(const char * pstr) const                         Case sensitive compare
bool operator>(const char * pstr) const                         Case sensitive compare
bool operator==(const char * pstr) const                        Case sensitive compare
bool operator!=(const char * pstr) const                        Case sensitive compare
bool sameas(const char * pstr) const                            Case insensitive compare
int cmp(const char * pstr) const                                Case sensitive compare
int icmp(const char * pstr) const                               Case insensitive compare
int ncmp(const char * pstr, size_t len) const                   Case sensitive compare
int nicmp(const char * pstr, size_t len) const                  Case insensitive compare
int coll(const char * pstr) const                               Case sensitive lexicographic compare
int icoll(const char * pstr) const                              Case insensitive lexicographic compare
int ncoll(const char * pstr, size_t len) const                  Case sensitive lexicographic compare
int nicoll(const char * pstr, size_t len) const                 Case insensitive lexicographic compare
void upr()                                                      Convert to upper case
void lwr()                                                      Convert to lower case
void cut(size_t len)                                            Remove characters from the start
void trunc(size_t len)                                          Remove characters from the end
void trim()                                                     Remove leading and trailing non-graphic characters
void pad(size_t len, char padding = ' ')                        Expand string to len characters by padding
void padright(size_t len, char padding = ' ')                   Expand string to len characters by padding at the end
void padleft(size_t len, char padding = ' ')                    Expand string to len characters by padding at the start
void del(size_t from, size_t to)                                Delete sub-string
void ins(size_t pos, const cstr & str)                          Insert sub-string at position
void rpl(size_t pos, const cstr & oldstr, const cstr & newstr)  Replace sub-string at position
void rpl(char old_c, char new_c)                                Replace all old_c characters with new_c
void rev()                                                      Reverse string
cstr mid(size_t from, size_t to) const                          Get sub-string
cstr left(size_t len) const                                     Get start of string
cstr right(size_t len) const                                    Get end of string
bool chr(char c, size_t & pos) const                            Search for character
bool rchr(char c, size_t & pos) const                           Search backwards for character
bool str(const char * pstr, size_t & pos) const                 Search for sub-string
bool tok(const char * pdelim, cstr & tok, size_t & pos) const   Get next token
bool has(char c) const                                          Check if the string contains a specific character
bool has(const char * pstr) const                               Check if the string contains a specific sub-string
bool cspn(const char * pset, size_t & pos) const                Find first character from a set
bool spn(const char * pset, size_t & pos) const                 Find first character NOT from a set
int toi() const                                                 Convert decimal string to integer
long tol(size_t & pos, int base = 10) const                     Converts string to long
long toul(size_t & pos, int base = 10) const                    Converts string to unsigned long
double tod() const                                              Converts string to double
double tod(size_t & pos) const                                  Converts string at position to double
int sprintf(const char * pFmt, ...)                             Format string the classic way
static cstr decstr(int value)                                   Convert integer to decimal string
static cstr hexstr(unsigned value)                              Convert unsigned integer to hexadecimal string
static cstr fpstr(double value, size_t digits)                  Convert double to string
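
Several search functions in the summary return a bool and report the position through a size_t reference, instead of returning a pointer that may be NULL as strchr() does. The free function find_chr below is a hypothetical model of that calling convention, not the cstr implementation. Whether the real chr() leaves pos unchanged on failure is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the `bool chr(char c, size_t & pos)` signature.
// Returns true and stores the index of the first match in pos;
// returns false and leaves pos unchanged when c does not occur.
inline bool find_chr(const char* s, char c, std::size_t& pos) {
    const char* hit = std::strchr(s, c);
    if (hit == nullptr) return false;
    pos = static_cast<std::size_t>(hit - s);
    return true;
}

// Usage: the return value says whether pos is meaningful.
inline void find_chr_example() {
    std::size_t pos = 0;
    if (find_chr("hello", 'l', pos)) {
        assert(pos == 2);   // index of the first 'l'
    }
}
```

With this convention the caller tests the return value directly and works with an index, so no pointer arithmetic on the string buffer is needed.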