2009-11-10

Code: Tokenizer


Here’s a little C++ utility class for splitting C strings into tokens by the given separators. It’s inspired by “tokenwad” from Sol’s CFL3, which I have used quite a few times. This version has low memory overhead: it performs only one or two allocations, depending on whether the tokenization is done in place or on a separate copy.

Tokenizer(bool trimWhitespace = true), clear(bool trimWhitespace = true)
Initializes or resets the tokenizer. Trimming whitespace from the beginning and end of each token is optional.
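Whitespace trimming can be done without any extra allocation: move the token’s start pointer past leading spaces and write a terminator after its last non-space character. A minimal sketch, assuming the tokens live in a writable buffer (the private copy or the in-place string); `trimInPlace` is a hypothetical helper, not a function from the downloadable source.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper: trims leading and trailing whitespace from a
// writable, NUL-terminated token and returns the new start of the token.
// Nothing is allocated; the buffer is modified in place.
char *trimInPlace(char *token)
{
    // Skip leading whitespace (the cast avoids UB for negative char values).
    while (*token != '\0' && std::isspace(static_cast<unsigned char>(*token)))
        ++token;

    // Walk back from the end over trailing whitespace.
    char *end = token + std::strlen(token);
    while (end > token && std::isspace(static_cast<unsigned char>(end[-1])))
        --end;

    *end = '\0'; // cut the token right after its last non-space character
    return token;
}
```

A token consisting only of whitespace becomes an empty string rather than disappearing.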

bool tokenize(const char *str, const char *separators);
Tokenizes the string by the given separators, working on a private copy of the data so the original string is left unchanged. Returns true on success, or false if tokenization failed.

bool tokenizeInPlace(char *str, const char *separators);
Tokenizes the string in place by the given separators, destroying the original content. Returns true on success, or false if tokenization failed.
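Both tokenization entry points can share one splitting routine. The following is a minimal sketch, not the code in tokenizer.zip: `TokenizerSketch` and its private `split` helper are hypothetical names, whitespace trimming is omitted, and adjacent separators are assumed to yield empty tokens. It shows one way to arrive at the allocation counts from the introduction: a counting pass sizes the token table so it is allocated exactly once, and `tokenize` adds a single allocation for its private copy. `count()` and `get()` are included to show the 0 and null-pointer results that those functions document.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical sketch of the tokenizer core, not the code in tokenizer.zip.
// Assumptions: any character in `separators` ends a token, adjacent
// separators yield empty tokens, and whitespace trimming is left out.
class TokenizerSketch {
public:
    // Splits a private copy of str; the caller's string is not modified.
    // Allocation 1: the copy. Allocation 2: the token table (in split).
    bool tokenize(const char *str, const char *separators)
    {
        if (str == nullptr || separators == nullptr)
            return false;
        std::size_t size = std::strlen(str) + 1; // include the terminator
        std::unique_ptr<char[]> copy(new char[size]);
        std::memcpy(copy.get(), str, size);
        copy_ = std::move(copy); // old copy released only after copying
        split(copy_.get(), separators);
        return true;
    }

    // Splits str itself by overwriting separators with '\0'.
    // Only allocation: the token table.
    bool tokenizeInPlace(char *str, const char *separators)
    {
        if (str == nullptr || separators == nullptr)
            return false;
        copy_.reset(); // a previous private copy is no longer needed
        split(str, separators);
        return true;
    }

    // 0 until a tokenization has succeeded.
    int count() const { return count_; }

    // Returns the token at index, or nullptr if the index is out of range.
    char *get(int index) const
    {
        return (index >= 0 && index < count_) ? tokens_[index] : nullptr;
    }

private:
    void split(char *str, const char *separators)
    {
        // Pass 1: count tokens (separators + 1) so the table is sized once.
        int n = 1;
        for (const char *p = str; (p = std::strpbrk(p, separators)) != nullptr; ++p)
            ++n;
        tokens_.reset(new char *[n]);

        // Pass 2: terminate each token in place and record where it starts.
        int i = 0;
        char *start = str;
        for (char *p; (p = std::strpbrk(start, separators)) != nullptr; start = p + 1) {
            *p = '\0';
            tokens_[i++] = start;
        }
        tokens_[i++] = start; // the last token runs to the end of the string
        count_ = i;
    }

    std::unique_ptr<char[]> copy_;     // private copy used by tokenize()
    std::unique_ptr<char *[]> tokens_; // start of each token
    int count_ = 0;
};
```

Under these assumptions an empty input string yields one empty token rather than zero tokens; the real class may treat that case differently. Counting first means the table never has to grow, at the cost of scanning the string twice.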

int count() const;
Returns the number of tokens, or 0 if the tokenizer hasn’t been run yet.

char * get(int index) const;
Returns the token at the given index, or 0 (a null pointer) if no token with that index exists.

int getAsInt(int index, int defaultValue = -1);
Returns the token converted to an integer, or the given default value if it can’t be converted.
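The post doesn’t define exactly when a token “can’t be converted”. A minimal sketch, assuming base 10 and that the whole token must be numeric: `tokenAsInt` is a hypothetical free function taking the token text (inside the class it would be fed the result of `get(index)`), and the downloadable source may apply different rules. It uses `std::strtol`, whose end pointer and `errno` reveal partial conversions and overflow.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: converts a whole token to int, or returns
// defaultValue if the token is missing, empty, has trailing characters,
// or is out of int range.
int tokenAsInt(const char *token, int defaultValue = -1)
{
    if (token == nullptr || *token == '\0')
        return defaultValue;

    errno = 0;
    char *end = nullptr;
    long value = std::strtol(token, &end, 10); // base 10 assumed

    // Reject "12abc", whitespace-only tokens, and values that overflow
    // long (ERANGE) or do not fit into int.
    if (*end != '\0' || errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return defaultValue;
    return static_cast<int>(value);
}
```

`std::atoi` would be shorter, but it returns 0 for invalid input, so it cannot tell the token “0” apart from garbage; that distinction is exactly what the default value is for.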

bool equals(int index, const char *str);
Returns true if the token at the given index equals the given string, or false otherwise.

bool equalsIgnoreCase(int index, const char *str);
Returns true if the token at the given index equals the given string, ignoring case, or false otherwise.
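Both comparisons reduce to simple string checks on the token pointer. A minimal sketch with hypothetical free functions `tokenEquals` and `tokenEqualsIgnoreCase`, assuming a missing token (out-of-range index) never matches. Standard C++ has no case-insensitive string compare (`strcasecmp` is POSIX and `_stricmp` is MSVC-specific), so the sketch compares character by character with `std::tolower`, which handles single-byte characters only, not Unicode case folding.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helpers operating on a token pointer such as get(index)
// returns; a null token (index out of range) never matches.
bool tokenEquals(const char *token, const char *str)
{
    return token != nullptr && str != nullptr && std::strcmp(token, str) == 0;
}

// Portable case-insensitive comparison using std::tolower per character.
bool tokenEqualsIgnoreCase(const char *token, const char *str)
{
    if (token == nullptr || str == nullptr)
        return false;
    for (; *token != '\0' && *str != '\0'; ++token, ++str) {
        // Cast to unsigned char: tolower is undefined for negative values.
        if (std::tolower(static_cast<unsigned char>(*token)) !=
            std::tolower(static_cast<unsigned char>(*str)))
            return false;
    }
    return *token == *str; // equal only if both strings ended together
}
```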

Download source code (Tokenizer.cpp, Tokenizer.h, tokenizer_test.cpp):
tokenizer.zip
