Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting a string by a character

People also ask

How do you split a string at a certain character?

To split a string with specific character as delimiter in Java, call split() method on the string object, and pass the specific character as argument to the split() method. The method returns a String Array with the splits as elements in the array.

How do you split a string by a specific character in Python?

Splitting Strings with the split() Function We can specify the character to split a string by using the separator in the split() function. By default, split() will use whitespace as the separator, but we are free to provide other characters if we want.

How do I split a word in a string?

The split() method of the String class accepts a String value representing the delimiter and splits into an array of tokens (words), treating the string between the occurrence of two delimiters as one token. For example, if you pass single space “ ” as a delimiter to this method and try to split a String.


Using vectors, strings and stringstream. A tad cumbersome but it does the trick.

#include <string>
#include <vector>
#include <sstream>

std::stringstream test("this_is_a_test_string");
std::string segment;
std::vector<std::string> seglist;

while(std::getline(test, segment, '_'))
{
   seglist.push_back(segment);
}

Which results in a vector with the same contents as

std::vector<std::string> seglist{ "this", "is", "a", "test", "string" };

Boost has the split() you are seeking in algorithm/string.hpp:

std::string sample = "07/3/2011";
std::vector<std::string> strs;
boost::split(strs, sample, boost::is_any_of("/"));

Another way (C++11/boost) for people who like RegEx. Personally I'm a big fan of RegEx for this kind of data. IMO it's far more powerful than simply splitting strings using a delimiter since you can choose to be be a lot smarter about what constitutes "valid" data if you wish.

#include <string>
#include <algorithm>    // copy
#include <iterator>     // back_inserter
#include <regex>        // regex, sregex_token_iterator
#include <vector>

int main()
{
    std::string str = "08/04/2012";
    std::vector<std::string> tokens;
    std::regex re("\\d+");

    //start/end points of tokens in str
    std::sregex_token_iterator
        begin(str.begin(), str.end(), re),
        end;

    std::copy(begin, end, std::back_inserter(tokens));
}

Another possibility is to imbue a stream with a locale that uses a special ctype facet. A stream uses the ctype facet to determine what's "whitespace", which it treats as separators. With a ctype facet that classifies your separator character as whitespace, the reading can be pretty trivial. Here's one way to implement the facet:

struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(table_size, std::ctype_base::mask());

        // we'll assume dates are either a/b/c or a-b-c:
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

We use that by using imbue to tell a stream to use a locale that includes it, then read the data from that stream:

std::istringstream in("07/3/2011");
in.imbue(std::locale(std::locale(), new field_reader);

With that in place, the splitting becomes almost trivial -- just initialize a vector using a couple of istream_iterators to read the pieces from the string (that's embedded in the istringstream):

std::vector<std::string>((std::istream_iterator<std::string>(in),
                          std::istream_iterator<std::string>());

Obviously this tends toward overkill if you only use it in one place. If you use it much, however, it can go a long ways toward keeping the rest of the code quite clean.


Since nobody has posted this yet: The c++20 solution is very simple using ranges. You can use a std::ranges::views::split to break up the input, and then transform the input into std::string or std::string_view elements.

#include <ranges>


...

// The input to transform
const auto str = std::string{"Hello World"};

// Function to transform a range into a std::string
// Replace this with 'std::string_view' to make it a view instead.
auto to_string = [](auto&& r) -> std::string {
    const auto data = &*r.begin();
    const auto size = static_cast<std::size_t>(std::ranges::distance(r));

    return std::string{data, size};
};

const auto range = str | 
                   std::ranges::views::split(' ') | 
                   std::ranges::views::transform(to_string);

for (auto&& token : str | range) {
    // each 'token' is the split string
}

This approach can realistically compose into just about anything, even a simple split function that returns a std::vector<std::string>:

auto split(const std::string& str, char delimiter) -> std::vector<std::string>
{
    const auto range = str | 
                       std::ranges::views::split(delimiter) | 
                       std::ranges::views::transform(to_string);

    return {std::ranges::begin(range), std::ranges::end(range)};
}

Live Example


I inherently dislike stringstream, although I'm not sure why. Today, I wrote this function to allow splitting a std::string by any arbitrary character or string into a vector. I know this question is old, but I wanted to share an alternative way of splitting std::string.

This code omits the part of the string you split by from the results altogether, although it could be easily modified to include them.

#include <string>
#include <vector>

void split(std::string str, std::string splitBy, std::vector<std::string>& tokens)
{
    /* Store the original string in the array, so we can loop the rest
     * of the algorithm. */
    tokens.push_back(str);

    // Store the split index in a 'size_t' (unsigned integer) type.
    size_t splitAt;
    // Store the size of what we're splicing out.
    size_t splitLen = splitBy.size();
    // Create a string for temporarily storing the fragment we're processing.
    std::string frag;
    // Loop infinitely - break is internal.
    while(true)
    {
        /* Store the last string in the vector, which is the only logical
         * candidate for processing. */
        frag = tokens.back();
        /* The index where the split is. */
        splitAt = frag.find(splitBy);
        // If we didn't find a new split point...
        if(splitAt == std::string::npos)
        {
            // Break the loop and (implicitly) return.
            break;
        }
        /* Put everything from the left side of the split where the string
         * being processed used to be. */
        tokens.back() = frag.substr(0, splitAt);
        /* Push everything from the right side of the split to the next empty
         * index in the vector. */
        tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
    }
}

To use, just call like so...

std::string foo = "This is some string I want to split by spaces.";
std::vector<std::string> results;
split(foo, " ", results);

You can now access all the results in the vector at will. Simple as that - no stringstream, no third-party libraries, no dropping back to C!