Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I iterate over the words of a string?

Tags:

c++

string

split

People also ask

How can we iterate through individual words in a string JS?

Simple for loop can use for in iterate through words in a string JavaScript. And for iterate words in string use split, map, and join methods (functions).

How do you iterate a list of words in Python?

To iterate through a list in python we can easily use the range() method. This method returns a sequence of items and it can be used to combine a for loop with range() function to iterate over a list.


I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second returns a new vector.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};

A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.


#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                 // Have a buffer string
    std::stringstream ss(str);       // Insert the string into a stream

    std::vector<std::string> tokens; // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}

For those with whom it does not sit well to sacrifice all efficiency for code size and see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition.):

template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
   std::string::size_type pos, lastPos = 0, length = str.length();

   using value_type = typename ContainerT::value_type;
   using size_type  = typename ContainerT::size_type;

   while(lastPos < length + 1)
   {
      pos = str.find_first_of(delimiters, lastPos);
      if(pos == std::string::npos)
      {
         pos = length;
      }

      if(pos != lastPos || !trimEmpty)
         tokens.push_back(value_type(str.data()+lastPos,
               (size_type)pos-lastPos ));

      lastPos = pos + 1;
   }
}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<> is way faster than vector<> for when direct access is not needed, and you can even create your own string class and use something like std::list<subString> where subString does not do any copies for incredible speed increases.

It's more than double as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also with the perfect parameter types you can eliminate all string and list copies for additional speed increases.

Additionally it does not do the (extremely inefficient) return of result, but rather it passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wished.

Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.


Here's another solution. It's compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {
  std::vector<std::string> tokens;
  std::size_t start = 0, end = 0;
  while ((end = text.find(sep, start)) != std::string::npos) {
    tokens.push_back(text.substr(start, end - start));
    start = end + 1;
  }
  tokens.push_back(text.substr(start));
  return tokens;
}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (ie. sep) results in two empty strings.

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
          tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    if (end != start) {
       tokens.push_back(text.substr(start));
    }
    return tokens;
}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}

This is my favorite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )     
{
    // Do something on `word` here...
}