Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++: How do I split a string into evenly-sized smaller strings?

In C++, how do I split a string into evenly-sized smaller string?

For example, I have a string "012345678" and want it to split it into 5 smaller strings, and this should return me something like "01", "23", "45", "67", "8".

I'm having trouble of determining the length of the smaller strings. In the previous example, the original string is size 9, and I want to split it into 5 smaller string, so each smaller string except the last one should be length 9 / 5 = 1, but then the last one will be length 9 - 1* 4 = 5, which is unacceptable.

So the formal definition of this problem: the original string is split into EXACTLY n substrings, and no two of the substrings should differ by greater than 1 in length.

My focus is not on C++ syntax or library. It's how to design an algorithm so that the returned string can be nearly-equal in size.

like image 782
Shuo Avatar asked Nov 21 '11 05:11

Shuo


People also ask

How do I split a string into multiple parts?

You can split a string by each character using an empty string('') as the splitter. In the example below, we split the same message using an empty string. The result of the split will be an array containing all the characters in the message string.

Can you split a string in C?

In C, the strtok() function is used to split a string into a series of tokens based on a particular delimiter. A token is a substring extracted from the original string.

How do I split a string without a separator?

To split a string without removing the delimiter: Use the str. split() method to split the string into a list.


3 Answers

To divide N items into M parts, with lengths within one unit, you can use formula (N*i+N)/M - (N*i)/M as length of i'th part, as illustrated below.

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   string text = "abcdefghijklmnopqrstuvwxyz";
   int N = text.length();
   for (int M=3; M<14; ++M) {
     cout <<" length:"<< N <<"  parts:"<< M << "\n";
     int at, pre=0, i;
     for (pre = i = 0; i < M; ++i) {
       at = (N+N*i)/M;
       cout << "part " << i << "\t" << pre << "\t" << at;
       cout << "\t" << text.substr(pre, at-pre) << "\n";
       pre = at;
     }
   }
   return 0;
 } 

For example, when M is 4 or 5, the code above produces:

  length:26  parts:4
 part 0 0   6   abcdef
 part 1 6   13  ghijklm
 part 2 13  19  nopqrs
 part 3 19  26  tuvwxyz
  length:26  parts:5
 part 0 0   5   abcde
 part 1 5   10  fghij
 part 2 10  15  klmno
 part 3 15  20  pqrst
 part 4 20  26  uvwxyz
like image 104
James Waldby - jwpat7 Avatar answered Nov 07 '22 13:11

James Waldby - jwpat7


My solution:

std::vector<std::string> split(std::string const & s, size_t count)
{
       size_t minsize = s.size()/count;
       int extra = s.size() - minsize * count;
       std::vector<std::string> tokens;
       for(size_t i = 0, offset=0 ; i < count ; ++i, --extra)
       {
          size_t size = minsize + (extra>0?1:0);
          if ( (offset + size) < s.size())
               tokens.push_back(s.substr(offset,size));
          else
               tokens.push_back(s.substr(offset, s.size() - offset));
          offset += size;
       }       
       return tokens;
}

Test code:

int main() 
{
      std::string s;
      while (std::cin >> s)
      {
        std::vector<std::string> tokens = split(s, 5);
        //output
        std::copy(tokens.begin(), tokens.end(), 
              std::ostream_iterator<std::string>(std::cout, ", "));
        std::cout << std::endl;
      }
}

Input:

012345
0123456
01234567
012345678
0123456789
01234567890

Output:

01, 2, 3, 4, 5, 
01, 23, 4, 5, 6, 
01, 23, 45, 6, 7, 
01, 23, 45, 67, 8, 
01, 23, 45, 67, 89, 
012, 34, 56, 78, 90, 

Online Demo : http://ideone.com/gINtK

This solution tends to make the tokens even, i.e all tokens may not be of same size.

like image 22
Nawaz Avatar answered Nov 07 '22 14:11

Nawaz


It's enough to know the length of substrings;
Assume m is size() of your string:

int k = (m%n == 0)? n : n-m%n;  

Then k of substrings should be of length m/n and n-k of length m/n+1.

like image 1
a-z Avatar answered Nov 07 '22 14:11

a-z