In C++, how do I split a string into evenly-sized smaller string?
For example, I have a string "012345678" and want it to split it into 5 smaller strings, and this should return me something like "01", "23", "45", "67", "8".
I'm having trouble of determining the length of the smaller strings. In the previous example, the original string is size 9, and I want to split it into 5 smaller string, so each smaller string except the last one should be length 9 / 5 = 1, but then the last one will be length 9 - 1* 4 = 5, which is unacceptable.
So the formal definition of this problem: the original string is split into EXACTLY n substrings, and no two of the substrings should differ by greater than 1 in length.
My focus is not on C++ syntax or library. It's how to design an algorithm so that the returned string can be nearly-equal in size.
You can split a string by each character using an empty string('') as the splitter. In the example below, we split the same message using an empty string. The result of the split will be an array containing all the characters in the message string.
In C, the strtok() function is used to split a string into a series of tokens based on a particular delimiter. A token is a substring extracted from the original string.
To split a string without removing the delimiter: Use the str. split() method to split the string into a list.
To divide N items into M parts, with lengths within one unit, you can use formula (N*i+N)/M - (N*i)/M
as length of i
'th part, as illustrated below.
#include <string>
#include <iostream>
using namespace std;
int main() {
string text = "abcdefghijklmnopqrstuvwxyz";
int N = text.length();
for (int M=3; M<14; ++M) {
cout <<" length:"<< N <<" parts:"<< M << "\n";
int at, pre=0, i;
for (pre = i = 0; i < M; ++i) {
at = (N+N*i)/M;
cout << "part " << i << "\t" << pre << "\t" << at;
cout << "\t" << text.substr(pre, at-pre) << "\n";
pre = at;
}
}
return 0;
}
For example, when M
is 4 or 5, the code above produces:
length:26 parts:4
part 0 0 6 abcdef
part 1 6 13 ghijklm
part 2 13 19 nopqrs
part 3 19 26 tuvwxyz
length:26 parts:5
part 0 0 5 abcde
part 1 5 10 fghij
part 2 10 15 klmno
part 3 15 20 pqrst
part 4 20 26 uvwxyz
My solution:
std::vector<std::string> split(std::string const & s, size_t count)
{
size_t minsize = s.size()/count;
int extra = s.size() - minsize * count;
std::vector<std::string> tokens;
for(size_t i = 0, offset=0 ; i < count ; ++i, --extra)
{
size_t size = minsize + (extra>0?1:0);
if ( (offset + size) < s.size())
tokens.push_back(s.substr(offset,size));
else
tokens.push_back(s.substr(offset, s.size() - offset));
offset += size;
}
return tokens;
}
Test code:
int main()
{
std::string s;
while (std::cin >> s)
{
std::vector<std::string> tokens = split(s, 5);
//output
std::copy(tokens.begin(), tokens.end(),
std::ostream_iterator<std::string>(std::cout, ", "));
std::cout << std::endl;
}
}
Input:
012345
0123456
01234567
012345678
0123456789
01234567890
Output:
01, 2, 3, 4, 5,
01, 23, 4, 5, 6,
01, 23, 45, 6, 7,
01, 23, 45, 67, 8,
01, 23, 45, 67, 89,
012, 34, 56, 78, 90,
Online Demo : http://ideone.com/gINtK
This solution tends to make the tokens even, i.e all tokens may not be of same size.
It's enough to know the length of substrings;
Assume m is size()
of your string:
int k = (m%n == 0)? n : n-m%n;
Then k
of substrings should be of length m/n
and n-k
of length m/n+1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With