
How to use boost split to split a string and ignore empty values?


I am using boost::split to parse a data file. The data file contains lines such as the following.

data.txt

1:1~15  ASTKGPSVFPLAPSS SVFPLAPSS   -12.6   98.3

The whitespace between the items consists of tabs. The code I have to split the above line is as follows.

std::string buf;
/*Assign the line from the file to buf*/
std::vector<std::string> dataLine;
boost::split( dataLine, buf , boost::is_any_of("\t "), boost::token_compress_on);       //Split data line
cout << dataLine.size() << endl;

For the above line I should get a printout of 5, but I get 6. I have read through the documentation, and this approach seems as though it should do what I want, so clearly I am missing something. Thanks!

Edit: Running a for loop over dataLine as follows gives this output.

cout << "****" << endl;
for(int i = 0 ; i < dataLine.size() ; i ++) cout << dataLine[i] << endl;
cout << "****" << endl;


****
1:1~15
ASTKGPSVFPLAPSS
SVFPLAPSS
-12.6
98.3

****


asked Mar 28 '13 19:03

PhiloEpisteme

2 Answers

Even though "adjacent separators are merged together", the trailing delimiters seem to cause the problem: even when they are merged into one, that one delimiter still separates the last field from an empty token after it.
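To see why a single trailing tab is enough to produce a sixth token, here is a minimal standard-library sketch of the splitting rule described above. `split_compress` and `trim_any` are hypothetical helpers written only for illustration. They are not Boost functions, and they merely approximate what `boost::split` with `token_compress_on` and `boost::trim_if` do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits s on any character in delims,
// treating a run of adjacent delimiters as a single separator.
std::vector<std::string> split_compress(const std::string& s, const std::string& delims) {
    std::vector<std::string> tokens(1);  // there is always at least one (possibly empty) token
    bool in_delim_run = false;
    for (char c : s) {
        if (delims.find(c) != std::string::npos) {
            // A run of delimiters starts exactly one new token, even at the end of the input.
            if (!in_delim_run) tokens.emplace_back();
            in_delim_run = true;
        } else {
            tokens.back() += c;
            in_delim_run = false;
        }
    }
    return tokens;
}

// Hypothetical helper (not part of Boost): returns s without leading or trailing delimiters.
std::string trim_any(const std::string& s, const std::string& delims) {
    const auto first = s.find_first_not_of(delims);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(delims);
    return s.substr(first, last - first + 1);
}

// Checks the behavior on a line shaped like the one in the question, with a trailing tab.
void self_check() {
    const std::string line = "1:1~15\tASTKGPSVFPLAPSS\tSVFPLAPSS\t-12.6\t98.3\t";
    assert(split_compress(line, "\t ").size() == 6);                   // empty token after the last tab
    assert(split_compress(trim_any(line, "\t "), "\t ").size() == 5);  // trimmed first: five fields
}
```

Under this model the trailing tab opens a new, empty token, which matches the blank line printed before the closing `****` in the question's output.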

So your problem cannot be solved with split() alone. Luckily, Boost String Algo has trim() and trim_if(), which strip whitespace or delimiters from the beginning and end of a string. So just trim buf before splitting, like this:

std::string buf = "1:1~15  ASTKGPSVFPLAPSS SVFPLAPSS   -12.6   98.3    ";
std::vector<std::string> dataLine;
boost::trim_if(buf, boost::is_any_of("\t ")); // could also use plain boost::trim
boost::split(dataLine, buf, boost::is_any_of("\t "), boost::token_compress_on);
std::cout << dataLine.size() << std::endl; // prints 5

This question was already asked: boost::split leaves empty tokens at the beginning and end of string - is this desired behaviour?
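If trimming the input first is inconvenient, another option is to filter the empty strings out of the result after splitting. The sketch below uses only the standard library and works on any `std::vector<std::string>`, including one filled by `boost::split`. `drop_empty_tokens` is a hypothetical helper name, not a Boost function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: removes every empty string from tokens in place
// using the erase-remove idiom, preserving the order of the remaining tokens.
void drop_empty_tokens(std::vector<std::string>& tokens) {
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& t) { return t.empty(); }),
                 tokens.end());
}

// Example on a token list shaped like the question's output (one empty token at the end).
void drop_empty_example() {
    std::vector<std::string> tokens = {"1:1~15", "ASTKGPSVFPLAPSS", "SVFPLAPSS", "-12.6", "98.3", ""};
    drop_empty_tokens(tokens);
    assert(tokens.size() == 5);
    assert(tokens.back() == "98.3");
}
```

Unlike trimming, this also discards empty tokens in the middle of the line, so it is only appropriate when an empty field carries no meaning in the data.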

answered Oct 15 '22 19:10

Oberon

I would recommend using C++ String Toolkit Library. This library is much faster than Boost in my opinion. I used to use Boost to split (aka tokenize) a line of text but found this library to be much more in line with what I want.

One of the great things about strtk::parse is its conversion of tokens into their final value and checking the number of elements.

You could use it like so:

std::vector<std::string> tokens;

// note: dataLine here is the std::string holding one line of input (buf in the question)
// multiple delimiters should be treated as one
if( !strtk::parse( dataLine, "\t", tokens ) )
{
    std::cout << "failed" << std::endl;
}

--- another version

std::string token1;
std::string token2;
std::string token3;
float value1;
float value2;

if( !strtk::parse( dataLine, "\t", token1, token2, token3, value1, value2) )
{
     std::cout << "failed" << std::endl;
     // fails if the number of elements is not what you want
}

Online documentation for the library: String Tokenizer Documentation

Link to the source code: C++ String Toolkit Library

answered Oct 15 '22 20:10

DannyK
