std::lower_bound with skipping invalid elements

Question

I have a list of file names, with each representing a point in time. The list typically has thousands of elements. Given a time point, I'd like to convert these files names into time objects (I'm using boost::ptime), and then find the value of std::lower_bound of this time point with respect to the files names.

Example:

Filenames (with date + time, minutes increasing, with a minute for every file):

station01_20170612_030405.hdf5
station01_20170612_030505.hdf5
station01_20170612_030605.hdf5
station01_20170612_030705.hdf5
station01_20170612_030805.hdf5
station01_20170612_030905.hdf5

If I have a time-point 2017-06-12 03:06:00, then it fits here:

station01_20170612_030405.hdf5
station01_20170612_030505.hdf5
                             <--- The lower bound I am looking for is here
station01_20170612_030605.hdf5
station01_20170612_030705.hdf5
station01_20170612_030805.hdf5
station01_20170612_030905.hdf5

So far, everything is simple. Now the problem is that the list of files may be doped with some invalid file name, which will make the conversion to a time point fail.

Currently, I'm doing this the easy/inefficient way, and I'd like to optimize it, because this program will go on a server and the cost of operation matters. So, the stupid way is: Create a new list with time points, and only push time points that are valid:

vector<ptime> filesListTimePoints;
filesListTimePoints.reserve(filesList.size());
ptime time;
for(long i = 0; i < filesList.size(); i++) {
    ErrorCode error = ConvertToTime(filesList[i], time);
    if(error.errorCode() == SUCCESS)
        filesListTimePoints.push_back(time);
}
//now use std::lower_bound() on filesListTimePoints

You see, the problem is that I'm using a linear solution with a problem that can be solved with O(log(N)) complexity. I don't need to convert all files or even look at all of them!

My question: How can I embed this into std::lower_bound, such that it remains with optimal complexity?

My idea of a possible solution:

On cppreference, there's a basic implementation of std::lower_bound. I'm thinking of modifying that to get a working solution. But I'm not sure what to do when a convesion fails, since this algorithm highly depends on monotonic behavior. Does this problem have a solution, even mathematically speaking?

Here's the version I'm thinking about initially:

template<class ForwardIt, class T>
ForwardIt lower_bound(ForwardIt first, ForwardIt last, const T& value)
{
    ForwardIt it;
    typename std::iterator_traits<ForwardIt>::difference_type count, step;
    count = std::distance(first, last);

    while (count > 0) {
        it = first; 
        step = count / 2; 
        std::advance(it, step);
        ErrorCode error = ConvertToTime(*it, time);
        if(error.errorCode() == SUCCESS)
        {
            if (*it < value) {
                first = ++it; 
                count -= step + 1; 
            }
            else
                count = step;
            }
        else {
            // skip/ignore this point?
        }
    }
    return first;
}

My ultimate solution (which might sound stupid) is to make this method a mutator of the list, and erase elements that are invalid. Is there a cleaner solution?

sehe · Accepted Answer

You can simply index by optional<ptime>. If you want to cache the converted values, consider making it a multimap<optional<ptime>, File>.

Better yet, make a datatype representing the file, and calculate the timepoint inside its constructor:

struct File {
     File(std::string fname) : _fname(std::move(fname)), _time(parse_time(_fname)) { }

      boost::optional<boost::posix_time::ptime> _time;
      std::string _fname;

      static boost::optional<boost::posix_time::ptime> parse_time(std::string const& fname) {
            // return ptime or boost::none
    }
};

Now, simply define operator< suitably or use e.g. boost::multi_index_container to index by _time

Further notes:

in case it wasn't clear, such a map/set will have it's own lower_bound, upper_bound and equal_range operations, and will obviously also work with std::lower_bound and friends.
there's always filter_iterator adaptor: http://www.boost.org/doc/libs/1_64_0/libs/iterator/doc/filter_iterator.html

std::lower_bound with skipping invalid elements

Tags:

c++

performance

algorithm

time-complexity

boost

My idea of a possible solution:

The Quantum Physicist

1 Answers

sehe

Recent Activity

Donate For Us

std::lower_bound with skipping invalid elements

Tags:

c++

performance

algorithm

time-complexity

boost

My idea of a possible solution:

The Quantum Physicist

1 Answers

sehe

Related questions

Recent Activity

Donate For Us