I have a list of file names, with each representing a point in time. The list typically has thousands of elements. Given a time point, I'd like to convert these files names into time objects (I'm using boost::ptime), and then find the value of std::lower_bound of this time point with respect to the files names.
Example:
Filenames (with date + time, minutes increasing, with a minute for every file):
station01_20170612_030405.hdf5
station01_20170612_030505.hdf5
station01_20170612_030605.hdf5
station01_20170612_030705.hdf5
station01_20170612_030805.hdf5
station01_20170612_030905.hdf5
If I have a time-point 2017-06-12 03:06:00, then it fits here:
station01_20170612_030405.hdf5
station01_20170612_030505.hdf5
<--- The lower bound I am looking for is here
station01_20170612_030605.hdf5
station01_20170612_030705.hdf5
station01_20170612_030805.hdf5
station01_20170612_030905.hdf5
So far, everything is simple. Now the problem is that the list of files may be doped with some invalid file name, which will make the conversion to a time point fail.
Currently, I'm doing this the easy/inefficient way, and I'd like to optimize it, because this program will go on a server and the cost of operation matters. So, the stupid way is: Create a new list with time points, and only push time points that are valid:
vector<ptime> filesListTimePoints;
filesListTimePoints.reserve(filesList.size());
ptime time;
for(long i = 0; i < filesList.size(); i++) {
ErrorCode error = ConvertToTime(filesList[i], time);
if(error.errorCode() == SUCCESS)
filesListTimePoints.push_back(time);
}
//now use std::lower_bound() on filesListTimePoints
You see, the problem is that I'm using a linear solution with a problem that can be solved with O(log(N)) complexity. I don't need to convert all files or even look at all of them!
My question: How can I embed this into std::lower_bound, such that it remains with optimal complexity?
On cppreference, there's a basic implementation of std::lower_bound. I'm thinking of modifying that to get a working solution. But I'm not sure what to do when a convesion fails, since this algorithm highly depends on monotonic behavior. Does this problem have a solution, even mathematically speaking?
Here's the version I'm thinking about initially:
template<class ForwardIt, class T>
ForwardIt lower_bound(ForwardIt first, ForwardIt last, const T& value)
{
ForwardIt it;
typename std::iterator_traits<ForwardIt>::difference_type count, step;
count = std::distance(first, last);
while (count > 0) {
it = first;
step = count / 2;
std::advance(it, step);
ErrorCode error = ConvertToTime(*it, time);
if(error.errorCode() == SUCCESS)
{
if (*it < value) {
first = ++it;
count -= step + 1;
}
else
count = step;
}
else {
// skip/ignore this point?
}
}
return first;
}
My ultimate solution (which might sound stupid) is to make this method a mutator of the list, and erase elements that are invalid. Is there a cleaner solution?
You can simply index by optional<ptime>. If you want to cache the converted values, consider making it a multimap<optional<ptime>, File>.
Better yet, make a datatype representing the file, and calculate the timepoint inside its constructor:
struct File {
File(std::string fname) : _fname(std::move(fname)), _time(parse_time(_fname)) { }
boost::optional<boost::posix_time::ptime> _time;
std::string _fname;
static boost::optional<boost::posix_time::ptime> parse_time(std::string const& fname) {
// return ptime or boost::none
}
};
Now, simply define operator< suitably or use e.g. boost::multi_index_container to index by _time
Further notes:
lower_bound, upper_bound and equal_range operations, and will obviously also work with std::lower_bound and friends.filter_iterator adaptor: http://www.boost.org/doc/libs/1_64_0/libs/iterator/doc/filter_iterator.htmlIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With