Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

std:sort vs inserting into an std::set

I am reading some line segments from cin. Each line segment is represented by a start and end point. 2D. X and Y.

The input is not sorted. It is in random order. (Update:But I need them sorted first by X and then by Y)

I can read in all the segments, store them in a vector and then call std::sort. On the other hand, I can create an empty std::set and insert each segment as it arrives. The set will automatically maintain sorted order. Which of the two approaches is more efficient?

Update: The total size of the input (number of segments) is known in advance.

like image 498
Agnel Kurian Avatar asked Mar 26 '13 13:03

Agnel Kurian


People also ask

Is std::sort fast?

In general, std::sort is indeed faster than qsort because of a couple of these things: qsort operates on void* , which first requires a dereference, and second requires the size of the data type to perform the swaps. Therefore, the swap operation of qsort is done every byte.

Can you sort std::set?

std::set is an associative container that contains a sorted set of unique objects of type Key . Sorting is done using the key comparison function Compare.

What does std::sort do?

std::sort() is a built-in function in C++'s Standard Template Library. The function takes in a beginning iterator, an ending iterator, and (by default) sorts the iterable in ascending order. The function can also​ be used for custom sorting by passing in a comparator function that returns a boolean.

Why does std :: list need a special sort method?

std::sort is a generic algorithm that's supposed to work for anything that provides iterators. However, it really expects a random access iterator to sort efficiently. std::list , being a linked list, cannot provide random access to its elements efficiently. That's why it provides its own specialized sort algorithm.


3 Answers

You should measure the performance of both approaches to be sure, but it's a safe bet to assume that std::sort on an std::vector is way faster than inserting into an std::set due to locality effects and the large constants hiding in the tree insertion algorithm. Also, the subsequent lookups and iteration will be faster.

(However, std::set is better suited for supporting a mixed series of insertions and deletions/lookups/iterations. Maintaining order in vector is expensive, as each insertion will take linear time on average.)

like image 119
Fred Foo Avatar answered Sep 23 '22 03:09

Fred Foo


As a good rule of thumb, the stricter guarantees are offered, the worse performance you'll get.

Inserting into a std::set guarantees that the sequence is sorted after every insertion.

Inserting into a std::vector, and calling std::sort once after all insertions have been done guarantees that the sequence is sorted once all manipulations on the vector have been done. It doesn't require the vector to be sorted during all the intermediate insertions.

A std::vector also exhibits better spatial locality, and requires fewer memory allocations. So I would assume the vector approach to be faster, but if performance matters to you, then it matters enough to be measured.

If you don't care to measure what is faster in your case for your data sets with your code in your application, then you don't care which is faster.

like image 45
jalf Avatar answered Sep 20 '22 03:09

jalf


Use the container that has the appropriate semantics for your needs. Efficiency generally follows on automatically from that choice.

If you then experience performance bottlenecks, do some benchmarking.

like image 34
Lightness Races in Orbit Avatar answered Sep 21 '22 03:09

Lightness Races in Orbit