Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sort array of integers lexicographically C++

I want to sort a large array of integers (say 1 millon elements) lexicographically.

Example:

input [] = { 100, 21 , 22 , 99 , 1  , 927 }
sorted[] = { 1  , 100, 21 , 22 , 927, 99  }

I have done it using the simplest possible method:

  • convert all numbers to strings (very costly because it will take huge memory)
  • use std:sort with strcmp as comparison function
  • convert back the strings to integers

Is there a better method than this?

like image 678
Aseem Goyal Avatar asked Oct 25 '13 11:10

Aseem Goyal


People also ask

How do you sort an array lexicographically?

Java Program to Sort Elements in Lexicographical Order (Dictionary Order) Sorting a string array in Lexicographical Order (Dictionary Order) using two approaches: By using any sorting technique to sort array elements. By using sort() function present in Arrays class in util package in java.

How do you sort integers lexicographically?

When applied to numbers, lexicographic order is increasing numerical order, i.e. increasing numerical order (numbers read left to right). For example, the permutations of {1,2,3} in lexicographic order are 123, 132, 213, 231, 312, and 321. When applied to subsets, two subsets are ordered by their smallest elements.

How do you sort lexicographically?

Defining Lexicographical Order Thus, lexicographical order is a way for formalizing word order where the order of the underlying symbols is given. In programming, lexicographical order is popularly known as Dictionary order and is used to sort a string array, compare two strings, or sorting array elements.


2 Answers

Use std::sort() with a suitable comparison function. This cuts down on the memory requirements.

The comparison function can use n % 10, n / 10 % 10, n / 100 % 10 etc. to access the individual digits (for positive integers; negative integers work a bit differently).

like image 133
Oswald Avatar answered Sep 29 '22 17:09

Oswald


To provide any custom sort ordering, you can provide a comparator to std::sort. In this case, it's going to be somewhat complex, using logarithms to inspect individual digits of your number in base 10.

Here's an example — comments inline describe what's going on.

#include <iostream>
#include <algorithm>
#include <cmath>
#include <cassert>

int main() {
    int input[] { 100, 21, 22, 99, 1, 927, -50, -24, -160 };

    /**
     * Sorts the array lexicographically.
     * 
     * The trick is that we have to compare digits left-to-right
     * (considering typical Latin decimal notation) and that each of
     * two numbers to compare may have a different number of digits.
     * 
     * This is very efficient in storage space, but inefficient in
     * execution time; an approach that pre-visits each element and
     * stores a translated representation will at least double your
     * storage requirements (possibly a problem with large inputs)
     * but require only a single translation of each element.
     */
    std::sort(
        std::begin(input),
        std::end(input),
        [](int lhs, int rhs) -> bool {
            // Returns true if lhs < rhs
            // Returns false otherwise
            const auto BASE      = 10;
            const bool LHS_FIRST = true;
            const bool RHS_FIRST = false;
            const bool EQUAL     = false;


            // There's no point in doing anything at all
            // if both inputs are the same; strict-weak
            // ordering requires that we return `false`
            // in this case.
            if (lhs == rhs) {
                return EQUAL;
            }

            // Compensate for sign
            if (lhs < 0 && rhs < 0) {
                // When both are negative, sign on its own yields
                // no clear ordering between the two arguments.
                // 
                // Remove the sign and continue as for positive
                // numbers.
                lhs *= -1;
                rhs *= -1;
            }
            else if (lhs < 0) {
                // When the LHS is negative but the RHS is not,
                // consider the LHS "first" always as we wish to
                // prioritise the leading '-'.
                return LHS_FIRST;
            }
            else if (rhs < 0) {
                // When the RHS is negative but the LHS is not,
                // consider the RHS "first" always as we wish to
                // prioritise the leading '-'.
                return RHS_FIRST;
            }

            // Counting the number of digits in both the LHS and RHS
            // arguments is *almost* trivial.
            const auto lhs_digits = (
                lhs == 0
                ? 1
                : std::ceil(std::log(lhs+1)/std::log(BASE))
            );

            const auto rhs_digits = (
                rhs == 0
                ? 1
                : std::ceil(std::log(rhs+1)/std::log(BASE))
            );

            // Now we loop through the positions, left-to-right,
            // calculating the digit at these positions for each
            // input, and comparing them numerically. The
            // lexicographic nature of the sorting comes from the
            // fact that we are doing this per-digit comparison
            // rather than considering the input value as a whole.
            const auto max_pos = std::max(lhs_digits, rhs_digits);
            for (auto pos = 0; pos < max_pos; pos++) {
                if (lhs_digits - pos == 0) {
                    // Ran out of digits on the LHS;
                    // prioritise the shorter input
                    return LHS_FIRST;
                }
                else if (rhs_digits - pos == 0) {
                    // Ran out of digits on the RHS;
                    // prioritise the shorter input
                    return RHS_FIRST;
                }
                else {
                    const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                    const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                    if (lhs_x < rhs_x)
                        return LHS_FIRST;
                    else if (rhs_x < lhs_x)
                        return RHS_FIRST;
                }
            }

            // If we reached the end and everything still
            // matches up, then something probably went wrong
            // as I'd have expected to catch this in the tests
            // for equality.
            assert("Unknown case encountered");
        }
    );

    std::cout << '{';
    for (auto x : input)
        std::cout << x << ", ";
    std::cout << '}';

    // Output: {-160, -24, -50, 1, 100, 21, 22, 927, 99, }
}

Demo

There are quicker ways to calculate the number of digits in a number, but the above will get you started.

like image 27
Lightness Races in Orbit Avatar answered Sep 29 '22 17:09

Lightness Races in Orbit