Just trying to solve this problem:
Given an unsorted array of integers, sum its biggest n numbers.
I did a TypeScript implementation, but of course this is language agnostic:
const list_sum_largest_n_numbers = (list: Array<number> = [], n: number) => {
    let sum: number = 0;
    const sorted_list = list.sort((a, b) => a - b);
    for (let i = 0; i < n; i++) {
        let value_to_sum = sorted_list?.pop() || 0;
        sum += value_to_sum;
    }
    return sum;
};
let list = [17, 310, 32_432, 3, 2, 317, 34, 108_379];
let n = 3;
let result = list_sum_largest_n_numbers(list, n);
alert(result); // 141_128
My first question is whether, when solving this kind of problem, the use of methods provided by the language, such as Array.sort(), is allowed, or whether I should implement everything myself, keeping the solution low level.
Also, I am not sure that this is the best way to solve it. I would say my loop is O(n), but that does not take into account the Array.sort() call.
Sorting is already O(n log n). Using the built-in sort makes for an easy solution, but it is not going to be the most efficient solution to the problem.
If you are looking for a more efficient yet simple solution for this problem, you can repeatedly pick out (and remove, or mark as visited) the biggest number, k times. You can add the picks to a running sum as you go, without having to store them in a new array and iterate again. Each pass over the array is O(n), so this is O(kn) in total.
This will be more efficient for small or constant k. Of course, if for instance you know that k = O(n), then better approaches exist.
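A minimal TypeScript sketch of this "pick the maximum k times" idea (sumLargestK is a hypothetical name; it marks picked values as visited rather than removing them):

```typescript
// Sum the k largest values in O(k * n), without sorting.
const sumLargestK = (list: number[], k: number): number => {
    const visited = new Array<boolean>(list.length).fill(false);
    let sum = 0;
    for (let pick = 0; pick < Math.min(k, list.length); pick++) {
        // One O(n) pass to find the largest unvisited value.
        let maxIndex = -1;
        for (let i = 0; i < list.length; i++) {
            if (!visited[i] && (maxIndex === -1 || list[i] > list[maxIndex])) {
                maxIndex = i;
            }
        }
        visited[maxIndex] = true; // mark as taken instead of removing
        sum += list[maxIndex];
    }
    return sum;
};
```

This also leaves the input array untouched, unlike the original solution, where Array.sort() and pop() mutate the caller's array.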
For a slightly less trivial solution, you can use Quickselect to find the kth biggest number x and then in one iteration sum together all numbers greater than x, and then add x and possible remaining duplicate values (or as rici points out, iterate just on the right side of x because quickselect partitions the array to our benefit). This would give you a total time complexity of O(n) addition operations. It's a tradeoff between asymptotic performance and code complexity :)
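A sketch of the quickselect idea in TypeScript, assuming a simple Lomuto-style partition in descending order (function names are my own; with this naive last-element pivot the average cost is O(n) but the worst case is O(n^2)):

```typescript
// Partition a[lo..hi] around a[hi] so larger values end up on the left;
// returns the pivot's final index.
const partition = (a: number[], lo: number, hi: number): number => {
    const pivot = a[hi];
    let i = lo;
    for (let j = lo; j < hi; j++) {
        if (a[j] > pivot) {
            [a[i], a[j]] = [a[j], a[i]];
            i++;
        }
    }
    [a[i], a[hi]] = [a[hi], a[i]];
    return i;
};

const sumLargestKQuickselect = (list: number[], k: number): number => {
    const a = [...list]; // avoid mutating the caller's array
    let lo = 0, hi = a.length - 1;
    // Narrow in on the partition whose pivot lands at index k - 1.
    while (lo < hi) {
        const p = partition(a, lo, hi);
        if (p === k - 1) break;
        if (p < k - 1) lo = p + 1;
        else hi = p - 1;
    }
    // The first k slots now hold the k largest values (in some order).
    return a.slice(0, k).reduce((s, v) => s + v, 0);
};
```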
As to whether it's ok to use built-in functions: Yes, definitely. Whenever they suit your purpose, it is better to use them than to reinvent the wheel every time. However, in an academic or challenge context, there might be rules in place disallowing use of libraries or built-in functions, for educational or competitive reasons.
Given array a[m] and looking for the n greatest:
- If m <= n, we are done.
- When m is significantly more than n: cost O(m * log(n)). Form a binary min priority queue p[n]. Insertion is O(log(n)), deletion is O(log(n)), and examining the smallest is O(1). Iterate a[] and, when a[i] is greater than the smallest in p[], remove the smallest and add a[i].
- Otherwise, simply sort a[] with cost O(m * log(m)).
Some code to get OP started: (Scant error checking, not optimized.)
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// Min-heap insert: add `value` to a heap currently holding `current_size` elements.
static void pq_enqueue(size_t current_size, int *pq, int value) {
    size_t child = current_size;
    while (child > 0) {
        size_t parent = (child + 1) / 2 - 1;
        if (pq[parent] <= value) {
            break;
        }
        pq[child] = pq[parent]; // shift the larger parent down
        child = parent;
    }
    pq[child] = value;
}
// Min-heap extract: remove and return the smallest of `current_size` elements.
static int pq_dequeue(size_t current_size, int *pq) {
    int v = pq[0];
    // Sift the hole at the root down to a leaf.
    size_t root = 0;
    for (;;) {
        size_t lchild = (root + 1) * 2 - 1;
        size_t rchild = (root + 1) * 2 + 1 - 1;
        if (lchild >= current_size) {
            break;
        }
        int lvalue = pq[lchild];
        int rvalue = rchild < current_size ? pq[rchild] : INT_MAX;
        if (lvalue <= rvalue) {
            pq[root] = lvalue;
            root = lchild;
        } else {
            pq[root] = rvalue;
            root = rchild;
        }
    }
    // Move the last element into the hole and sift it up.
    if (root + 1 < current_size) {
        pq[root] = pq[current_size - 1];
        size_t child = root;
        int value = pq[root];
        while (child > 0) {
            size_t parent = (child + 1) / 2 - 1;
            if (pq[parent] <= value) {
                break;
            }
            pq[child] = pq[parent];
            child = parent;
        }
        pq[child] = value;
    }
    return v;
}
static clock_t test_pq(size_t n, int buf[n], size_t top_n) {
    clock_t c0 = clock();
    for (size_t i = 0; i < n; i++) {
        buf[i] = rand();
    }
    int top[top_n];
    // Seed the queue with the first top_n values.
    for (size_t i = 0; i < top_n; i++) {
        pq_enqueue(i, top, buf[i]);
    }
    // For the rest, replace the current minimum whenever a bigger value appears.
    for (size_t i = top_n; i < n; i++) {
        if (buf[i] >= top[0]) {
            pq_dequeue(top_n--, top);
            pq_enqueue(top_n++, top, buf[i]);
        }
    }
    long long sum = 0;
    for (size_t i = 0; i < top_n; i++) {
        // printf("%zu: %d\n", i, top[i]);
        sum += top[i];
    }
    clock_t c1 = clock();
    printf("n:%11zu, top_n:%5zu, sum: %15lld ticks:%9lld\n", n, top_n, sum,
            (long long) (c1 - c0));
    return c1 - c0;
}
int main() {
    printf("RAND_MAX:%d\n", RAND_MAX);
    size_t N = 1000000000;
    int *buf = malloc(sizeof *buf * N);
    if (buf) {
        for (size_t n = 1000000; n <= N; n *= 10) {
            test_pq(n, buf, 3);
        }
        for (size_t t = 4; t <= 1000; t *= 4) {
            test_pq(N, buf, t);
        }
        free(buf);
    }
    return 0;
}
Output
RAND_MAX:2147483647
n: 1000000, top_n: 3, sum: 6442444824 ticks: 16
n: 10000000, top_n: 3, sum: 6442450334 ticks: 78
n: 100000000, top_n: 3, sum: 6442450883 ticks: 797
n: 1000000000, top_n: 3, sum: 6442450934 ticks: 7891
n: 1000000000, top_n: 4, sum: 8589934585 ticks: 6610
n: 1000000000, top_n: 16, sum: 34359737961 ticks: 6593
n: 1000000000, top_n: 64, sum: 137438948345 ticks: 6610
n: 1000000000, top_n: 256, sum: 549755742014 ticks: 6578
Notice that as the array size grows by a factor of 10, the time also grows by about 10.
As top_n grows by a factor of 4, the time does not grow: with random data, nearly all values are less than the current top few, so most iterations skip the queue and never pay the growing O(log(top_n)) cost. So I guess this is a poor test case to demo the log(top_n) factor in O(n * log(top_n)). When top_n is much less than n and the data is random, we approach O(n). Hmmm, will review later.