
Memory-efficient way of computing the median of a large data set? [closed]

If one computer can hold only 1 million numbers in memory at a time, how can it find the median of 100 million numbers?

Stephen Hsu asked Sep 25 '09 02:09



2 Answers

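Reduce the problem to a more difficult one: sort the 100 million numbers using merge sort. Then take the 50 millionth element (with an even count like this one, the median is conventionally the average of the 50 millionth and the 50,000,001st elements).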

Pascal Cuoq answered Oct 21 '22 03:10



Do an external sort and then scan once for the median.

Hopefully, the real problem was "how do I do an external sort"? (If this is homework...I want to help in the right way. :-)
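For readers who want to see the shape of it, here is a minimal Python sketch of "external sort, then scan once". It is only an illustration under some assumptions: the name external_median, the chunk_size parameter, and the one-text-file-per-sorted-run layout are made up for this example and are not from the question. The input is cut into sorted runs that each fit in memory, the runs are merged lazily with heapq.merge, and the scan stops as soon as the middle position is reached.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_median(numbers, chunk_size):
    """Median of an iterable, sorting at most chunk_size values in memory at once.

    Hypothetical helper for illustration; returns a float.
    """
    run_paths = []
    total = 0
    it = iter(numbers)
    try:
        # Phase 1: cut the input into sorted runs, one temporary file per run.
        while True:
            chunk = sorted(islice(it, chunk_size))
            if not chunk:
                break
            total += len(chunk)
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as f:
                f.writelines(f"{x!r}\n" for x in chunk)
            run_paths.append(path)
        if total == 0:
            raise ValueError("median of empty input")

        # Phase 2: merge the runs lazily and scan only up to the middle.
        # For an even count, average the two middle elements.
        lo, hi = (total - 1) // 2, total // 2
        files = [open(p) for p in run_paths]
        try:
            # heapq.merge keeps just one pending value per run in memory.
            merged = heapq.merge(*(map(float, f) for f in files))
            middle = list(islice(merged, lo, hi + 1))
        finally:
            for f in files:
                f.close()
        return sum(middle) / len(middle)
    finally:
        # Clean up the temporary run files.
        for p in run_paths:
            os.remove(p)
```

With 100 million numbers and chunk_size set to 1 million, this would produce 100 runs and a single 100-way merge. A real implementation would probably use a compact binary format for the runs, and may need several merge passes if the number of runs exceeds what can be kept open at once.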

DigitalRoss answered Oct 21 '22 04:10
