Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python vs perl sort performance

Solution

This solved all issues with my Perl code (plus extra implementation code.... :-) ) In conlusion both Perl and Python are equally awesome.

use WWW::Curl::Easy;

Thanks to ALL who responded, very much appreciated.

Edit

It appears that the Perl code I am using is spending the majority of its time performing the http get, for example:

my $start_time = gettimeofday;
$request = HTTP::Request->new('GET', 'http://localhost:8080/data.json');
$response = $ua->request($request);
$page = $response->content;
my $end_time = gettimeofday;
print "Time taken @{[ $end_time - $start_time ]} seconds.\n";

The result is:

Time taken 74.2324419021606 seconds.

My python code in comparison:

start = time.time()
r = requests.get('http://localhost:8080/data.json', timeout=120, stream=False)

maxsize = 100000000
content = ''
for chunk in r.iter_content(2048):
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

end = time.time()
timetaken = end-start
print timetaken

The result is:

20.3471381664

In both cases the sort times are sub second. So first of all I apologise for the misleading question, and it is another lesson for me to never ever make assumptions.... :-)

I'm not sure what is the best thing to do with this question now. Perhaps someone can propose a better way of performing the request in perl?

End of edit

This is just a quick question regarding sort performance differences in Perl vs Python. This is not a question about which language is better/faster etc, for the record, I first wrote this in perl, noticed the time the sort was taking, and then tried to write the same thing in python to see how fast it would be. I simply want to know, how can I make the perl code perform as fast as the python code?

Lets say we have the following json:

["3434343424335": {
        "key1": 2322,
        "key2": 88232,
        "key3": 83844,
        "key4": 444454,
        "key5": 34343543,
        "key6": 2323232
    },
"78237236343434": {
        "key1": 23676722,
        "key2": 856568232,
        "key3": 838723244,
        "key4": 4434544454,
        "key5": 3432323543,
        "key6": 2323232
    }
]

Lets say we have a list of around 30k-40k records which we want to sort by one of the sub keys. We then want to build a new array of records ordered by the sub key.

Perl - Takes around 27 seconds

my @list;
$decoded = decode_json($page);
foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
    push(@list,{"key"=>$id,"key1"=>$decoded->{$id}{key1}...etc));
}

Python - Takes around 6 seconds

list = []
data = json.loads(content)
data2 = sorted(data, key = lambda x: data[x]['key5'], reverse=True)

for key in data2:
     tmp= {'id':key,'key1':data[key]['key1'],etc.....}
     list.append(tmp)

For the perl code, I have tried using the following tweaks:

use sort '_quicksort';  # use a quicksort algorithm
use sort '_mergesort';  # use a mergesort algorithm
like image 981
John Avatar asked Jul 31 '15 18:07

John


People also ask

Is Perl faster than Python?

As time passes, Python seems to be taking the place of Perl for the same function. Python is faster, streamlined for large amounts of text editing, and capable of powerful functions.

How much faster is Perl than Python?

Perl is about 8 times faster than Python.

Why is Perl used over Python?

Answer: Python and Perl are high-level programming languages that were invented to serve different purposes. Perl was developed by Larry Wall as a Unix-based scripting language for making reports easily. Whereas Python was developed to offer code readability to its users for writing small and large programs.

Is Python older than Perl?

2. Perl was invented by Larry Wall in 1987 while Python by Guido van Rossum in 1989. 3.


1 Answers

Your benchmark is flawed, you're benchmarking multiple variables, not one. It is not just sorting data, but it is also doing JSON decoding, and creating strings, and appending to an array. You can't know how much time is spent sorting and how much is spent doing everything else.

The matter is made worse in that there are several different JSON implementations in Perl each with their own different performance characteristics. Change the underlying JSON library and the benchmark will change again.

If you want to benchmark sort, you'll have to change your benchmark code to eliminate the cost of loading your test data from the benchmark, JSON or not.

Perl and Python have their own internal benchmarking libraries that can benchmark individual functions, but their instrumentation can make them perform far less well than they would in the real world. The performance drag from each benchmarking implementation will be different and might introduce a false bias. These benchmarking libraries are more useful for comparing two functions in the same program. For comparing between languages, keep it simple.

Simplest thing to do to get an accurate benchmark is to time them within the program using the wall clock.

# The current time to the microsecond.
use Time::HiRes qw(gettimeofday);

my @list;
my $decoded = decode_json($page);

my $start_time = gettimeofday;

foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
    push(@list,{"key"=>$id,"key1"=>$decoded->{$id}{key1}...etc));
}

my $end_time = gettimeofday;

print "sort and append took @{[ $end_time - $start_time ]} seconds\n";

(I leave the Python version as an exercise)

From here you can improve your technique. You can use CPU seconds instead of wall clock. The array append and cost of creating the string are still involved in the benchmark, they can be eliminated so you're just benchmarking sort. And so on.

Additionally, you can use a profiler to find out where your programs are spending their time. These have the same raw performance caveats as benchmarking libraries, the results are only useful to find out what percentage of its time a program is using where, but it will prove useful to quickly see if your benchmark has unexpected drag.

The important thing is to benchmark what you think you're benchmarking.

like image 52
Schwern Avatar answered Oct 22 '22 09:10

Schwern