Python vs Perl sort performance

Tags: performance, python, sorting, perl

Solution

Switching the HTTP request to WWW::Curl::Easy solved all the issues with my Perl code (plus some extra implementation code... :-) ):

use WWW::Curl::Easy;

In conclusion, both Perl and Python are equally awesome.

Thanks to ALL who responded, very much appreciated.

Edit

It appears that the Perl code I am using spends most of its time performing the HTTP GET, for example:

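use Time::HiRes qw(gettimeofday);   # scalar context: floating-point seconds
use HTTP::Request;
use LWP::UserAgent;                 # assumed: $ua is an LWP::UserAgent

my $ua = LWP::UserAgent->new;
my $start_time = gettimeofday;
my $request  = HTTP::Request->new('GET', 'http://localhost:8080/data.json');
my $response = $ua->request($request);
my $page     = $response->content;
my $end_time = gettimeofday;
print "Time taken @{[ $end_time - $start_time ]} seconds.\n";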

The result is:

Time taken 74.2324419021606 seconds.

My Python code, for comparison:

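import time
import requests

start = time.time()
# stream=True so the body is read chunk by chunk and the size check
# below can stop an oversized download (with stream=False it is already fully read)
r = requests.get('http://localhost:8080/data.json', timeout=120, stream=True)

maxsize = 100000000
content = bytearray()   # appends in place; bytes += would copy the whole buffer each time
for chunk in r.iter_content(2048):
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

end = time.time()
timetaken = end - start
print(timetaken)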

The result is:

20.3471381664

In both cases the sort itself takes under a second. So, first of all, I apologise for the misleading question; it is another lesson for me never to make assumptions... :-)

I'm not sure what the best thing to do with this question is now. Perhaps someone can propose a faster way of performing the request in Perl?

End of edit

This is just a quick question about sort performance in Perl vs Python. It is not a question about which language is better or faster. For the record, I first wrote this in Perl, noticed how long the sort was taking, and then wrote the same thing in Python to see how fast it would be. I simply want to know: how can I make the Perl code perform as fast as the Python code?

Let's say we have the following JSON:

{
    "3434343424335": {
        "key1": 2322,
        "key2": 88232,
        "key3": 83844,
        "key4": 444454,
        "key5": 34343543,
        "key6": 2323232
    },
    "78237236343434": {
        "key1": 23676722,
        "key2": 856568232,
        "key3": 838723244,
        "key4": 4434544454,
        "key5": 3432323543,
        "key6": 2323232
    }
}

Let's say we have around 30k-40k such records, which we want to sort by one of the sub-keys, and then build a new array of records in that sub-key order.
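In Python terms, the operation described here (order the record ids by the `key5` sub-key, largest first, then build one new record per id) can be checked on the two sample records above. This is a small sketch that assumes the sample is a JSON object keyed by record id; only `key1` is copied into each new record, standing in for the remaining keys the question elides.

```python
import json

# The two sample records from the question, as a JSON object keyed by record id.
sample = '''{
  "3434343424335": {"key1": 2322, "key2": 88232, "key3": 83844,
                    "key4": 444454, "key5": 34343543, "key6": 2323232},
  "78237236343434": {"key1": 23676722, "key2": 856568232, "key3": 838723244,
                     "key4": 4434544454, "key5": 3432323543, "key6": 2323232}
}'''

data = json.loads(sample)

# Record ids ordered by their key5 value, largest first.
ordered_ids = sorted(data, key=lambda rid: data[rid]['key5'], reverse=True)

# Build the new array of records in that order (only key1 copied here).
records = [{'id': rid, 'key1': data[rid]['key1']} for rid in ordered_ids]

print(ordered_ids)  # ['78237236343434', '3434343424335']
```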

Perl - Takes around 27 seconds

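use JSON qw(decode_json);   # assumed: the question doesn't name its JSON module

my @list;
my $decoded = decode_json($page);
# sort the record ids by key5, descending ($b compared before $a)
foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
    push(@list, {"key" => $id, "key1" => $decoded->{$id}{key1} ...etc});
}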

Python - Takes around 6 seconds

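import json

records = []   # renamed from "list" to avoid shadowing the built-in
data = json.loads(content)
# sort the record ids by their key5 value, largest first
data2 = sorted(data, key=lambda x: data[x]['key5'], reverse=True)

for key in data2:
    tmp = {'id': key, 'key1': data[key]['key1'], etc.....}
    records.append(tmp)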

For the Perl code, I have tried the following tweaks:

use sort '_quicksort';  # use a quicksort algorithm
use sort '_mergesort';  # use a mergesort algorithm


asked Jul 31 '15 18:07

John

1 Answer

Your benchmark is flawed: you're benchmarking multiple variables, not one. It is not just sorting data; it is also decoding JSON, creating new records, and appending to an array. You can't know how much time is spent sorting and how much is spent on everything else.

The matter is made worse by the fact that Perl has several different JSON implementations, each with its own performance characteristics. Change the underlying JSON library and the benchmark will change again.

If you want to benchmark sort, you'll have to change your benchmark code to eliminate the cost of loading your test data from the benchmark, JSON or not.
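One way to take the HTTP GET and the JSON decoding out of the measurement entirely is to build test data of the same shape in memory before any timing starts. The generator below is a hypothetical helper (`make_records` is not from the question); it uses a seeded random generator so every run sorts identical data, and it only imitates the shape of the sample JSON, not the real value distribution.

```python
import random


def make_records(n, seed=42):
    """Hypothetical helper: build n records shaped like the question's JSON.

    Ids are numeric-looking strings; each record has integer key1..key6.
    A fixed seed keeps the data identical between runs.
    """
    rng = random.Random(seed)
    data = {}
    while len(data) < n:
        rid = str(rng.randrange(10**12, 10**14))  # a colliding id is simply overwritten and the loop continues
        data[rid] = {'key%d' % i: rng.randrange(0, 5 * 10**9) for i in range(1, 7)}
    return data


records = make_records(35000)  # roughly the 30k-40k records from the question
print(len(records))  # 35000
```

Because this data never passes through HTTP or a JSON decoder, a timer placed around a sort of `records` measures only the sort.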

Perl and Python each ship with benchmarking libraries (Perl's Benchmark module, Python's timeit) that can benchmark individual functions, but their instrumentation can make code perform far less well than it would in the real world. The performance drag from each benchmarking implementation will be different and might introduce a false bias. These benchmarking libraries are more useful for comparing two functions in the same program. For comparing between languages, keep it simple.
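As an illustration of the in-program comparison this paragraph recommends, Python's standard `timeit` module can compare two ways of doing the same sort inside one process. The two variants are assumptions chosen for illustration, not anything from the question: sorting the ids with a dict lookup in the key function versus sorting `(id, record)` pairs. The data is likewise made up.

```python
import random
import timeit

# Made-up data shaped like the question's records (only key5 matters here).
rng = random.Random(0)
data = {str(i): {'key5': rng.randrange(10**9)} for i in range(35000)}


def sort_ids_by_lookup():
    # Key function looks each id up in the dict.
    return sorted(data, key=lambda rid: data[rid]['key5'], reverse=True)


def sort_items():
    # Sort (id, record) pairs, then pull the ids back out.
    pairs = sorted(data.items(), key=lambda kv: kv[1]['key5'], reverse=True)
    return [rid for rid, _ in pairs]


# Best of 3 repeats, each calling the function 5 times; the minimum is the
# run least disturbed by other activity on the machine.
for fn in (sort_ids_by_lookup, sort_items):
    best = min(timeit.repeat(fn, number=5, repeat=3))
    print('%s: %.4f s per call' % (fn.__name__, best / 5))
```

Both functions run under the same timer in the same interpreter, so any overhead `timeit` adds affects them equally; that is what makes this a fair within-language comparison and not a cross-language one.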

The simplest way to get an accurate benchmark is to time the code within the program using the wall clock.

# The current time to the microsecond.
use Time::HiRes qw(gettimeofday);
use JSON qw(decode_json);   # assumed: the question doesn't name its JSON module

my @list;
my $decoded = decode_json($page);

my $start_time = gettimeofday;

foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
    push(@list, {"key" => $id, "key1" => $decoded->{$id}{key1} ...etc});
}

my $end_time = gettimeofday;

print "sort and append took @{[ $end_time - $start_time ]} seconds\n";

(I leave the Python version as an exercise)
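A minimal Python counterpart of the Perl timing above might look like this. It assumes, as in the question, that `content` holds the downloaded JSON text; the short sample string and the `key1`-only record are placeholders for the real data and the elided keys. JSON decoding happens before the timer starts, so only the sort and the append are measured.

```python
import json
import time

# Placeholder for the downloaded response body from the question.
content = '{"a": {"key1": 1, "key5": 10}, "b": {"key1": 2, "key5": 30}, "c": {"key1": 3, "key5": 20}}'

data = json.loads(content)          # decoded outside the timed region

start_time = time.perf_counter()    # high-resolution elapsed (wall-clock) time

result = []
for key in sorted(data, key=lambda x: data[x]['key5'], reverse=True):
    result.append({'id': key, 'key1': data[key]['key1']})

end_time = time.perf_counter()

print('sort and append took %f seconds' % (end_time - start_time))
```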

From here you can refine the technique. You can measure CPU seconds instead of wall-clock time. The array appends and the cost of creating new records are still part of the benchmark; they can be eliminated so that you're benchmarking only the sort. And so on.
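To follow both suggestions in Python, a sketch could time only the `sorted` call and read two clocks around it: `time.perf_counter()` for elapsed time and `time.process_time()` for the CPU time used by the process. The helper name `time_sort_only` and the tiny data set are illustrative assumptions.

```python
import time


def time_sort_only(data, field='key5'):
    """Hypothetical helper: time just the sort, with no record building."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()          # CPU time of this process only
    ordered = sorted(data, key=lambda rid: data[rid][field], reverse=True)
    cpu1 = time.process_time()
    wall1 = time.perf_counter()
    return ordered, wall1 - wall0, cpu1 - cpu0


data = {'a': {'key5': 10}, 'b': {'key5': 30}, 'c': {'key5': 20}}
ordered, wall, cpu = time_sort_only(data)
print(ordered, 'wall %.6f s, cpu %.6f s' % (wall, cpu))
```

If the CPU figure is much lower than the elapsed figure for some step, the process spent that time waiting (on I/O or for other processes) rather than computing, which is how a network-bound step such as an HTTP GET would typically show up.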

Additionally, you can use a profiler (such as Devel::NYTProf for Perl or cProfile for Python) to find out where your programs are spending their time. Profilers have the same raw-performance caveats as benchmarking libraries: the results are only useful for seeing what percentage of its time a program spends where. Even so, a profiler will quickly show whether your benchmark has unexpected drag.
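In Python, the standard library's `cProfile` and `pstats` modules provide such a profiler. This sketch profiles a stand-in for the question's pipeline (decode, sort, build records); the function `pipeline` and its generated JSON input are assumptions for illustration. The report lists the built-in `sorted` call separately from the JSON decoding, which is the breakdown the original benchmark lacked.

```python
import cProfile
import io
import json
import pstats
import random

# Made-up JSON input roughly shaped like the question's data.
rng = random.Random(1)
raw = json.dumps({str(i): {'key1': i, 'key5': rng.randrange(10**9)}
                  for i in range(20000)})


def pipeline(text):
    """Hypothetical stand-in for the question's decode/sort/build steps."""
    data = json.loads(text)
    ids = sorted(data, key=lambda x: data[x]['key5'], reverse=True)
    return [{'id': k, 'key1': data[k]['key1']} for k in ids]


profiler = cProfile.Profile()
profiler.enable()
records = pipeline(raw)
profiler.disable()

# Collect the ten most expensive entries, ordered by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(10)
report = out.getvalue()
print(report)
```

As the answer notes, the absolute times in this report are inflated by the profiler itself; the useful part is the relative share of decoding versus sorting.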

The important thing is to benchmark what you think you're benchmarking.

answered Oct 22 '22 09:10

Schwern
