Why is this program faster in Python than Objective-C?

Tags:

I got interested in this small example of an algorithm in Python for looping through a large word list. I am writing a few "tools" that will allow my to slice a Objective-C string or array in a similar fashion as Python.

Specifically, this elegant solution caught my attention for executing very quickly and it uses a string slice as a key element of the algorithm. Try and solve this without a slice!

I have reproduced my local version using the Moby word list below. You can use /usr/share/dict/words if you do not feel like downloading Moby. The source is just a large dictionary-like list of unique words.

#!/usr/bin/env python

count=0
words = set(line.strip() for line in   
           open("/Users/andrew/Downloads/Moby/mwords/354984si.ngl"))
for w in words:
    even, odd = w[::2], w[1::2]
    if even in words and odd in words:
        count+=1

print count

This script will a) be interpreted by Python; b) read the 4.1 MB, 354,983 word Moby dictionary file; c) strip the lines; d) place the lines into a set, and; e) and find all the combinations where the evens and the odds of a given word are also words. This executes in about 0.73 seconds on a MacBook Pro.

I tried to rewrite the same program in Objective-C. I am a beginner at this language, so go easy please, but please do point out the errors.

#import <Foundation/Foundation.h>

NSString *sliceString(NSString *inString, NSUInteger start, NSUInteger stop, 
        NSUInteger step){
    NSUInteger strLength = [inString length];

    if(stop > strLength) {
        stop = strLength;
    }

    if(start > strLength) {
        start = strLength;
    }

    NSUInteger capacity = (stop-start)/step;
    NSMutableString *rtr=[NSMutableString stringWithCapacity:capacity];    

    for(NSUInteger i=start; i < stop; i+=step){
        [rtr appendFormat:@"%c",[inString characterAtIndex:i]];
    }
    return rtr;
}

NSSet * getDictWords(NSString *path){

    NSError *error = nil;
    NSString *words = [[NSString alloc] initWithContentsOfFile:path
                         encoding:NSUTF8StringEncoding error:&error];
    NSCharacterSet *sep=[NSCharacterSet newlineCharacterSet];
    NSPredicate *noEmptyStrings = 
                     [NSPredicate predicateWithFormat:@"SELF != ''"];

    if (words == nil) {
        // deal with error ...
    }
    // ...

    NSArray *temp=[words componentsSeparatedByCharactersInSet:sep];
    NSArray *lines = 
        [temp filteredArrayUsingPredicate:noEmptyStrings];

    NSSet *rtr=[NSSet setWithArray:lines];

    NSLog(@"lines: %lul, word set: %lul",[lines count],[rtr count]);
    [words release];

    return rtr;    
}

int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    int count=0;

    NSSet *dict = 
       getDictWords(@"/Users/andrew/Downloads/Moby/mwords/354984si.ngl");

    NSLog(@"Start");

    for(NSString *element in dict){
        NSString *odd_char=sliceString(element, 1,[element length], 2);
        NSString *even_char=sliceString(element, 0, [element length], 2);
        if([dict member:even_char] && [dict member:odd_char]){
            count++;
        }

    }    
    NSLog(@"count=%i",count);

    [pool drain];
    return 0;
}

The Objective-C version produces the same result, (13,341 words), but takes almost 3 seconds to do it. I must be doing something atrociously wrong for a compiled language to be more than 3X slower than a scripted language, but I'll be darned if I can see why.

The basic algorithm is the same: read the lines, strip them, and put them in a set.

My guess of what is slow is the processing of the NSString elements, but I do not know an alternative.

Edit

I edited the Python to be this:

#!/usr/bin/env python
import codecs
count=0
words = set(line.strip() for line in 
     codecs.open("/Users/andrew/Downloads/Moby/mwords/354984si.ngl",
          encoding='utf-8'))
for w in words:
    if w[::2] in words and w[1::2] in words:
        count+=1

print count

For the utf-8 to be on the same plane as the utf-8 NSString. This slowed the Python down to 1.9 secs.

I also switch the slice test to short-circuit type as suggested for both the Python and obj-c version. Now they are close to the same speed. I also tried using C arrays rather than NSStrings, and this was much faster, but not as easy. You also loose utf-8 support doing that.

Python is really cool...

Edit 2

I found a bottleneck that sped things up considerably. Instead of using the [rtr appendFormat:@"%c",[inString characterAtIndex:i]]; method to append a character to the return string, I used this:

for(NSUInteger i=start; i < stop; i+=step){
    buf[0]=[inString characterAtIndex:i];
    [rtr appendString:[NSString stringWithCharacters:buf length:1]];
}

Now I can finally claim that the Objective-C version is faster than the Python version -- but not by much.

739

asked Apr 11 '11 04:04

dawg

1 Answers

Keep in mind that the Python version has been written to move a lot of the heavy lifting down into highly optimised C code when executed on CPython (especially the file input buffering, string slicing and the hash table lookups to check whether even and odd are in words).

That said, you seem to be decoding the file as UTF-8 in your Objective-C code, but leaving the file in binary in your Python code. Using Unicode NSString in the Objective-C version, but 8-bit byte strings in the Python version isn't really a fair comparison - I would expect the performance of the Python version to drop noticeably if you used codecs.open() to open the file with a declared encoding of "utf-8".

You're also making a full second pass to strip the empty lines in your Objective-C, while no such step is present in the Python code.

111

answered Oct 12 '22 18:10

ncoghlan

Related questions
                            
                                Django Admin + FORCE_SCRIPT_NAME + Login redirects incorrectly
                            
                                How do I properly use Python's C API and exceptions?
                            
                                What magic prevents Tkinter programs from blocking in interactive shell?
                            
                                Compiling mysql-python on Windows with PIP
                            
                                Save the "Out[]" table of a pandas dataframe as a figure
                            
                                Timeseries streaming in bokeh
                            
                                UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4'
                            
                                Python: openpyxl how to read a cell font color
                            
                                Alternative for r's Exponential smoothing state space model in python/scikit/numpy
                            
                                Using Deep Learning to Predict Subsequence from Sequence
                            
                                In Python Dictionaries, how does ( (j*5)+1 ) % 2**i cycle through all 2**i
                            
                                Why is a simple 2-layer Neural Network unable to learn 0,0 sequence?
                            
                                How to make a dict key deprecated?
                            
                                Optional Union in type hint
                            
                                How to build an mac os app from a python script having a PySide2 GUI?
                            
                                How to expose std::vector<int> as a Python list using SWIG?
                            
                                Avoiding accidentally catching KeyboardInterrupt and SystemExit in Python 2.4
                            
                                Python -- How do you view output that doesn't fit the screen?
                            
                                What is the purpose of python 2.7's download package "Windows x86 MSI program database"?
                            
                                How can I speed up an animation?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is this program faster in Python than Objective-C?

Tags:

python

objective-c

nsstring

dawg

People also ask

1 Answers

ncoghlan

Recent Activity

Donate For Us