Cost of calling str() on a string?

Tags: python, string

What is the cost (if any) of calling the str function on an object that is already a string? The use case is normalizing an array of objects of different types by converting them all to strings. Naively, it can be implemented like so:

def arr_2_strarr(arr):
    return [str(val) for val in arr]

But if str() causes too much overhead and my arr contains primarily strings, I might use this instead:

def arr_2_strarr2(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]

Any suggestions?

asked Jun 08 '17 15:06 by benjaminz


1 Answer

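Calling str on a string object is pretty cheap: it just returns the original string object. An explicit isinstance check adds overhead of its own: in the timings below it is clearly slower on non-string data and at best slightly faster than plain str() on string data.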
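The "returns the original string object" claim is easy to check by comparing identities. This is a minimal sketch that assumes CPython; the subclass MyStr is hypothetical and exists only to show that the shortcut applies to exact str instances.

```python
# For an exact str instance, str() hands back the very same object (CPython).
s = "hello world"
same_object = str(s) is s

# A str subclass, by contrast, is converted to a plain str.
class MyStr(str):  # hypothetical subclass, for illustration only
    pass

sub = MyStr("hello world")
converted = str(sub)

print(same_object)             # True on CPython
print(type(converted) is str)  # True
print(converted == sub)        # True
```

Since no new string is built for an exact str, the remaining cost is essentially the name lookup and the call itself.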

If you want to test this on real data, take a look at the timeit module.
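As a starting point, a timeit comparison might look roughly like this. It is a Python 3 sketch under stated assumptions: the sample list `data` and the function name `checked` are made up for illustration, and the repeat and loop counts are arbitrary.

```python
from timeit import repeat

# Hypothetical sample data: mostly strings, with a few non-strings mixed in.
data = ["alpha", "beta", "gamma", 42, 3.14, None] * 10

def plain(arr):
    return [str(val) for val in arr]

def checked(arr):
    return [val if isinstance(val, str) else str(val) for val in arr]

# Both strategies must agree before timing them is meaningful.
assert plain(data) == checked(data)

results = {}
for func in (plain, checked):
    # repeat() returns one total time (in seconds) per repetition;
    # the minimum is the one least disturbed by other processes.
    times = repeat(lambda: func(data), repeat=3, number=1000)
    results[func.__name__] = min(times)

# Print the fastest strategy first.
for name, best in sorted(results.items(), key=lambda item: item[1]):
    print('{0:8} {1:.6f}'.format(name, best))
```

Swap in your own list for `data` to see how the two strategies compare on a realistic mix of types.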

BTW, you should eliminate the not from your 2nd version:

[val if isinstance(val, basestring) else str(val) for val in arr]

And you can speed things up slightly by caching str:

def arr_2_strarr(arr, str=str):
    return [str(val) for val in arr]

Happy micro-optimizing. :)


Why cache str? Well, each time you use a name, Python has to look for it. If you're inside a function, first it looks in the local namespace, and if it can't find the name there it looks in the module's globals, and finally in the built-in namespace. Because str is a built-in, every use of it inside a function goes through that globals-then-built-ins search; it would be inefficient to "import" all the built-ins into every function. By doing

def arr_2_strarr(arr, str=str):

we create a local name str that gets bound to the built-in str type, and because it's a default argument that search & bind process happens once, when the function definition is executed, not each time the function is called.

So each time we call arr_2_strarr the interpreter will immediately find that local str, which will save a tiny amount of time.
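The difference between the two lookups can be seen by inspecting the compiled functions. This is a simplified sketch: `plain` and `cached` here take a single value rather than a list, purely to keep the code objects easy to read.

```python
import builtins

def plain(val):
    return str(val)   # 'str' is not local: searched in globals, then built-ins

def cached(val, str=str):
    return str(val)   # 'str' is an ordinary local variable here

# Names a function looks up outside its own locals are listed in co_names;
# its arguments and other locals are listed in co_varnames.
print(plain.__code__.co_names)      # ('str',)
print(cached.__code__.co_names)     # ()
print(cached.__code__.co_varnames)  # ('val', 'str')

# The default value was evaluated once, when the def statement ran.
print(cached.__defaults__[0] is builtins.str)  # True
```

One side effect of this trick is that a caller can accidentally override the cached name by passing a second argument, so it is best kept for genuinely hot code paths.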


Here's some timeit code that compares the various strategies. It runs on both Python 2 & Python 3, although on Python 3 it substitutes str for basestring, since basestring doesn't exist in Python 3.

This code runs the functions on lists of various sizes first with integer data, then with string data which is created by converting the integer data to strings.

Each line of output gives the time to perform the given number of loops over 3 repetitions, sorted from fastest to slowest. As the timeit repeat docs mention, the main number to look at in each run is the smallest one.

The results for all functions on a given list size and type are also sorted from fastest to slowest.

''' Compare the speeds of direct string conversion
    with testing first via isinstance

    See https://stackoverflow.com/q/44439323/4014959

    Written by PM 2Ring 2017.06.09

    Python 2 / 3 compatible
'''

from __future__ import print_function, division
from timeit import Timer
import sys

# Python 3 doesn't have basestring
if sys.version_info[0] > 2:
    basestring = str

# The functions to test
def plain(arr):
    return [str(val) for val in arr]

def cached(arr, str=str):
    return [str(val) for val in arr]

def teststr(arr):
    return [val if isinstance(val, str) else str(val) for val in arr]

def testbase(arr):
    return [val if isinstance(val, basestring) else str(val) for val in arr]

def testbasenot(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]

funcs = (
    plain,
    cached,
    teststr,
    testbase,
    testbasenot,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify(arr):
    ''' Return True if every function gives the same result for arr '''
    results = [func(arr) for func in funcs]
    first, results = results[0], results[1:]
    return all(first == u for u in results)

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import arr, ' + fname
        cmd = fname + '(arr)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:12} {1}'.format(fname, result))

# Check that all functions return the same results
if 0:
    print('Testing all functions')
    arr = list(range(10))
    print(arr, verify(arr))
    arr = list('abcdefghij')
    print(arr, verify(arr))

# Do the timing tests
reps = 3
loops = 1 << 16
for i in range(1, 11):
    n = 1 << i
    # Build a data array of integers
    arr = range(n)
    print('\n{0}: Size={1}, Loops={2}'.format(i, n, loops))
    print('* Integer')
    time_test(loops, reps)

    # Convert the data array contents to strings
    arr = cached(arr)
    print('\n* String')
    time_test(loops, reps)
    # Halve the loop count for the next (doubled) list size
    loops >>= 1

Typical Python 2 output:

1: Size=2, Loops=65536
* Integer
cached       [0.17268610000610352, 0.19634914398193359, 0.2058720588684082]
plain        [0.17906594276428223, 0.18797492980957031, 0.24009895324707031]
teststr      [0.32513308525085449, 0.33270597457885742, 0.35080599784851074]
testbasenot  [0.32793092727661133, 0.33176803588867188, 0.33498501777648926]
testbase     [0.32964491844177246, 0.33154511451721191, 0.33760714530944824]

* String
cached       [0.1619560718536377, 0.1628870964050293, 0.16448402404785156]
teststr      [0.16335082054138184, 0.16484308242797852, 0.17012500762939453]
plain        [0.16956901550292969, 0.1711430549621582, 0.18457293510437012]
testbase     [0.22378706932067871, 0.2255101203918457, 0.22593879699707031]
testbasenot  [0.22855901718139648, 0.22941207885742188, 0.23271608352661133]

2: Size=4, Loops=32768
* Integer
cached       [0.12796807289123535, 0.12807202339172363, 0.12817001342773438]
plain        [0.13622713088989258, 0.14297294616699219, 0.14868402481079102]
teststr      [0.27701020240783691, 0.27812099456787109, 0.2795259952545166]
testbasenot  [0.27815794944763184, 0.28220701217651367, 0.29373884201049805]
testbase     [0.2804868221282959, 0.28186416625976562, 0.31699705123901367]

* String
cached       [0.12131500244140625, 0.12241697311401367, 0.13379192352294922]
teststr      [0.12839889526367188, 0.1314079761505127, 0.14053797721862793]
plain        [0.13051795959472656, 0.14696002006530762, 0.18504786491394043]
testbase     [0.18404412269592285, 0.1844489574432373, 0.19633579254150391]
testbasenot  [0.18416285514831543, 0.18494606018066406, 0.18553614616394043]

3: Size=8, Loops=16384
* Integer
cached       [0.10957002639770508, 0.11252093315124512, 0.11768913269042969]
plain        [0.11848998069763184, 0.11958003044128418, 0.1292269229888916]
testbase     [0.26231694221496582, 0.26471304893493652, 0.26625895500183105]
teststr      [0.26410102844238281, 0.2641758918762207, 0.26569199562072754]
testbasenot  [0.26910495758056641, 0.26967120170593262, 0.2741539478302002]

* String
cached       [0.102294921875, 0.10357999801635742, 0.1050269603729248]
teststr      [0.10852217674255371, 0.10861611366271973, 0.1127161979675293]
plain        [0.11173510551452637, 0.11183404922485352, 0.12115597724914551]
testbasenot  [0.16488981246948242, 0.16509699821472168, 0.16648602485656738]
testbase     [0.16622614860534668, 0.16688108444213867, 0.16962814331054688]

4: Size=16, Loops=8192
* Integer
cached       [0.10548806190490723, 0.10568594932556152, 0.10611891746520996]
plain        [0.11526799201965332, 0.1160120964050293, 0.12486004829406738]
teststr      [0.25309896469116211, 0.25549888610839844, 0.25838899612426758]
testbasenot  [0.25410699844360352, 0.27252411842346191, 0.32510590553283691]
testbase     [0.25414609909057617, 0.26968812942504883, 0.27393984794616699]

* String
cached       [0.092885017395019531, 0.096045970916748047, 0.10643196105957031]
teststr      [0.098433017730712891, 0.098783016204833984, 0.10051798820495605]
plain        [0.10081005096435547, 0.10222005844116211, 0.12018895149230957]
testbasenot  [0.15373396873474121, 0.15472292900085449, 0.15676999092102051]
testbase     [0.15490198135375977, 0.15572404861450195, 0.15599799156188965]

5: Size=32, Loops=4096
* Integer
cached       [0.10568094253540039, 0.10743498802185059, 0.1115870475769043]
plain        [0.1163330078125, 0.11633419990539551, 0.12796401977539062]
teststr      [0.25122308731079102, 0.26527810096740723, 0.26579189300537109]
testbase     [0.25309586524963379, 0.25563716888427734, 0.25917816162109375]
testbasenot  [0.25465011596679688, 0.25907588005065918, 0.26110982894897461]

* String
cached       [0.085406064987182617, 0.086378097534179688, 0.08651280403137207]
teststr      [0.092473983764648438, 0.09324193000793457, 0.093439817428588867]
plain        [0.096549034118652344, 0.097501993179321289, 0.10462403297424316]
testbase     [0.14794015884399414, 0.14966106414794922, 0.15016818046569824]
testbasenot  [0.14796280860900879, 0.14940309524536133, 0.15308189392089844]

6: Size=64, Loops=2048
* Integer
cached       [0.10838603973388672, 0.1089630126953125, 0.11129999160766602]
plain        [0.11764693260192871, 0.11851096153259277, 0.12583494186401367]
teststr      [0.2550208568572998, 0.25540995597839355, 0.26316595077514648]
testbase     [0.25723910331726074, 0.25930881500244141, 0.26207089424133301]
testbasenot  [0.25864100456237793, 0.25901007652282715, 0.26875495910644531]

* String
cached       [0.086635112762451172, 0.087384939193725586, 0.099885940551757812]
plain        [0.096493959426879883, 0.12469196319580078, 0.13684391975402832]
teststr      [0.096681118011474609, 0.098448991775512695, 0.10569310188293457]
testbase     [0.14573216438293457, 0.14696693420410156, 0.14700508117675781]
testbasenot  [0.14776277542114258, 0.14852094650268555, 0.15462112426757812]

7: Size=128, Loops=1024
* Integer
cached       [0.10915207862854004, 0.11011981964111328, 0.1127631664276123]
plain        [0.11721491813659668, 0.11830401420593262, 0.1254270076751709]
testbase     [0.25789499282836914, 0.26130795478820801, 0.26179313659667969]
teststr      [0.25840306282043457, 0.25889492034912109, 0.26300287246704102]
testbasenot  [0.26443600654602051, 0.26498103141784668, 0.26691412925720215]

* String
cached       [0.083537101745605469, 0.084954023361206055, 0.086431980133056641]
teststr      [0.091158866882324219, 0.09123992919921875, 0.091590166091918945]
plain        [0.091225862503051758, 0.092115163803100586, 0.099261045455932617]
testbase     [0.14569401741027832, 0.14622306823730469, 0.14650607109069824]
testbasenot  [0.14774990081787109, 0.14930200576782227, 0.15020990371704102]

8: Size=256, Loops=512
* Integer
cached       [0.10824894905090332, 0.10865211486816406, 0.10895800590515137]
plain        [0.11750102043151855, 0.12690877914428711, 0.12890195846557617]
teststr      [0.25457501411437988, 0.25542402267456055, 0.25692200660705566]
testbasenot  [0.25513482093811035, 0.25664496421813965, 0.25999689102172852]
testbase     [0.25680398941040039, 0.25924396514892578, 0.26179695129394531]

* String
cached       [0.080662012100219727, 0.081827878952026367, 0.081900119781494141]
teststr      [0.089673995971679688, 0.097939014434814453, 0.15471792221069336]
plain        [0.094327926635742188, 0.095342159271240234, 0.097375154495239258]
testbasenot  [0.14262199401855469, 0.14278602600097656, 0.14302182197570801]
testbase     [0.14464497566223145, 0.14674210548400879, 0.16207790374755859]

9: Size=512, Loops=256
* Integer
cached       [0.10789299011230469, 0.1092069149017334, 0.110015869140625]
plain        [0.11702799797058105, 0.1181950569152832, 0.12698101997375488]
testbase     [0.25504207611083984, 0.25520896911621094, 0.25734806060791016]
testbasenot  [0.25715017318725586, 0.25747489929199219, 0.25850796699523926]
teststr      [0.25783085823059082, 0.25882315635681152, 0.26154208183288574]

* String
cached       [0.078849077224731445, 0.079813003540039062, 0.084489107131958008]
teststr      [0.086745977401733398, 0.087059974670410156, 0.087485074996948242]
plain        [0.088322877883911133, 0.088804960250854492, 0.097378969192504883]
testbasenot  [0.14128994941711426, 0.14266705513000488, 0.1427910327911377]
testbase     [0.14152097702026367, 0.14231991767883301, 0.14392399787902832]

10: Size=1024, Loops=128
* Integer
cached       [0.10892415046691895, 0.11003899574279785, 0.11008000373840332]
plain        [0.1192779541015625, 0.12048506736755371, 0.12956619262695312]
teststr      [0.25335502624511719, 0.25642204284667969, 0.25892996788024902]
testbase     [0.25525593757629395, 0.25550699234008789, 0.25794696807861328]
testbasenot  [0.25932693481445312, 0.25960803031921387, 0.26134610176086426]

* String
cached       [0.078451156616210938, 0.080369949340820312, 0.080511093139648438]
teststr      [0.084844112396240234, 0.085949897766113281, 0.096578836441040039]
plain        [0.086302042007446289, 0.087638139724731445, 0.096364974975585938]
testbase     [0.14068913459777832, 0.14274501800537109, 0.15559101104736328]
testbasenot  [0.14075493812561035, 0.15553092956542969, 0.19578790664672852]    

Typical Python 3 output:

1: Size=2, Loops=65536
* Integer
plain        [0.2957206170030986, 0.2959696320031071, 0.2991539639988332]
cached       [0.3058611470005417, 0.30598287599787, 0.3073535650000849]
testbase     [0.38803433800057974, 0.39307209699836676, 0.393392562000372]
testbasenot  [0.3888578799997049, 0.3951267439988442, 0.42909636100011994]
teststr      [0.41290506400036975, 0.41541150199918775, 0.4488242949992127]

* String
testbase     [0.23906823500146857, 0.23946705200069118, 0.24624350399972172]
testbasenot  [0.24037985899849446, 0.24200722000023234, 0.2462738950016501]
plain        [0.25742501500280923, 0.2644229819998145, 0.26711930600140477]
teststr      [0.2635171010006161, 0.3559218000009423, 0.3784064870014845]
cached       [0.2687887559986848, 0.2711959320004098, 0.38138879500183975]

2: Size=4, Loops=32768
* Integer
cached       [0.21332427200104576, 0.21363574399947538, 0.21528891600246425]
plain        [0.22395663199858973, 0.22762144099760917, 0.23422862100051134]
testbasenot  [0.31939790100295795, 0.32413787499899627, 0.32422161499926005]
testbase     [0.3209382370005187, 0.3213516770010756, 0.3215230670029996]
teststr      [0.3372085839982901, 0.33786465500088525, 0.33847540900023887]

* String
testbasenot  [0.17031173299983493, 0.17143720199965173, 0.17724975699820789]
testbase     [0.170390128998406, 0.17118954800025676, 0.18865150499914307]
cached       [0.18190538799899514, 0.18262020299880533, 0.183105569001782]
plain        [0.18666503399799694, 0.18781541300268145, 0.1955128590016102]
teststr      [0.18973677000030875, 0.19112570400102413, 0.19168143299975782]

3: Size=8, Loops=16384
* Integer
cached       [0.17012267099926248, 0.18160372200145503, 0.2275817529989581]
plain        [0.1890079689983395, 0.1963043950017891, 0.2016476179996971]
testbasenot  [0.28168991999700665, 0.2821743839995179, 0.286649605997809]
testbase     [0.28295213199817226, 0.28760008400058723, 0.2906435440017958]
teststr      [0.2958552290001535, 0.2989299110013235, 0.31747390199961956]

* String
testbase     [0.13354753000021446, 0.13377505199969164, 0.14039257600234123]
cached       [0.1352838150014577, 0.1353432000032626, 0.13798289999976987]
testbasenot  [0.14252334699995117, 0.14301740500013693, 0.1445914210016781]
plain        [0.15130633899752866, 0.15166569000211894, 0.1616801599993778]
teststr      [0.15267008800219628, 0.1545946529986395, 0.15590016200076207]

4: Size=16, Loops=8192
* Integer
cached       [0.144755126999371, 0.14782401300180936, 0.1484048439997423]
plain        [0.1726092749995587, 0.1740606339990336, 0.1815100200001325]
testbase     [0.26685525399807375, 0.27029573199979495, 0.2716258750006091]
testbasenot  [0.2702714350016322, 0.2723204169997189, 0.27288546099953237]
teststr      [0.28515160999813816, 0.28523068700087606, 0.2878553769987775]

* String
cached       [0.11515368599793874, 0.11579233700103941, 0.11688366999806021]
testbase     [0.12178990400207113, 0.13090817400006927, 0.13304468899877975]
testbasenot  [0.13121789299839293, 0.14976675499929115, 0.1521548589989834]
teststr      [0.13410512400150765, 0.1354981399999815, 0.147247362001508]
plain        [0.13691626099898713, 0.1384456069972657, 0.1426525679999031]

5: Size=32, Loops=4096
* Integer
cached       [0.13246865899782279, 0.13320018100057496, 0.134628559997509]
plain        [0.1636957459995756, 0.16763203899972723, 0.1752369269997871]
testbase     [0.26010187700012466, 0.2606812570011243, 0.2647345440018398]
testbasenot  [0.2620696090016281, 0.26230394700178294, 0.26258907899682526]
teststr      [0.27685887300322065, 0.2787095199964824, 0.28293989099984174]

* String
cached       [0.10246079200078384, 0.10416977099885116, 0.10755630499988911]
testbasenot  [0.10829716499938513, 0.10918466699877172, 0.10935586699997657]
testbase     [0.11739019699962228, 0.11808202800239087, 0.11899654000080773]
plain        [0.12601002500014147, 0.12718953500007046, 0.13454839599944535]
teststr      [0.13366336599938222, 0.13407608800116577, 0.13510101700012456]

6: Size=64, Loops=2048
* Integer
cached       [0.12591946799875586, 0.127094235002005, 0.13223557899982552]
plain        [0.160616523000499, 0.16232994500023779, 0.1691623620026803]
testbase     [0.2534341589998803, 0.2556092949998856, 0.2571690379991196]
testbasenot  [0.2560774869998568, 0.2574564010028553, 0.2606996459981019]
teststr      [0.268248238000524, 0.2702014210008201, 0.27107579600124154]

* String
cached       [0.09791737100022146, 0.09819723300097394, 0.10752435399990645]
testbasenot  [0.1057888709983672, 0.10588572099732119, 0.16173565400094958]
testbase     [0.10636284599968349, 0.1179599219976808, 0.12130766799964476]
plain        [0.12285572399923694, 0.12589510299949325, 0.13114397300159908]
teststr      [0.13122114399811835, 0.13273253399893292, 0.14575592999972287]

7: Size=128, Loops=1024
* Integer
cached       [0.12404713899741182, 0.12496110600113752, 0.12496385000122245]
plain        [0.15980284800025402, 0.16046370399999432, 0.16711239899814245]
testbasenot  [0.25531527800194453, 0.25563639699976193, 0.2586420219995489]
testbase     [0.25544935799916857, 0.2558138679996773, 0.257172014000389]
teststr      [0.2699256220003008, 0.2712909309993847, 0.27702098800000385]

* String
cached       [0.09376715399776003, 0.09393715400074143, 0.09975314399707713]
testbasenot  [0.10510071799944853, 0.10511873200084665, 0.10523289399861824]
testbase     [0.11240010600158712, 0.11325187799957348, 0.11632439300228725]
plain        [0.12139380200096639, 0.12202585699924384, 0.1315958569975919]
teststr      [0.12834531499902369, 0.12949470400053542, 0.12955383699954837]

8: Size=256, Loops=512
* Integer
cached       [0.12225364700134378, 0.12283446399669629, 0.1285843859986926]
plain        [0.15971405900199898, 0.16198832800000673, 0.16777605400056927]
testbase     [0.2507534860014857, 0.2527904779999517, 0.25378678199922433]
testbasenot  [0.25323686200135853, 0.2547167230004561, 0.25919888999851537]
teststr      [0.2652072370001406, 0.2658402630004275, 0.2674206650008273]

* String
cached       [0.0906629850032914, 0.0985801380011253, 0.09929232800277532]
testbase     [0.10155730300175492, 0.1042869699995208, 0.11276149599871133]
testbasenot  [0.10197166099897004, 0.11451221999959671, 0.15595895300066331]
plain        [0.11898361400017166, 0.12018223199993372, 0.12760113599870238]
teststr      [0.12645652200080804, 0.12671815700014122, 0.14095144699967932]

9: Size=512, Loops=256
* Integer
cached       [0.12672984500022721, 0.1462409830019169, 0.2653043659993273]
plain        [0.161721200998727, 0.17296033000093303, 0.19699998799842433]
testbase     [0.25432757399903494, 0.25851125400004094, 0.258548003002943]
testbasenot  [0.25619441399976495, 0.25656893900304567, 0.25998359599907417]
teststr      [0.2719232039999042, 0.2744571339972026, 0.2751794379983039]

* String
cached       [0.08841608199873008, 0.08848714099804056, 0.09124958899701596]
testbasenot  [0.09962382599769626, 0.10016373899998143, 0.10028601600060938]
testbase     [0.10713129000214394, 0.10752918499929365, 0.10952026399900205]
plain        [0.1163020489984774, 0.12190789400119684, 0.1264930679972167]
teststr      [0.1242994140011433, 0.12458201900153654, 0.12523995000083232]

10: Size=1024, Loops=128
* Integer
cached       [0.12827690600170172, 0.1294701549995807, 0.13387694999983069]
plain        [0.16636216699771467, 0.16866590399877168, 0.17549873600000865]
testbasenot  [0.25435296399882645, 0.25515673799964134, 0.2605281959986314]
testbase     [0.26351416900070035, 0.26398584699927596, 0.2651360300005763]
teststr      [0.26816077799958293, 0.26908816800278146, 0.2715630999991845]

* String
cached       [0.08827024300262565, 0.09090095799911069, 0.09729095900183893]
testbase     [0.10063145499952952, 0.1010660120009561, 0.10904535399822635]
testbasenot  [0.10313185999984853, 0.11444468399713514, 0.14796407999892836]
plain        [0.11569941500056302, 0.11579339799936861, 0.12615105800068704]
teststr      [0.12353994099976262, 0.12515813500067452, 0.13752399999793852]

These timings were performed on a rather old 32-bit single-core 2GHz machine with 2GB of RAM running a Debian derivative of Linux. I used Python 2.6.6 and Python 3.6.0. Your results may vary. ;) In any case, these results should only be used as a rough guide. timeit does a pretty good job of only timing the stuff we want to time, but it has no control over other processes that also want to use the CPU.

answered Oct 02 '22 14:10 by PM 2Ring