What is the cost (if any) of calling the str function on an object that is already a string? The use case here is to normalize an array of objects of different types by converting them all to strings. Naively, it can be implemented like so:
def arr_2_strarr(arr):
    return [str(val) for val in arr]
But if the str() call causes too much overhead, and my arr contains primarily strings, I may consider using:
def arr_2_strarr2(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]
Any suggestions?
The str() function converts values to a string form so they can be combined with other strings.
The __str__() method returns the string representation of an object. It is called when print() or str() is invoked on the object, and it must return a string.
In Python 3, str() takes up to three parameters: object, the object whose string representation is to be returned; encoding, the encoding used to decode a bytes object (UTF-8, ASCII, etc.); and errors, the response when decoding fails (strict, ignore, replace, etc.).
The __str__ method represents instances of a class as a string. It should be defined so that its output is easy to read and shows the relevant members of the class, which also makes it useful as a debugging tool when those members need to be checked.
Calling str on a string object is pretty cheap: it just returns the original string object. Calling isinstance explicitly will definitely be slower.
If you want to test this on real data, take a look at the timeit module.
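As a rough way to try this yourself, here is a minimal Python 3 sketch. The sample list and the helper names convert_plain and convert_checked are made up for illustration, and the identity check relies on CPython returning an exact str instance unchanged. Treat the timings as indicative only.

```python
import timeit

s = "already a string"
# In CPython, str() on an exact str instance hands back the same object
same_object = str(s) is s

# Hypothetical all-string input, just for this sketch
data = ["spam", "eggs", "ham"] * 100

def convert_plain():
    # Always call str(), even on values that are already strings
    return [str(v) for v in data]

def convert_checked():
    # Test the type first and only convert non-strings
    return [v if isinstance(v, str) else str(v) for v in data]

# timeit.timeit accepts a callable as well as a code string
t_plain = timeit.timeit(convert_plain, number=2000)
t_checked = timeit.timeit(convert_checked, number=2000)

print("str(s) is s:", same_object)
print("plain  : {:.4f}s".format(t_plain))
print("checked: {:.4f}s".format(t_checked))
```

Which variant wins can depend on the Python version and on the mix of types in your data, so it is worth running this against a realistic sample.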
BTW, you should eliminate the not from your 2nd version:
    [val if isinstance(val, basestring) else str(val) for val in arr]
And you can speed things up slightly by caching str:
def arr_2_strarr(arr, str=str):
    return [str(val) for val in arr]
Happy micro-optimizing. :)
Why cache str? Well, each time you use a name, Python has to look for it. If you're inside a function, first it looks in the local namespace, and if it can't find the name there it looks in the globals, and finally in the built-ins. Even though str is built-in, it isn't copied into each function's local namespace; it would be inefficient to "import" all the built-ins into every function. By doing
    def arr_2_strarr(arr, str=str)
we create a local name str that gets bound to the built-in str type, and because it's a default argument that search & bind process happens once, when the function definition is executed, not each time the function is called.
So each time we call arr_2_strarr the interpreter will immediately find that local str, which will save a tiny amount of time.
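To see the difference in lookups, the following sketch (Python 3.4+ only, since it uses dis.get_instructions) disassembles two tiny functions. The names plain_one and cached_one are hypothetical single-value versions of the list functions above, used so that the bytecode stays short. Exact opcode names vary between CPython versions, so the check only looks at opcode prefixes.

```python
import dis

def plain_one(val):
    # str is not local here, so it is looked up in globals, then built-ins
    return str(val)

def cached_one(val, str=str):
    # str is a local name, bound once when the def statement runs
    return str(val)

ops_plain = [ins.opname for ins in dis.get_instructions(plain_one)]
ops_cached = [ins.opname for ins in dis.get_instructions(cached_one)]

# The default value is stored on the function object at definition time
print(cached_one.__defaults__)

print("plain :", ops_plain)
print("cached:", ops_cached)
```

In the first listing the name is fetched with a global-style load, while in the second it is fetched with a fast local load, which is the small saving described above.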
Here's some timeit code that compares the various strategies. It runs on both Python 2 & Python 3, although on Python 3 it substitutes str for basestring, since basestring doesn't exist in Python 3.
This code runs the functions on lists of various sizes first with integer data, then with string data which is created by converting the integer data to strings.
Each line of output gives the times taken to perform the given number of loops, one time for each of the 3 repetitions, sorted from fastest to slowest. As the timeit repeat docs mention, the main number to look at in each run is the smallest one.
The results for all functions on a given list size and type are also sorted from fastest to slowest.
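As a small illustration of how to read such a line of output, this sketch runs a single Timer with three repetitions and picks out the minimum. The statement and setup strings are made up for the example and are not part of the benchmark itself.

```python
from timeit import Timer

# Hypothetical statement and setup, just to show how repeat() results are read
timer = Timer("[str(v) for v in arr]", setup="arr = list(range(100))")

# 3 repetitions of 1000 loops each; repeat() returns one total time per repetition
results = timer.repeat(3, 1000)
results.sort()

# The smallest value is the one least disturbed by other activity on the machine
best = results[0]

print(results)
print("best of 3: {:.6f}s".format(best))
```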
''' Compare the speeds of direct string conversion
    with testing first via isinstance

    See https://stackoverflow.com/q/44439323/4014959
    Written by PM 2Ring 2017.06.09
    Python 2 / 3 compatible
'''

from __future__ import print_function, division
from timeit import Timer
import sys

# Python 3 doesn't have basestring
if sys.version_info[0] > 2:
    basestring = str

# The functions to test
def plain(arr):
    return [str(val) for val in arr]

def cached(arr, str=str):
    return [str(val) for val in arr]

def teststr(arr):
    return [val if isinstance(val, str) else str(val) for val in arr]

def testbase(arr):
    return [val if isinstance(val, basestring) else str(val) for val in arr]

def testbasenot(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]

funcs = (
    plain,
    cached,
    teststr,
    testbase,
    testbasenot,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify(arr):
    results = [func(arr) for func in funcs]
    first, results = results[0], results[1:]
    return all(first == u for u in results)

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import arr, ' + fname
        cmd = fname + '(arr)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:12} {1}'.format(fname, result))

# Check that all functions return the same results
if 0:
    print('Testing all functions')
    arr = list(range(10))
    print(arr, verify(arr))
    arr = list('abcdefghij')
    print(arr, verify(arr))

# Do the timing tests
reps = 3
loops = 1 << 16
for i in range(1, 11):
    n = 1 << i
    # Build a data array of integers
    arr = range(n)
    print('\n{0}: Size={1}, Loops={2}'.format(i, n, loops))
    print('* Integer')
    time_test(loops, reps)

    # Convert the data array contents to strings
    arr = cached(arr)
    print('\n* String')
    time_test(loops, reps)
    loops >>= 1
Typical Python 2 output:
1: Size=2, Loops=65536
* Integer
cached [0.17268610000610352, 0.19634914398193359, 0.2058720588684082]
plain [0.17906594276428223, 0.18797492980957031, 0.24009895324707031]
teststr [0.32513308525085449, 0.33270597457885742, 0.35080599784851074]
testbasenot [0.32793092727661133, 0.33176803588867188, 0.33498501777648926]
testbase [0.32964491844177246, 0.33154511451721191, 0.33760714530944824]
* String
cached [0.1619560718536377, 0.1628870964050293, 0.16448402404785156]
teststr [0.16335082054138184, 0.16484308242797852, 0.17012500762939453]
plain [0.16956901550292969, 0.1711430549621582, 0.18457293510437012]
testbase [0.22378706932067871, 0.2255101203918457, 0.22593879699707031]
testbasenot [0.22855901718139648, 0.22941207885742188, 0.23271608352661133]
2: Size=4, Loops=32768
* Integer
cached [0.12796807289123535, 0.12807202339172363, 0.12817001342773438]
plain [0.13622713088989258, 0.14297294616699219, 0.14868402481079102]
teststr [0.27701020240783691, 0.27812099456787109, 0.2795259952545166]
testbasenot [0.27815794944763184, 0.28220701217651367, 0.29373884201049805]
testbase [0.2804868221282959, 0.28186416625976562, 0.31699705123901367]
* String
cached [0.12131500244140625, 0.12241697311401367, 0.13379192352294922]
teststr [0.12839889526367188, 0.1314079761505127, 0.14053797721862793]
plain [0.13051795959472656, 0.14696002006530762, 0.18504786491394043]
testbase [0.18404412269592285, 0.1844489574432373, 0.19633579254150391]
testbasenot [0.18416285514831543, 0.18494606018066406, 0.18553614616394043]
3: Size=8, Loops=16384
* Integer
cached [0.10957002639770508, 0.11252093315124512, 0.11768913269042969]
plain [0.11848998069763184, 0.11958003044128418, 0.1292269229888916]
testbase [0.26231694221496582, 0.26471304893493652, 0.26625895500183105]
teststr [0.26410102844238281, 0.2641758918762207, 0.26569199562072754]
testbasenot [0.26910495758056641, 0.26967120170593262, 0.2741539478302002]
* String
cached [0.102294921875, 0.10357999801635742, 0.1050269603729248]
teststr [0.10852217674255371, 0.10861611366271973, 0.1127161979675293]
plain [0.11173510551452637, 0.11183404922485352, 0.12115597724914551]
testbasenot [0.16488981246948242, 0.16509699821472168, 0.16648602485656738]
testbase [0.16622614860534668, 0.16688108444213867, 0.16962814331054688]
4: Size=16, Loops=8192
* Integer
cached [0.10548806190490723, 0.10568594932556152, 0.10611891746520996]
plain [0.11526799201965332, 0.1160120964050293, 0.12486004829406738]
teststr [0.25309896469116211, 0.25549888610839844, 0.25838899612426758]
testbasenot [0.25410699844360352, 0.27252411842346191, 0.32510590553283691]
testbase [0.25414609909057617, 0.26968812942504883, 0.27393984794616699]
* String
cached [0.092885017395019531, 0.096045970916748047, 0.10643196105957031]
teststr [0.098433017730712891, 0.098783016204833984, 0.10051798820495605]
plain [0.10081005096435547, 0.10222005844116211, 0.12018895149230957]
testbasenot [0.15373396873474121, 0.15472292900085449, 0.15676999092102051]
testbase [0.15490198135375977, 0.15572404861450195, 0.15599799156188965]
5: Size=32, Loops=4096
* Integer
cached [0.10568094253540039, 0.10743498802185059, 0.1115870475769043]
plain [0.1163330078125, 0.11633419990539551, 0.12796401977539062]
teststr [0.25122308731079102, 0.26527810096740723, 0.26579189300537109]
testbase [0.25309586524963379, 0.25563716888427734, 0.25917816162109375]
testbasenot [0.25465011596679688, 0.25907588005065918, 0.26110982894897461]
* String
cached [0.085406064987182617, 0.086378097534179688, 0.08651280403137207]
teststr [0.092473983764648438, 0.09324193000793457, 0.093439817428588867]
plain [0.096549034118652344, 0.097501993179321289, 0.10462403297424316]
testbase [0.14794015884399414, 0.14966106414794922, 0.15016818046569824]
testbasenot [0.14796280860900879, 0.14940309524536133, 0.15308189392089844]
6: Size=64, Loops=2048
* Integer
cached [0.10838603973388672, 0.1089630126953125, 0.11129999160766602]
plain [0.11764693260192871, 0.11851096153259277, 0.12583494186401367]
teststr [0.2550208568572998, 0.25540995597839355, 0.26316595077514648]
testbase [0.25723910331726074, 0.25930881500244141, 0.26207089424133301]
testbasenot [0.25864100456237793, 0.25901007652282715, 0.26875495910644531]
* String
cached [0.086635112762451172, 0.087384939193725586, 0.099885940551757812]
plain [0.096493959426879883, 0.12469196319580078, 0.13684391975402832]
teststr [0.096681118011474609, 0.098448991775512695, 0.10569310188293457]
testbase [0.14573216438293457, 0.14696693420410156, 0.14700508117675781]
testbasenot [0.14776277542114258, 0.14852094650268555, 0.15462112426757812]
7: Size=128, Loops=1024
* Integer
cached [0.10915207862854004, 0.11011981964111328, 0.1127631664276123]
plain [0.11721491813659668, 0.11830401420593262, 0.1254270076751709]
testbase [0.25789499282836914, 0.26130795478820801, 0.26179313659667969]
teststr [0.25840306282043457, 0.25889492034912109, 0.26300287246704102]
testbasenot [0.26443600654602051, 0.26498103141784668, 0.26691412925720215]
* String
cached [0.083537101745605469, 0.084954023361206055, 0.086431980133056641]
teststr [0.091158866882324219, 0.09123992919921875, 0.091590166091918945]
plain [0.091225862503051758, 0.092115163803100586, 0.099261045455932617]
testbase [0.14569401741027832, 0.14622306823730469, 0.14650607109069824]
testbasenot [0.14774990081787109, 0.14930200576782227, 0.15020990371704102]
8: Size=256, Loops=512
* Integer
cached [0.10824894905090332, 0.10865211486816406, 0.10895800590515137]
plain [0.11750102043151855, 0.12690877914428711, 0.12890195846557617]
teststr [0.25457501411437988, 0.25542402267456055, 0.25692200660705566]
testbasenot [0.25513482093811035, 0.25664496421813965, 0.25999689102172852]
testbase [0.25680398941040039, 0.25924396514892578, 0.26179695129394531]
* String
cached [0.080662012100219727, 0.081827878952026367, 0.081900119781494141]
teststr [0.089673995971679688, 0.097939014434814453, 0.15471792221069336]
plain [0.094327926635742188, 0.095342159271240234, 0.097375154495239258]
testbasenot [0.14262199401855469, 0.14278602600097656, 0.14302182197570801]
testbase [0.14464497566223145, 0.14674210548400879, 0.16207790374755859]
9: Size=512, Loops=256
* Integer
cached [0.10789299011230469, 0.1092069149017334, 0.110015869140625]
plain [0.11702799797058105, 0.1181950569152832, 0.12698101997375488]
testbase [0.25504207611083984, 0.25520896911621094, 0.25734806060791016]
testbasenot [0.25715017318725586, 0.25747489929199219, 0.25850796699523926]
teststr [0.25783085823059082, 0.25882315635681152, 0.26154208183288574]
* String
cached [0.078849077224731445, 0.079813003540039062, 0.084489107131958008]
teststr [0.086745977401733398, 0.087059974670410156, 0.087485074996948242]
plain [0.088322877883911133, 0.088804960250854492, 0.097378969192504883]
testbasenot [0.14128994941711426, 0.14266705513000488, 0.1427910327911377]
testbase [0.14152097702026367, 0.14231991767883301, 0.14392399787902832]
10: Size=1024, Loops=128
* Integer
cached [0.10892415046691895, 0.11003899574279785, 0.11008000373840332]
plain [0.1192779541015625, 0.12048506736755371, 0.12956619262695312]
teststr [0.25335502624511719, 0.25642204284667969, 0.25892996788024902]
testbase [0.25525593757629395, 0.25550699234008789, 0.25794696807861328]
testbasenot [0.25932693481445312, 0.25960803031921387, 0.26134610176086426]
* String
cached [0.078451156616210938, 0.080369949340820312, 0.080511093139648438]
teststr [0.084844112396240234, 0.085949897766113281, 0.096578836441040039]
plain [0.086302042007446289, 0.087638139724731445, 0.096364974975585938]
testbase [0.14068913459777832, 0.14274501800537109, 0.15559101104736328]
testbasenot [0.14075493812561035, 0.15553092956542969, 0.19578790664672852]
Typical Python 3 output:
1: Size=2, Loops=65536
* Integer
plain [0.2957206170030986, 0.2959696320031071, 0.2991539639988332]
cached [0.3058611470005417, 0.30598287599787, 0.3073535650000849]
testbase [0.38803433800057974, 0.39307209699836676, 0.393392562000372]
testbasenot [0.3888578799997049, 0.3951267439988442, 0.42909636100011994]
teststr [0.41290506400036975, 0.41541150199918775, 0.4488242949992127]
* String
testbase [0.23906823500146857, 0.23946705200069118, 0.24624350399972172]
testbasenot [0.24037985899849446, 0.24200722000023234, 0.2462738950016501]
plain [0.25742501500280923, 0.2644229819998145, 0.26711930600140477]
teststr [0.2635171010006161, 0.3559218000009423, 0.3784064870014845]
cached [0.2687887559986848, 0.2711959320004098, 0.38138879500183975]
2: Size=4, Loops=32768
* Integer
cached [0.21332427200104576, 0.21363574399947538, 0.21528891600246425]
plain [0.22395663199858973, 0.22762144099760917, 0.23422862100051134]
testbasenot [0.31939790100295795, 0.32413787499899627, 0.32422161499926005]
testbase [0.3209382370005187, 0.3213516770010756, 0.3215230670029996]
teststr [0.3372085839982901, 0.33786465500088525, 0.33847540900023887]
* String
testbasenot [0.17031173299983493, 0.17143720199965173, 0.17724975699820789]
testbase [0.170390128998406, 0.17118954800025676, 0.18865150499914307]
cached [0.18190538799899514, 0.18262020299880533, 0.183105569001782]
plain [0.18666503399799694, 0.18781541300268145, 0.1955128590016102]
teststr [0.18973677000030875, 0.19112570400102413, 0.19168143299975782]
3: Size=8, Loops=16384
* Integer
cached [0.17012267099926248, 0.18160372200145503, 0.2275817529989581]
plain [0.1890079689983395, 0.1963043950017891, 0.2016476179996971]
testbasenot [0.28168991999700665, 0.2821743839995179, 0.286649605997809]
testbase [0.28295213199817226, 0.28760008400058723, 0.2906435440017958]
teststr [0.2958552290001535, 0.2989299110013235, 0.31747390199961956]
* String
testbase [0.13354753000021446, 0.13377505199969164, 0.14039257600234123]
cached [0.1352838150014577, 0.1353432000032626, 0.13798289999976987]
testbasenot [0.14252334699995117, 0.14301740500013693, 0.1445914210016781]
plain [0.15130633899752866, 0.15166569000211894, 0.1616801599993778]
teststr [0.15267008800219628, 0.1545946529986395, 0.15590016200076207]
4: Size=16, Loops=8192
* Integer
cached [0.144755126999371, 0.14782401300180936, 0.1484048439997423]
plain [0.1726092749995587, 0.1740606339990336, 0.1815100200001325]
testbase [0.26685525399807375, 0.27029573199979495, 0.2716258750006091]
testbasenot [0.2702714350016322, 0.2723204169997189, 0.27288546099953237]
teststr [0.28515160999813816, 0.28523068700087606, 0.2878553769987775]
* String
cached [0.11515368599793874, 0.11579233700103941, 0.11688366999806021]
testbase [0.12178990400207113, 0.13090817400006927, 0.13304468899877975]
testbasenot [0.13121789299839293, 0.14976675499929115, 0.1521548589989834]
teststr [0.13410512400150765, 0.1354981399999815, 0.147247362001508]
plain [0.13691626099898713, 0.1384456069972657, 0.1426525679999031]
5: Size=32, Loops=4096
* Integer
cached [0.13246865899782279, 0.13320018100057496, 0.134628559997509]
plain [0.1636957459995756, 0.16763203899972723, 0.1752369269997871]
testbase [0.26010187700012466, 0.2606812570011243, 0.2647345440018398]
testbasenot [0.2620696090016281, 0.26230394700178294, 0.26258907899682526]
teststr [0.27685887300322065, 0.2787095199964824, 0.28293989099984174]
* String
cached [0.10246079200078384, 0.10416977099885116, 0.10755630499988911]
testbasenot [0.10829716499938513, 0.10918466699877172, 0.10935586699997657]
testbase [0.11739019699962228, 0.11808202800239087, 0.11899654000080773]
plain [0.12601002500014147, 0.12718953500007046, 0.13454839599944535]
teststr [0.13366336599938222, 0.13407608800116577, 0.13510101700012456]
6: Size=64, Loops=2048
* Integer
cached [0.12591946799875586, 0.127094235002005, 0.13223557899982552]
plain [0.160616523000499, 0.16232994500023779, 0.1691623620026803]
testbase [0.2534341589998803, 0.2556092949998856, 0.2571690379991196]
testbasenot [0.2560774869998568, 0.2574564010028553, 0.2606996459981019]
teststr [0.268248238000524, 0.2702014210008201, 0.27107579600124154]
* String
cached [0.09791737100022146, 0.09819723300097394, 0.10752435399990645]
testbasenot [0.1057888709983672, 0.10588572099732119, 0.16173565400094958]
testbase [0.10636284599968349, 0.1179599219976808, 0.12130766799964476]
plain [0.12285572399923694, 0.12589510299949325, 0.13114397300159908]
teststr [0.13122114399811835, 0.13273253399893292, 0.14575592999972287]
7: Size=128, Loops=1024
* Integer
cached [0.12404713899741182, 0.12496110600113752, 0.12496385000122245]
plain [0.15980284800025402, 0.16046370399999432, 0.16711239899814245]
testbasenot [0.25531527800194453, 0.25563639699976193, 0.2586420219995489]
testbase [0.25544935799916857, 0.2558138679996773, 0.257172014000389]
teststr [0.2699256220003008, 0.2712909309993847, 0.27702098800000385]
* String
cached [0.09376715399776003, 0.09393715400074143, 0.09975314399707713]
testbasenot [0.10510071799944853, 0.10511873200084665, 0.10523289399861824]
testbase [0.11240010600158712, 0.11325187799957348, 0.11632439300228725]
plain [0.12139380200096639, 0.12202585699924384, 0.1315958569975919]
teststr [0.12834531499902369, 0.12949470400053542, 0.12955383699954837]
8: Size=256, Loops=512
* Integer
cached [0.12225364700134378, 0.12283446399669629, 0.1285843859986926]
plain [0.15971405900199898, 0.16198832800000673, 0.16777605400056927]
testbase [0.2507534860014857, 0.2527904779999517, 0.25378678199922433]
testbasenot [0.25323686200135853, 0.2547167230004561, 0.25919888999851537]
teststr [0.2652072370001406, 0.2658402630004275, 0.2674206650008273]
* String
cached [0.0906629850032914, 0.0985801380011253, 0.09929232800277532]
testbase [0.10155730300175492, 0.1042869699995208, 0.11276149599871133]
testbasenot [0.10197166099897004, 0.11451221999959671, 0.15595895300066331]
plain [0.11898361400017166, 0.12018223199993372, 0.12760113599870238]
teststr [0.12645652200080804, 0.12671815700014122, 0.14095144699967932]
9: Size=512, Loops=256
* Integer
cached [0.12672984500022721, 0.1462409830019169, 0.2653043659993273]
plain [0.161721200998727, 0.17296033000093303, 0.19699998799842433]
testbase [0.25432757399903494, 0.25851125400004094, 0.258548003002943]
testbasenot [0.25619441399976495, 0.25656893900304567, 0.25998359599907417]
teststr [0.2719232039999042, 0.2744571339972026, 0.2751794379983039]
* String
cached [0.08841608199873008, 0.08848714099804056, 0.09124958899701596]
testbasenot [0.09962382599769626, 0.10016373899998143, 0.10028601600060938]
testbase [0.10713129000214394, 0.10752918499929365, 0.10952026399900205]
plain [0.1163020489984774, 0.12190789400119684, 0.1264930679972167]
teststr [0.1242994140011433, 0.12458201900153654, 0.12523995000083232]
10: Size=1024, Loops=128
* Integer
cached [0.12827690600170172, 0.1294701549995807, 0.13387694999983069]
plain [0.16636216699771467, 0.16866590399877168, 0.17549873600000865]
testbasenot [0.25435296399882645, 0.25515673799964134, 0.2605281959986314]
testbase [0.26351416900070035, 0.26398584699927596, 0.2651360300005763]
teststr [0.26816077799958293, 0.26908816800278146, 0.2715630999991845]
* String
cached [0.08827024300262565, 0.09090095799911069, 0.09729095900183893]
testbase [0.10063145499952952, 0.1010660120009561, 0.10904535399822635]
testbasenot [0.10313185999984853, 0.11444468399713514, 0.14796407999892836]
plain [0.11569941500056302, 0.11579339799936861, 0.12615105800068704]
teststr [0.12353994099976262, 0.12515813500067452, 0.13752399999793852]
These timings were performed on a rather old 32-bit single-core 2GHz machine with 2GB of RAM, running a Debian derivative of Linux. I used Python 2.6.6 and Python 3.6.0. Your results may vary. ;) In any case, these results should only be used as a rough guide. timeit does a pretty good job of only timing the stuff we want to time, but it has no control over other processes that also want to use the CPU.
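One possible way to reduce the influence of other processes, which the timings above did not use, is to pass a CPU-time clock to Timer through its timer argument. The sketch below assumes Python 3.3+ (for time.process_time) and uses a made-up statement and setup. CPU-time clocks can have coarser resolution than the default clock, so treat the comparison as a rough cross-check rather than a more accurate measurement.

```python
import time
from timeit import Timer

# Hypothetical statement and setup, not the benchmark from the answer
stmt = "[str(v) for v in arr]"
setup = "arr = list(range(100))"

# Default timer: wall-clock time, affected by whatever else the machine is doing
wall = min(Timer(stmt, setup).repeat(3, 1000))

# process_time counts only CPU time used by this process
cpu = min(Timer(stmt, setup, timer=time.process_time).repeat(3, 1000))

print("wall-clock best: {:.6f}s".format(wall))
print("CPU-time best  : {:.6f}s".format(cpu))
```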