Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python tabstop-aware len() and padding functions

Python's len() and padding functions like string.ljust() are not tabstop-aware, i.e. they treat '\t' like any other single-width character, and don't round len() up to the nearest multiple of tabstop. Example:

len('Bear\tnecessities\t')

is 17 instead of 24 ( i.e. 4+(8-4)+11+(8-3) )

and say I also want a function pad_with_tabs(s) such that

pad_with_tabs('Bear', 15) = 'Bear\t\t'

Looking for simple implementations of these - compactness and readability first, efficiency second. This is a basic but irritating question. @gnibbler - can you show a purely Pythonic solution, even if it's say 20x less efficient?

Sure you could convert back and forth using str.expandtabs(TABWIDTH), but that's clunky. Importing math to get TABWIDTH * int( math.ceil(len(s)*1.0/TABWIDTH) ) also seems like massive overkill.

I couldn't manage anything more elegant than the following:

TABWIDTH = 8

def pad_with_tabs(s,maxlen):
  s_len = len(s)
  while s_len < maxlen:
    s += '\t'
    s_len += TABWIDTH - (s_len % TABWIDTH)
  return s

and since Python strings are immutable and unless we want to monkey-patch our function into string module to add it as a method, we must also assign to the result of the function:

s = pad_with_tabs(s, ...)

In particular I couldn't get clean approaches using list-comprehension or string.join(...):

''.join([s, '\t' * ntabs])

without special-casing the cases where len(s) is < an integer multiple of TABWIDTH), or len(s)>=maxlen already.

Can anyone show better len() and pad_with_tabs() functions?

like image 836
smci Avatar asked Jan 23 '23 23:01

smci


1 Answers

TABWIDTH=8
def my_len(s):
    return len(s.expandtabs(TABWIDTH))

def pad_with_tabs(s,maxlen):
    return s+"\t"*((maxlen-len(s)-1)/TABWIDTH+1)

Why did I use expandtabs()?
Well it's fast

$ python -m timeit '"Bear\tnecessities\t".expandtabs()'
1000000 loops, best of 3: 0.602 usec per loop
$ python -m timeit 'for c in "Bear\tnecessities\t":pass'
100000 loops, best of 3: 2.32 usec per loop
$ python -m timeit '[c for c in "Bear\tnecessities\t"]'
100000 loops, best of 3: 4.17 usec per loop
$ python -m timeit 'map(None,"Bear\tnecessities\t")'
100000 loops, best of 3: 2.25 usec per loop

Anything that iterates over your string is going to be slower, because just the iteration is ~4 times slower than expandtabs even when you do nothing in the loop.

$ python -m timeit '"Bear\tnecessities\t".split("\t")'
1000000 loops, best of 3: 0.868 usec per loop

Even just splitting on tabs takes longer. You'd still need to iterate over the split and pad each item to the tabstop

like image 161
John La Rooy Avatar answered Feb 01 '23 01:02

John La Rooy