 

How to normalize a list of lists of strings in Python?

Tags:

python

I have a list of lists that represent a grid of data (think rows in a spreadsheet). Each row can have an arbitrary number of columns, and the data in each cell is a string of arbitrary length.

I want to normalize this so that every row has the same number of columns and every cell in a given column has the same width, padding with spaces as necessary. For example, given the following input:

(
 ("row a", "a1","a2","a3"),
 ("another row", "b1"),
 ("c", "x", "y", "a long string")
)

I want the data to look like this:

(
 ("row a      ", "a1", "a2", "a3           "),
 ("another row", "b1", "  ", "             "),
 ("c          ", "x ", "y ", "a long string")
)

What's the Pythonic solution for Python 2.6 or greater? Just to be clear: I'm not looking to pretty-print the list per se. I'm looking for a solution that returns a new list of lists (or tuple of tuples) with the values padded out.

Bryan Oakley asked Jan 16 '12 16:01



2 Answers

Starting with your input data:

>>> d = (
...     ("row a", "a1", "a2", "a3"),
...     ("another row", "b1"),
...     ("c", "x", "y", "a long string")
... )

Make one pass to determine the maximum size of each column:

>>> col_size = {}
>>> for row in d:
...     for i, col in enumerate(row):
...         col_size[i] = max(col_size.get(i, 0), len(col))
...
>>> ncols = len(col_size)
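
To make the first pass concrete, here is a small standalone sketch that runs the same loop on the sample data and prints the widths it finds. It only restates the step above as a script; nothing here is new API.

```python
# Sample data from the question.
d = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)

# Map each column index to the length of its longest cell.
col_size = {}
for row in d:
    for i, col in enumerate(row):
        col_size[i] = max(col_size.get(i, 0), len(col))

print(sorted(col_size.items()))  # [(0, 11), (1, 2), (2, 2), (3, 13)]
print(len(col_size))             # 4 columns in the widest row
```

Short rows simply never touch the higher column indexes, so the dictionary ends up with one entry per column of the widest row.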

Then make a second pass to pad each column to the required width:

>>> result = []
>>> for row in d:
...     row = list(row) + [''] * (ncols - len(row))
...     for i, col in enumerate(row):
...         row[i] = col.ljust(col_size[i])
...     result.append(row)
...

That gives the desired result:

>>> from pprint import pprint
>>> pprint(result)
[['row a      ', 'a1', 'a2', 'a3           '],
 ['another row', 'b1', '  ', '             '],
 ['c          ', 'x ', 'y ', 'a long string']]

For convenience, the steps can be combined into a single function:

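def align(array):
    """Return a new list of rows padded to equal length and column width."""
    # First pass: find the widest cell in each column.
    col_size = {}
    for row in array:
        for i, col in enumerate(row):
            col_size[i] = max(col_size.get(i, 0), len(col))
    ncols = len(col_size)
    # Second pass: fill in missing cells, then left-justify every cell.
    result = []
    for row in array:
        row = list(row) + [''] * (ncols - len(row))
        for i, col in enumerate(row):
            row[i] = col.ljust(col_size[i])
        result.append(row)
    return result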
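
The question also allows a tuple of tuples as the return value. A minimal sketch of that variant follows, assuming the input is a sequence (not a one-shot iterator) of rows of strings. The name align_tuples is made up for illustration and is not part of the answer above.

```python
def align_tuples(rows):
    """Hypothetical variant of align() that returns a tuple of tuples."""
    # Number of columns is the length of the longest row (0 for no rows).
    ncols = max(len(row) for row in rows) if rows else 0

    # First pass: widest cell per column.
    widths = [0] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Second pass: treat missing cells as "" and left-justify everything.
    return tuple(
        tuple(
            (row[i] if i < len(row) else "").ljust(widths[i])
            for i in range(ncols)
        )
        for row in rows
    )


d = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)
padded = align_tuples(d)
for row in padded:
    print(row)
```

Using a list for the widths instead of a dictionary is only a stylistic choice here; the two-pass structure is the same as in align().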
Raymond Hettinger answered Oct 18 '22 01:10



Here's what I came up with:

import itertools

def pad_rows(strs):
   for col in itertools.izip_longest(*strs, fillvalue=""):
      longest = max(map(len, col))
      yield map(lambda x: x.ljust(longest), col)

def pad_strings(strs):
   return itertools.izip(*pad_rows(strs))

Calling it like this, where x is the input data from the question:

print tuple(pad_strings(x))

yields this result:

(('row a      ', 'a1', 'a2', 'a3           '),
 ('another row', 'b1', '  ', '             '),
 ('c          ', 'x ', 'y ', 'a long string'))
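
The code above targets Python 2. In Python 3, itertools.izip_longest is named zip_longest, itertools.izip is replaced by the built-in zip, and print is a function. A rough Python 3 sketch of the same idea, under those assumptions, might look like this (pad_columns is a renamed stand-in for pad_rows, since the loop actually walks columns):

```python
from itertools import zip_longest


def pad_columns(rows):
    # zip_longest(*rows) transposes rows into columns, using "" for cells
    # that are missing from short rows.
    for col in zip_longest(*rows, fillvalue=""):
        longest = max(map(len, col))
        yield [cell.ljust(longest) for cell in col]


def pad_strings(rows):
    # Transpose the padded columns back into rows.
    return tuple(zip(*pad_columns(rows)))


x = (
    ("row a", "a1", "a2", "a3"),
    ("another row", "b1"),
    ("c", "x", "y", "a long string"),
)
print(pad_strings(x))
```

Here pad_strings returns a tuple directly rather than a lazy iterator, so the caller does not need to wrap the call in tuple().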
jterrace answered Oct 18 '22 02:10
