
How to split a string at line breaks in Python?

I want to copy some tabular data from Excel into a Python list. That is, the user will select a range in an Excel table and press "Copy" (CTRL+C), so that the range is copied to the clipboard. Then I want to read this clipboard data into a Python list. I use win32clipboard from pywin32 to get the clipboard data:

import win32clipboard

def getClip():
    win32clipboard.OpenClipboard()
    data = win32clipboard.GetClipboardData()
    win32clipboard.CloseClipboard()
    return data

I copy the following range A1:B5 from Excel:

[image: the copied range A1:B5 in Excel]

When I use the function above, I get a string like:

'365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'

How can I split this string into a list that looks like this:

[(365,179), (96, -90), (48, -138), (12, -174), (30, -156)]

I tried the split method, but it doesn't give me what I want:

data.split("\n")

['365\t179\r', '96\t-90\r', '48\t-138\r', '12\t-174\r', '30\t-156\r', '']
alwbtc asked Jan 18 '14 13:01


2 Answers

There’s actually a str.splitlines method which will split the string by line breaks, regardless of which line breaks are used. So this works on Unix systems with just an \n, on Windows with \r\n and even on old Mac systems where the line break was just an \r.

>>> s = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'
>>> s.splitlines()
['365\t179', '96\t-90', '48\t-138', '12\t-174', '30\t-156']
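
To make the point about line-break conventions concrete, here is a small sketch. The three sample strings are made up for illustration; only str.splitlines and str.split come from the discussion above.

```python
# Made-up sample rows using the three line-break conventions.
unix_text = '365\t179\n96\t-90\n'
windows_text = '365\t179\r\n96\t-90\r\n'
old_mac_text = '365\t179\r96\t-90\r'

for text in (unix_text, windows_text, old_mac_text):
    # splitlines() drops the line terminators and adds no trailing empty item
    print(text.splitlines())

# By contrast, split('\n') keeps the '\r' and leaves a trailing empty string
print(windows_text.split('\n'))
```

Note that splitlines also treats a few other characters (for example form feed and the Unicode line separator) as line boundaries, which rarely matters for clipboard data but is worth knowing.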

Once you have this result, you can split each row by tabs to get the individual cells. So you essentially have to call row.split('\t') on each row. This is best done with a list comprehension:

>>> [row.split('\t') for row in s.splitlines()]
[['365', '179'], ['96', '-90'], ['48', '-138'], ['12', '-174'], ['30', '-156']]

As an alternative, you could also use map to apply the splitting operation to each row:

>>> list(map(lambda row: row.split('\t'), s.splitlines()))
[['365', '179'], ['96', '-90'], ['48', '-138'], ['12', '-174'], ['30', '-156']]

As the copied data in the clipboard will always have the rows separated by newlines, and the columns separated by tabs, this solution is also safe to use for any range of cells you copied.
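
One case where plain splitting could go wrong is a cell that itself contains a tab or a line break. As a hedged alternative (not part of the approach above), the standard csv module can read tab-separated text and understands quoted cells. The helper name parse_tsv and the second sample string are made up for illustration, and I am assuming, without having verified it, that such cells arrive wrapped in double quotes.

```python
import csv
import io

def parse_tsv(text):
    """Parse tab-separated text into a list of rows (lists of strings)."""
    # newline='' leaves line endings untouched so the csv module can handle them
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')
    return list(reader)

rows = parse_tsv('365\t179\r\n96\t-90\r\n')
print(rows)

# Hypothetical input: a quoted cell that contains a line break
quoted = parse_tsv('a\t"line1\nline2"\r\nb\tc\r\n')
print(quoted)
```

For ordinary numeric ranges like the one in the question, this gives the same rows as the splitlines approach.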

If you also want to convert integers and floats to their correct Python types, you could add some conversion logic: try int() on each cell, fall back to float() if that fails, and leave everything else as a string:

>>> def convert(cell):
...     try:
...         return int(cell)
...     except ValueError:
...         try:
...             return float(cell)
...         except ValueError:
...             return cell
...
>>> [tuple(map(convert, row.split('\t'))) for row in s.splitlines()]
[(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

For a different string:

>>> s = 'Foo\tbar\r\n123.45\t42\r\n-85\t3.14'
>>> [tuple(map(convert, row.split('\t'))) for row in s.splitlines()]
[('Foo', 'bar'), (123.45, 42), (-85, 3.14)]
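
Putting the pieces together, a minimal sketch of a reusable helper could look like this. The name parse_table is made up for illustration, and convert is written here as a loop over int and float, which should behave the same as the nested try/except version above.

```python
def convert(cell):
    """Return cell as int if possible, else float, else the original string."""
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            continue
    return cell

def parse_table(text):
    """Turn tab/newline-separated text into a list of tuples of converted cells."""
    return [tuple(convert(cell) for cell in row.split('\t'))
            for row in text.splitlines()]

table = parse_table('365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n')
print(table)
```

With the getClip function from the question, this would be called as parse_table(getClip()). Empty cells stay as empty strings, and note that float() also accepts text such as nan or inf, so cells with those exact values would become floats.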
poke answered Nov 07 '22 08:11


>>> s = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'
>>> [list(map(int, x.split('\t'))) for x in s.rstrip().split('\r\n')]
[[365, 179], [96, -90], [48, -138], [12, -174], [30, -156]]

Using the code from my other answer, you can handle other types as well:

from ast import literal_eval
def solve(x):
    try:
        return literal_eval(x)
    except (ValueError, SyntaxError):
        return x

s = '365\tFoo\r\nBar\t-90.01\r\n48\tspam\r\n12e10\t-174\r\n30\t-156\r\n'
print([list(map(solve, x.split('\t'))) for x in s.rstrip().split('\r\n')])
# [[365, 'Foo'], ['Bar', -90.01], [48, 'spam'], [120000000000.0, -174], [30, -156]]
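
One thing to keep in mind with literal_eval is that it accepts any Python literal, not just numbers. The sketch below repeats the solve helper from above and runs it on a few made-up cell values to show what gets converted.

```python
from ast import literal_eval

def solve(x):
    try:
        return literal_eval(x)
    except (ValueError, SyntaxError):
        # Not a valid Python literal: keep the original string
        return x

# Made-up cell values, including some non-numeric literals
samples = ['42', '-90.01', 'True', 'None', '(1, 2)', '0x10', 'spam', '']
results = [solve(s) for s in samples]
print(results)
```

So a cell containing the text True or None comes back as the corresponding Python object rather than a string, which may or may not be what you want for spreadsheet data.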
Ashwini Chaudhary answered Nov 07 '22 08:11