
Parsing a tab-separated file in Python

Tags:

python

io

tabs

I'm trying to parse a tab-separated file in Python where a number placed k tabs from the beginning of a row should be placed into the k-th array.

Is there a built-in function to do this, or a better way than reading the file line by line and doing all the obvious processing a naive solution would perform?

Bob asked Jun 15 '12 23:06



2 Answers

You can use the csv module to parse tab-separated value files easily.

    import csv

    with open("tab-separated-values") as tsv:
        # You can also use delimiter="\t" rather than giving a dialect.
        for line in csv.reader(tsv, dialect="excel-tab"):
            ...

Where line is a list of the values on the current row for each iteration.
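To make the row shape concrete, here is a small self-contained sketch that feeds an in-memory string to csv.reader instead of a real file. The sample data and the variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of a real TSV file.
sample = "1\t2\t3\n4\t5\t6\n"

# io.StringIO lets csv.reader consume the string as if it were an open file.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

for row in rows:
    print(row)  # each row is a list of strings, one per tab-separated field
```

Note that the fields come back as strings, so any numeric conversion is up to you.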

Edit: As suggested below, if you want to read by column rather than by row, the best thing to do is use the zip() builtin:

    import csv

    with open("tab-separated-values") as tsv:
        # zip(*rows) transposes the rows, so each iteration yields one column.
        for column in zip(*csv.reader(tsv, dialect="excel-tab")):
            ...
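One caveat worth knowing: zip() stops at the shortest row, so a row with missing trailing fields would silently shorten every column. If that can happen in your data, a sketch along these lines using itertools.zip_longest keeps all columns and pads the gaps. The sample data and the choice of None as the fill value are assumptions for illustration.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical sample: the second row is missing its last field.
sample = "1\t2\t3\n4\t5\n7\t8\t9\n"

rows = csv.reader(io.StringIO(sample), delimiter="\t")

# zip_longest pads short rows with fillvalue instead of truncating like zip().
columns = list(zip_longest(*rows, fillvalue=None))

for column in columns:
    print(column)
```

Whether padding or failing loudly is the better behaviour depends on how much you trust the input file.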
Gareth Latty answered Sep 21 '22 04:09


I don't think any of the current answers really do what you said you want. (Correction: I now see that @Gareth Latty / @Lattyware has incorporated my answer into his own as an "Edit" near the end.)

Anyway, here's my take:

Say these are the tab-separated values in your input file:

    1   2   3   4   5
    6   7   8   9   10
    11  12  13  14  15
    16  17  18  19  20

then this:

    with open("tab-separated-values.txt") as inp:
        print(list(zip(*(line.strip().split('\t') for line in inp))))

would produce the following:

    [('1', '6', '11', '16'),
     ('2', '7', '12', '17'),
     ('3', '8', '13', '18'),
     ('4', '9', '14', '19'),
     ('5', '10', '15', '20')]

As you can see, it puts the k-th element of each row into the k-th array.
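Since the question talks about numbers, here is a rough sketch that builds on the same idea but converts each field to int and returns lists rather than tuples of strings. The function name columns_as_ints is made up for this example, and it assumes every field is a valid integer and every row has the same number of fields.

```python
import io


def columns_as_ints(lines):
    """Return one list of ints per column of tab-separated input lines."""
    # Drop the trailing newline and split on tabs; blank lines are skipped.
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # zip(*rows) transposes rows into columns; int() converts each field.
    return [[int(value) for value in column] for column in zip(*rows)]


# Hypothetical in-memory input standing in for an open file object.
sample = io.StringIO("1\t2\t3\n4\t5\t6\n")
result = columns_as_ints(sample)
print(result)
```

Any object that yields lines works as the argument, so an open file can be passed in place of the StringIO sample.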

martineau answered Sep 22 '22 04:09