I'm trying to parse a tab-separated file in Python where a number placed k tabs from the beginning of a row should be placed into the k-th array.
Is there a built-in function to do this, or a better way than reading line by line and doing all the obvious processing a naive solution would perform?
You can use the `csv` module to parse tab-separated value files easily:
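```python
import csv

# The csv docs recommend opening files with newline="" for csv.reader.
with open("tab-separated-values", newline="") as tsv:
    # You can also use delimiter="\t" rather than giving a dialect.
    for line in csv.reader(tsv, dialect="excel-tab"):
        ...  # process the current row here
```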
On each iteration, `line` is a list of the values on the current row.
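To see what `line` holds, here is a small sketch that feeds an in-memory string to `csv.reader`. The `io.StringIO` object is only a stand-in for the real file (an assumption for illustration). Empty fields are kept, so a value that sits k tabs from the start of a row ends up at index k of `line`.

```python
import csv
import io

# In-memory stand-in for a tab-separated file; the second row starts with
# two tabs, so its only value sits two tabs from the beginning.
data = "1\t2\t3\n\t\t7\n"

rows = list(csv.reader(io.StringIO(data), dialect="excel-tab"))
print(rows)  # [['1', '2', '3'], ['', '', '7']]

# The value placed k tabs from the start of a row is at index k.
print(rows[1][2])  # 7
```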
Edit: As suggested below, if you want to read by column rather than by row, the best approach is to use the built-in `zip()`:
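```python
import csv

with open("tab-separated-values", newline="") as tsv:
    # zip(*rows) transposes the rows, yielding one tuple per column.
    for column in zip(*csv.reader(tsv, dialect="excel-tab")):
        ...  # process the current column here
```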
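One caveat: `zip()` stops at the shortest row, so if rows have different numbers of fields, the extra values in longer rows are silently dropped. A minimal sketch using `itertools.zip_longest` instead (a swapped-in technique, not part of the answer above) keeps them and pads the gaps. The `fillvalue=""` choice and the in-memory sample data are assumptions.

```python
import csv
import io
from itertools import zip_longest

# Ragged input: the second row has one field fewer than the others.
data = "1\t2\t3\n4\t5\n6\t7\t8\n"

rows = csv.reader(io.StringIO(data), dialect="excel-tab")

# zip_longest pads short rows with fillvalue instead of truncating.
columns = list(zip_longest(*rows, fillvalue=""))
print(columns)  # [('1', '4', '6'), ('2', '5', '7'), ('3', '', '8')]
```

With plain `zip()`, the same input would yield only the first two columns.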
I don't think any of the current answers really do what you said you want. (Correction: I now see that @Gareth Latty / @Lattyware has incorporated my answer into his own as an "Edit" near the end.)
Anyway, here's my take:
Say these are the tab-separated values in your input file:
```
1	2	3	4	5
6	7	8	9	10
11	12	13	14	15
16	17	18	19	20
```
then this:
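```python
with open("tab-separated-values.txt") as inp:
    # rstrip("\n") removes only the line ending; strip() would also remove
    # leading/trailing tabs and shift values that follow empty fields.
    print(list(zip(*(line.rstrip("\n").split("\t") for line in inp))))
```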
would produce the following:
```
[('1', '6', '11', '16'), ('2', '7', '12', '17'), ('3', '8', '13', '18'), ('4', '9', '14', '19'), ('5', '10', '15', '20')]
```
As you can see, it put the k-th element of each row into the k-th array.
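The tuples above still hold strings. If the k-th arrays should contain numbers, a sketch like the following converts each column after transposing. It assumes every field is a non-empty integer literal (use `float` for decimals), and `columns_as_ints` is a hypothetical helper name, not a library function.

```python
import io


def columns_as_ints(lines):
    """Transpose tab-separated lines into one list of ints per column.

    Hypothetical helper: assumes every field is a non-empty integer literal.
    """
    # rstrip("\n") removes only the line ending before splitting on tabs.
    split_rows = (line.rstrip("\n").split("\t") for line in lines)
    # zip(*...) groups the k-th field of every row into the k-th column.
    return [[int(value) for value in column] for column in zip(*split_rows)]


sample = io.StringIO("1\t2\t3\n4\t5\t6\n")
print(columns_as_ints(sample))  # [[1, 4], [2, 5], [3, 6]]
```

With a real file, pass the open file object in place of `sample`.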