Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse tsv file with python?

Tags:

python

csv

I have a tsv file which includes some newline data.

111 222 333 "aaa"
444 555 666 "bb
b"

Here b on the third line is a newline character of bb on the second line, so they are one data:

The fourth value of first line:

aaa

The fourth value of second line:

bb
b

If I use Ctrl+C and Ctrl+V paste to a excel file, it works well. But if I want to import the file using python, how to parse?

I have tried:

lines = [line.rstrip() for line in open(file.tsv)]
for i in range(len(lines)):
    value = re.split(r'\t', lines[i]))

But the result was not good:

enter image description here

I want:

enter image description here

like image 520
s_zhang Avatar asked Feb 21 '17 03:02

s_zhang


People also ask

How do I read a TSV file in pandas?

How to read TSV file in pandas? TSV stands for Tab Separated File Use pandas which is a text file where each field is separated by tab (\t). In pandas, you can read the TSV file into DataFrame by using the read_table() function.

How do I convert TSV to CSV?

TSV file can be converted into CSV file by reading one line of data at a time from TSV and replacing tab with comma using re library and writing into CSV file. We first open the TSV file from which we read data and then open the CSV file in which we write data. We read data line by line.


1 Answers

Just use the csv module. It knows about all the possible corner cases in CSV files like new lines in quoted fields. And it can delimit on tabs.

with open("file.tsv") as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')
    for row in rd:
        print(row)

will correctly output:

['111', '222', '333', 'aaa']
['444', '555', '666', 'bb\nb']
like image 161
Serge Ballesta Avatar answered Sep 24 '22 04:09

Serge Ballesta