I have a tsv file which includes some newline data.
111 222 333 "aaa"
444 555 666 "bb
b"
Here b
on the third line is a newline character of bb
on the second line, so they are one data:
The fourth value of first line:
aaa
The fourth value of second line:
bb
b
If I use Ctrl+C and Ctrl+V paste to a excel file, it works well. But if I want to import the file using python, how to parse?
I have tried:
lines = [line.rstrip() for line in open(file.tsv)]
for i in range(len(lines)):
value = re.split(r'\t', lines[i]))
But the result was not good:
I want:
How to read TSV file in pandas? TSV stands for Tab Separated File Use pandas which is a text file where each field is separated by tab (\t). In pandas, you can read the TSV file into DataFrame by using the read_table() function.
TSV file can be converted into CSV file by reading one line of data at a time from TSV and replacing tab with comma using re library and writing into CSV file. We first open the TSV file from which we read data and then open the CSV file in which we write data. We read data line by line.
Just use the csv module. It knows about all the possible corner cases in CSV files like new lines in quoted fields. And it can delimit on tabs.
with open("file.tsv") as fd:
rd = csv.reader(fd, delimiter="\t", quotechar='"')
for row in rd:
print(row)
will correctly output:
['111', '222', '333', 'aaa']
['444', '555', '666', 'bb\nb']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With