
How to read in a file with a mixture of different delimiters using Python csv module?

Tags: python, csv

Input:

A    B    C
D    E    F

This file is NOT exclusively tab-delimited: some entries are separated by spaces so that they merely look tab-delimited (which is annoying). I tried reading the file with the csv module using the standard tab delimiter, hoping it wouldn't mind a few spaces. Needless to say, the output of this code came out botched:

import csv

with open('file.txt') as f:
    # Treats only tabs as separators, so space-separated entries stay glued together.
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)

I then tried replacing the csv.reader line with csv.reader('\t'.join(f.split())), following the approach in "Remove whitespace in Python using string.whitespace", but that raised an error: AttributeError: 'file' object has no attribute 'split'.

I also looked at "Can I import a CSV file and automatically infer the delimiter?", but there the OP imported files that were either semicolon-delimited or comma-delimited, not a file containing a random mixture of both kinds of delimiters.

I am wondering whether the csv module can handle files with a mix of delimiters, or whether I should try a different approach (e.g., not use the csv module).

I am hoping there is a way to read in a file with a mixture of delimiters and automatically turn it into a tab-delimited file.

asked Aug 22 '14 at 01:08 by warship


1 Answer

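Just use str.split():

# Named `text` rather than `csv` so the csv module is not shadowed.
text = '''\
A\tB\tC
D    E    F
'''

# Split each line on any run of whitespace (tabs, spaces, or a mix).
data = []
for line in text.splitlines():
    data.append(line.split())

print(data)
# [['A', 'B', 'C'], ['D', 'E', 'F']]

Or, more succinctly:

>>> [line.split() for line in text.splitlines()]
[['A', 'B', 'C'], ['D', 'E', 'F']]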

For a file, something like:

# fn is the path to the input file
with open(fn, 'r') as fin:
    data = [line.split() for line in fin]

This works because str.split(), called with no arguments, splits on any run of whitespace between data elements, whether that is several whitespace characters in a row or a mixture of tabs and spaces:

>>> '1\t\t\t2     3\t  \t  \t4'.split()
['1', '2', '3', '4']
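
Since the question also asks for a way to turn the mixed file into a strictly tab-delimited one, here is a minimal sketch of how the same str.split() idea could be combined with the csv module. It assumes Python 3 and that no field contains whitespace or is empty. The helper names normalize_whitespace and to_tab_delimited are made up for illustration, and io.StringIO objects stand in for real files.

```python
import csv
import io


def normalize_whitespace(lines):
    """Yield each non-blank line with every run of whitespace replaced by one tab."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield '\t'.join(fields)


def to_tab_delimited(src, dst):
    """Read rows from file-like `src` and write them to `dst` tab-delimited."""
    # csv.reader accepts any iterable of strings, not only file objects.
    reader = csv.reader(normalize_whitespace(src), delimiter='\t')
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    rows = list(reader)
    writer.writerows(rows)
    return rows


# In-memory stand-ins for real files; with files, pass open(...) handles instead.
src = io.StringIO('A\tB\tC\nD    E    F\n')
dst = io.StringIO()
rows = to_tab_delimited(src, dst)
print(rows)
```

Passing a generator to csv.reader also avoids the AttributeError from the question, because split() is called on each line (a string) rather than on the file object. The caveat is that a field containing spaces would be split into several fields, and empty fields disappear, so this only suits data like the sample above.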

answered Oct 30 '22 at 17:10 by dawg