Convert tab-delimited txt file into a csv file using Python

I want to convert a simple tab-delimited text file into a CSV file. If I read the txt file into a string and call string.split('\n'), I get a list in which each item is a string with '\t' between the columns. I thought I could just replace each '\t' with a comma, but it won't treat the strings in the list as strings and let me use string.replace. Here is the start of my code, which still needs a way to parse the tab '\t'.

    import csv
    import sys

    txt_file = r"mytxt.txt"
    csv_file = r"mycsv.csv"

    in_txt = open(txt_file, "r")
    out_csv = csv.writer(open(csv_file, 'wb'))

    file_string = in_txt.read()

    file_list = file_string.split('\n')

    # each row is still one tab-separated string at this point
    for row in file_list:
        out_csv.writerow(row)
wilbev asked Apr 19 '12 01:04



2 Answers

The csv module supports tab-delimited files. Supply the delimiter argument to csv.reader:

    import csv

    txt_file = r"mytxt.txt"
    csv_file = r"mycsv.csv"

    # use 'with' if the program isn't going to immediately terminate
    # so you don't leave files open
    # the 'b' is necessary on Windows
    # it prevents \x1a, Ctrl-z, from ending the stream prematurely
    # and also stops Python converting to / from different line terminators
    # On other platforms, it has no effect
    in_txt = csv.reader(open(txt_file, "rb"), delimiter='\t')
    out_csv = csv.writer(open(csv_file, 'wb'))

    out_csv.writerows(in_txt)
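The snippet above is written for Python 2, where the csv module expects files opened in binary mode. A minimal sketch of the same conversion for Python 3 follows, assuming text-mode files opened with newline='' as the csv documentation recommends. The function name convert_tsv_to_csv is hypothetical and not part of the original answer.

```python
import csv


def convert_tsv_to_csv(txt_path, csv_path):
    """Copy every row of a tab-delimited file into a comma-separated file."""
    # newline='' hands line endings to the csv module untranslated, so
    # newlines embedded in quoted fields survive the round trip.
    with open(txt_path, "r", newline="") as in_file, \
            open(csv_path, "w", newline="") as out_file:
        reader = csv.reader(in_file, delimiter="\t")
        writer = csv.writer(out_file)  # the default delimiter is a comma
        writer.writerows(reader)
```

With the file names from the question, the call would be convert_tsv_to_csv("mytxt.txt", "mycsv.csv"). The with block also closes both files, which the original comments suggest for longer-running programs.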
agf answered Sep 20 '22 20:09



Why you should always use 'rb' mode when reading files with the csv module:

    Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.

What's in the sample file: any old rubbish, including control characters obtained by extracting blobs or whatever from a database, or injudicious use of the CHAR function in Excel formulas, or ...

    >>> open('demo.txt', 'rb').read()
    'h1\t"h2a\nh2b"\th3\r\nx1\t"x2a\r\nx2b"\tx3\r\ny1\ty2a\x1ay2b\ty3\r\n'

Python follows CP/M, MS-DOS, and Windows when it reads files in text mode: \r\n is recognised as the line separator and is served up as \n, and \x1a aka Ctrl-Z is recognised as an END-OF-FILE marker.

    >>> open('demo.txt', 'r').read()
    'h1\t"h2a\nh2b"\th3\nx1\t"x2a\nx2b"\tx3\ny1\ty2a' # WHOOPS

csv with a file opened with 'rb' works as expected:

    >>> import csv
    >>> list(csv.reader(open('demo.txt', 'rb'), delimiter='\t'))
    [['h1', 'h2a\nh2b', 'h3'], ['x1', 'x2a\r\nx2b', 'x3'], ['y1', 'y2a\x1ay2b', 'y3']]

but text mode doesn't:

    >>> list(csv.reader(open('demo.txt', 'r'), delimiter='\t'))
    [['h1', 'h2a\nh2b', 'h3'], ['x1', 'x2a\nx2b', 'x3'], ['y1', 'y2a']]
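The sessions above are from Python 2 on Windows. The original answer does not cover Python 3, so the following is only a rough sketch of how the same sample bytes behave there. It assumes that newline='' takes over the role 'rb' played, passing line endings to csv untranslated. The temporary file path and the variable names are illustrative only.

```python
import csv
import os
import tempfile

# Same bytes as demo.txt in the sessions above.
SAMPLE = b'h1\t"h2a\nh2b"\th3\r\nx1\t"x2a\r\nx2b"\tx3\r\ny1\ty2a\x1ay2b\ty3\r\n'

# Write the sample to a throwaway file (the path is illustrative only).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as f:
    f.write(SAMPLE)

# newline='' passes line endings through untouched, so csv sees the raw data.
with open(demo_path, "r", newline="") as f:
    rows_untranslated = list(csv.reader(f, delimiter="\t"))

# Default text mode translates \r\n to \n before csv sees it.
with open(demo_path, "r") as f:
    rows_translated = list(csv.reader(f, delimiter="\t"))

print(rows_untranslated[1])  # the \r\n inside the quoted field is preserved
print(rows_translated[1])    # the \r\n inside the quoted field has become \n
```

The second read shows the same silent alteration of embedded line endings that the text-mode session above demonstrates.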
John Machin answered Sep 20 '22 20:09
