I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All text files have a 4 line long header which needs to be stripped out. The data lines have various delimiters including " (quote), - (dash), : column, and blank space. I found it a pain to code it in C++ with all these different delimiters, so I decided to try it in Python hearing it is relatively easier to do compared to C/C++. I wrote a piece of code to test it for a single line of data and it works, however, I could not manage to make it work for the actual file. For parsing a single line I was using the text object and "replace" method. It looks like my current implementation reads the text file as a list, and there is no replace method for the list object. Being a novice in Python, I got stuck at this point. Any input would be appreciated! Thanks! <pre class="prettyprint"><code># function for parsing the data def data_parser(text, dic): for i, j in dic.iteritems(): text = text.replace(i,j) return text # open input/output files inputfile = open('test.dat') outputfile = open('test.csv', 'w') my_text = inputfile.readlines()[4:] #reads to whole text file, skipping first 4 lines # sample text string, just for demonstration to let you know how the data looks like # my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636' # dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' } txt = data_parser(my_text, reps) outputfile.writelines(txt) inputfile.close() outputfile.close() </code></pre>

There are a few ways to go about this. One option would be to use <code>inputfile.read()</code> instead of <code>inputfile.readlines()</code> - you'd need to write separate code to strip the first four lines, but if you want the final output as a single string anyway, this might make the most sense. A second, simpler option would be to rejoin the strings after striping the first four lines with <code>my_text = ''.join(my_text)</code>. This is a little inefficient, but if speed isn't a major concern, the code will be simplest. Finally, if you actually want the output as a list of strings instead of a single string, you can just modify your data parser to iterate over the list. That might looks something like this: <pre class="prettyprint"><code>def data_parser(lines, dic): for i, j in dic.iteritems(): for (k, line) in enumerate(lines): lines[k] = line.replace(i, j) return lines </code></pre>

Text File Parsing with Python

Tags:

python

text

file-io

parsing

python-2.7

I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All text files have a 4 line long header which needs to be stripped out. The data lines have various delimiters including " (quote), - (dash), : column, and blank space. I found it a pain to code it in C++ with all these different delimiters, so I decided to try it in Python hearing it is relatively easier to do compared to C/C++.

I wrote a piece of code to test it for a single line of data and it works, however, I could not manage to make it work for the actual file. For parsing a single line I was using the text object and "replace" method. It looks like my current implementation reads the text file as a list, and there is no replace method for the list object.

Being a novice in Python, I got stuck at this point. Any input would be appreciated!

Thanks!

# function for parsing the data
def data_parser(text, dic):
for i, j in dic.iteritems():
    text = text.replace(i,j)
return text

# open input/output files

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

my_text = inputfile.readlines()[4:] #reads to whole text file, skipping first 4 lines


# sample text string, just for demonstration to let you know how the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

txt = data_parser(my_text, reps)
outputfile.writelines(txt)

inputfile.close()
outputfile.close()

375

asked Aug 13 '12 15:08

marillion

3 Answers

I would use a for loop to iterate over the lines in the text file:

for line in my_text:
    outputfile.writelines(data_parser(line, reps))

If you want to read the file line-by-line instead of loading the whole thing at the start of the script you could do something like this:

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

# sample text string, just for demonstration to let you know how the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

for i in range(4): inputfile.next() # skip first four lines
for line in inputfile:
    outputfile.writelines(data_parser(line, reps))

inputfile.close()
outputfile.close()

answered Oct 07 '22 05:10

Joe Day

From the accepted answer, it looks like your desired behaviour is to turn

skip 0
skip 1
skip 2
skip 3
"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

into

2012,06,23,03,09,13.23,4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,NAN,-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

If that's right, then I think something like

import csv

with open("test.dat", "rb") as infile, open("test.csv", "wb") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile, quoting=False)
    for i, line in enumerate(reader):
        if i < 4: continue
        date = line[0].split()
        day = date[0].split('-')
        time = date[1].split(':')
        newline = day + time + line[1:]
        writer.writerow(newline)

would be a little simpler than the reps stuff.

answered Oct 07 '22 06:10

DSM

There are a few ways to go about this. One option would be to use inputfile.read() instead of inputfile.readlines() - you'd need to write separate code to strip the first four lines, but if you want the final output as a single string anyway, this might make the most sense.

A second, simpler option would be to rejoin the strings after striping the first four lines with my_text = ''.join(my_text). This is a little inefficient, but if speed isn't a major concern, the code will be simplest.

Finally, if you actually want the output as a list of strings instead of a single string, you can just modify your data parser to iterate over the list. That might looks something like this:

def data_parser(lines, dic):
    for i, j in dic.iteritems():
        for (k, line) in enumerate(lines):
            lines[k] = line.replace(i, j)
    return lines

answered Oct 07 '22 04:10

Julian

Related questions
                            
                                Django Pandas to http response (download file)
                            
                                How to download google image search results in Python
                            
                                Python: how to specify output folders in Pyinstaller .spec file
                            
                                Is it possible to stream video from https:// (e.g. YouTube) into python with OpenCV?
                            
                                Pandas Melt several groups of columns into multiple target columns by name
                            
                                Exception: Cannot find PyQt5 plugin directories when using Pyinstaller despite PyQt5 not even being used
                            
                                Second Derivative in Python - scipy/numpy/pandas
                            
                                Get selected feature names TFIDF Vectorizer
                            
                                Create multiindex from existing dataframe
                            
                                Python - How to run multiple flask apps from same client machine
                            
                                Convert pandas.groupby to dict
                            
                                How to select all non-black pixels in a NumPy array?
                            
                                How to ignore a logger in the Sentry Python SDK
                            
                                Deploying application with Python or another embedded scripting language
                            
                                What are the viable database abstraction layers for Python
                            
                                In what order does python display dictionary keys? [duplicate]
                            
                                Can I alias expressions inside Python list comprehensions to prevent them being evaluated multiple times?
                            
                                How to convert longitude and latitude to street address
                            
                                how do I get django runserver to show me DeprecationWarnings and other useful messages?
                            
                                parsing C code using python [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With