Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python- Read from Multiple Files

Tags:

python

file

I have 125 data files containing two columns and 21 rows of data. Please see the image below:

enter image description here

and I'd like to import them into a single .csv file (as 250 columns and 21 rows).

I am fairly new to python but this what I have been advised, code wise:

import glob
Results = [open(f) for f in glob.glob("*.data")] 
fout = open("res.csv", 'w')

for row in range(21):
 for f in Results:
  fout.write( f.readline().strip() ) 
  fout.write(',')
 fout.write('\n')
fout.close()

However, there is slight problem with the code as I only get 125 columns, (i.e. the force and displacement columns are written in one column) Please refer to the image below: enter image description here

enter image description here

I'd very much appreciate it if anyone could help me with this !

like image 997
Esan Avatar asked Apr 23 '12 12:04

Esan


4 Answers

import glob
results = [open(f) for f in glob.glob("*.data")]
sep = ","
# Uncomment if your Excel formats decimal numbers like 3,14 instead of 3.14
# sep = ";"

with open("res.csv", 'w') as fout:
    for row in range(21):
        iterator = (f.readline().strip().replace("\t", sep) for f in results)
        line = sep.join(iterator)
        fout.write("{0}\n".format(line))

So to explain what went wrong with your code, your source files use tab as a field separator, but your code uses comma to separate the lines it reads from those files. If your excel uses period as a decimal separator, it uses comma as a default field separator. The whitespace is ignored unless enclosed in quotes, and you see the result.

If you use the text import feature of Excel (Data ribbon => From Text) you can ask it to consider both comma and tab as valid field separators, and then I'm pretty sure your original output would work too.

In contrast, the above code should produce a file that will open correctly when double clicked.

like image 184
Lauritz V. Thaulow Avatar answered Sep 27 '22 21:09

Lauritz V. Thaulow


You don't need to write your own program to do this, in python or otherwise. You can use an existing unix command (if you are in that environment):

paste *.data > res.csv
like image 31
Benedict Avatar answered Sep 27 '22 21:09

Benedict


Try this:

import glob, csv
from itertools import cycle, islice, count

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).next for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

Results = [open(f).readlines() for f in glob.glob("*.data")] 
fout = csv.writer(open("res.csv", 'wb'), dialect="excel")

row = []
for line, c in zip(roundrobin(Results), cycle(range(len(Results)))):
    splitline = line.split()
    for item,currItem in zip(splitline, count(1)):
        row[c+currItem] = item
    if count == len(Results):
        fout.writerow(row)
        row = []
del fout

It should loop over each line of your input file and stitch them together as one row, which the csv library will write in the listed dialect.

like image 29
Spencer Rathbun Avatar answered Sep 27 '22 20:09

Spencer Rathbun


I suggest to get used to csv module. The reason is that if the data is not that simple (simple strings in headings, and then numbers only) it is difficult to implement everything again. Try the following:

import csv
import glob
import os

datapath = './data'
resultpath = './result'
if not os.path.isdir(resultpath):
   os.makedirs(resultpath)

# Initialize the empty rows. It does not check how many rows are
# in the file.
rows = []

# Read data from the files to the above matrix.
for fname in glob.glob(os.path.join(datapath, '*.data')):
    with open(fname, 'rb') as f:
        reader = csv.reader(f)
        for n, row in enumerate(reader):
            if len(rows) < n+1:
                rows.append([])  # add another row
            rows[n].extend(row)  # append the elements from the file

# Write the data from memory to the result file.
fname = os.path.join(resultpath, 'result.csv')
with open(fname, 'wb') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

The with construct for opening a file can be replaced by the couple:

f = open(fname, 'wb')
...
f.close()

The csv.reader and csv.writer are simply wrappers that parse or compose the line of the file. The doc says that they require to open the file in the binary mode.

like image 43
pepr Avatar answered Sep 27 '22 20:09

pepr