
Python: How to read a data file with uneven number of columns

Tags: python, file, numpy

A friend of mine needs to read a lot of data (about 18000 data sets) that is all formatted annoyingly. Specifically, the data is supposed to be 8 columns and ~7000 rows, but instead it is delivered in rows of 7 columns, with the last entry spilling into the first column of the next row.

In addition, every ~30 rows there are only 4 columns. This is because some upstream program is reshaping a 200 x 280 array into the 7x8120 array.
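The numbers are consistent with that explanation. The following is only a sketch of the arithmetic, assuming (an inference from the figures above, not something the upstream program documents) that each of the 280 records of 200 values is written 7 values per line, with the leftover values on a short line:

```python
# Assumption: the 200 x 280 array is written as 280 records of 200 values each,
# 7 values per line, with any leftover values on a final short line.
values_per_record = 200
n_records = 280
per_line = 7

full_lines, leftover = divmod(values_per_record, per_line)  # full lines and short-line width
lines_per_record = full_lines + (1 if leftover else 0)      # lines used by one record
total_lines = n_records * lines_per_record                  # lines in the whole file
total_values = n_records * values_per_record                # numbers in the whole file
rows_of_8, remainder = divmod(total_values, 8)              # rows if laid out 8 per row

print(full_lines, leftover, lines_per_record, total_lines, rows_of_8, remainder)
# prints: 28 4 29 8120 7000 0
```

So a 4-column line shows up every 29 lines, the file has 8120 lines, and the 56000 values fit exactly into 7000 rows of 8.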

My question is this: how can we read the data into an 8x7000 array? My usual arsenal of np.loadtxt and np.genfromtxt fails when there is an uneven number of columns.

Keep in mind that performance is a factor, since this has to be done for ~18000 data files.

Here is a link to a typical data file: http://users-phys.au.dk/hha07/hk_L1.ref

Asked by HansHarhoff, Mar 22 '12 13:03



1 Answer

An even easier approach I just thought of:

import numpy
# Split on any whitespace (newlines included), then reshape to 7000 rows x 8 columns.
with open("hk_L1.ref") as f:
    data = numpy.array(f.read().split(), dtype=float).reshape(7000, 8)

This first reads the data as a one-dimensional array, completely ignoring all newline characters (split() with no arguments splits on any run of whitespace), and then reshapes it to the desired shape.
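To see the idea on something small, here is a minimal sketch using made-up data with ragged lines (7, 7 and 2 values). The helper name read_ragged and the sample text are hypothetical and exist only for illustration:

```python
import io

import numpy as np


def read_ragged(fileobj, ncols=8):
    """Read whitespace-separated numbers, ignoring line breaks, into an (n, ncols) array."""
    values = np.array(fileobj.read().split(), dtype=float)
    # reshape(-1, ncols) lets NumPy infer the row count; it raises ValueError
    # if the number of values is not a multiple of ncols.
    return values.reshape(-1, ncols)


# Synthetic sample: 16 numbers spread unevenly over three lines.
text = "0 1 2 3 4 5 6\n7 8 9 10 11 12 13\n14 15\n"
data = read_ragged(io.StringIO(text))
print(data.shape)  # (2, 8)
print(data[1, 0])  # 8.0
```

Using -1 for the row count avoids hard-coding 7000, and the ValueError acts as a cheap sanity check on files with missing or extra values.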

While I think that the task will be I/O-bound anyway, this approach should use little processor time if it matters.
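For the ~18000 files mentioned in the question, one possible way to apply this is a generator that yields one array at a time, so that only a single file's data needs to be held in memory. This is a sketch under assumptions: the function name iter_ref_arrays, the "*.ref" pattern and the directory layout are hypothetical, not taken from the question.

```python
from pathlib import Path

import numpy as np


def iter_ref_arrays(directory, pattern="*.ref", shape=(7000, 8)):
    """Yield (path, array) pairs for every file in `directory` matching `pattern`."""
    for path in sorted(Path(directory).glob(pattern)):
        with open(path) as f:
            # Same approach as above: split on all whitespace, then reshape.
            yield path, np.array(f.read().split(), dtype=float).reshape(shape)
```

A caller would then loop with `for path, data in iter_ref_arrays("some_directory"):` and process or save each array before moving on to the next file.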

Answered by Sven Marnach, Sep 27 '22 16:09