Python: How to read a data file with uneven number of columns

Tags: python, file, numpy

A friend of mine needs to read a lot of data (about 18000 data sets), all of it formatted annoyingly. Specifically, the data is supposed to be 8 columns and ~8000 rows, but instead it is delivered as rows of 7 values, with the last entry spilling into the first column of the next row.

In addition, every ~30 rows there is a row with only 4 columns. This is because some upstream program is reshaping a 200 x 280 array into the 7x8120 array.
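The key observation is that the line breaks carry no information: the values appear in their flat order, only wrapped unevenly. The sketch below illustrates this with a hypothetical writer, `format_wrapped`, that mimics the described layout (7 values per line, with a 4-value line every 30th line). The exact wrapping rule of the real upstream program is unknown; the defaults are assumptions.

```python
import numpy as np

def format_wrapped(values, width=7, short_every=30, short_width=4):
    """Hypothetical writer that mimics the described layout (an assumption).

    Values are emitted in flat (row-major) order, `width` per line, except
    that every `short_every`-th line holds only `short_width` values.
    """
    flat = np.asarray(values, dtype=float).ravel()
    lines = []
    i = 0
    line_no = 0
    while i < flat.size:
        line_no += 1
        n = short_width if line_no % short_every == 0 else width
        # repr(float(...)) round-trips each value exactly through text
        lines.append(" ".join(repr(float(v)) for v in flat[i:i + n]))
        i += n
    return "\n".join(lines) + "\n"

# Reading every token in order, ignoring line structure, recovers the values.
original = np.arange(200 * 280, dtype=float).reshape(200, 280)
text = format_wrapped(original)
recovered = np.array(text.split(), dtype=float).reshape(7000, 8)
```

Since 200 x 280 = 7000 x 8 = 56000 values, only the total count matters, not how the lines are broken.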

My question is this: how can we read the data into an 8x7000 array? My usual arsenal of np.loadtxt and np.genfromtxt fails when there is an uneven number of columns.
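A minimal sketch reproducing the failure on a tiny in-memory sample: both `np.loadtxt` and `np.genfromtxt` (with default settings) expect every line to have the same number of columns and raise `ValueError` when that changes. The sample text and the helper name `rejects_ragged` are made up for illustration.

```python
import io
import numpy as np

# Tiny ragged sample: rows of 7 values with one short row of 4,
# 32 values in total (= 4 rows of 8).
ragged = (
    "1 2 3 4 5 6 7\n"
    "8 9 10 11 12 13 14\n"
    "15 16 17 18 19 20 21\n"
    "22 23 24 25\n"
    "26 27 28 29 30 31 32\n"
)

def rejects_ragged(reader, text):
    """Return True if `reader` (np.loadtxt or np.genfromtxt) raises
    ValueError on `text`, e.g. because the column count changes."""
    try:
        reader(io.StringIO(text))
    except ValueError:
        return True
    return False

for reader in (np.loadtxt, np.genfromtxt):
    print(reader.__name__, "rejects ragged input:", rejects_ragged(reader, ragged))
```

The exact error message differs between NumPy versions, but the exception type is `ValueError` in both cases.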

Keep in mind that performance is a factor, since this has to be done for ~18000 data files.
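For ~18000 files, one reasonable structure is a generator that parses one file at a time, so only one array is held in memory at once (18000 arrays of 56000 float64 values would be roughly 8 GB). This is a sketch: the directory layout and the `*.ref` pattern are assumptions based on the example file name, and `read_flat` / `iter_arrays` are hypothetical helpers.

```python
import glob
import os
import numpy as np

def read_flat(path, ncols=8):
    """Read every whitespace-separated number in `path`, ignoring line
    breaks, and reshape to (rows, ncols)."""
    with open(path) as f:
        values = np.array(f.read().split(), dtype=float)
    if values.size % ncols:
        raise ValueError(f"{path}: {values.size} values is not a multiple of {ncols}")
    return values.reshape(-1, ncols)

def iter_arrays(directory, pattern="*.ref", ncols=8):
    """Yield (path, array) for each matching file, one file at a time."""
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        yield path, read_flat(path, ncols)
```

A caller would loop with `for path, arr in iter_arrays("some_dir"): ...` and process or save each array before moving on. If parsing rather than disk access turns out to dominate, the same `read_flat` function could be mapped over the file list with `multiprocessing.Pool`.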

Here is a link to a typical data file: http://users-phys.au.dk/hha07/hk_L1.ref


asked Mar 22 '12 13:03

HansHarhoff

1 Answer

An even easier approach I just thought of:

import numpy

with open("hk_L1.ref") as f:
    # split() breaks on any whitespace, so the ragged line structure is irrelevant
    data = numpy.array(f.read().split(), dtype=float).reshape(7000, 8)

This reads the data as a one-dimensional array first, completely ignoring all new-line characters, and then we reshape it to the desired shape.
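A small in-memory illustration of the same idea, using `reshape(-1, 8)` as a variant so the row count does not have to be hard-coded (NumPy infers it, and raises `ValueError` if the number of values is not a multiple of 8). The sample string is made up.

```python
import numpy as np

# Irregular whitespace: doubled spaces, a tab, an empty line, uneven rows.
text = " 1  2\t3\n4\n\n5 6 7 8\n9 10 11 12 13 14 15 16\n"

tokens = text.split()                # splits on any run of whitespace, newlines included
arr = np.array(tokens, dtype=float)  # the numeric strings are converted to float64
table = arr.reshape(-1, 8)           # -1 lets NumPy infer the number of rows
```

Here `table` has shape `(2, 8)`, with the first row holding 1 to 8 and the second 9 to 16.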

While I think the task will be I/O-bound anyway, this approach should also use little processor time, if that matters.
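One way to check the I/O-bound guess on a single file is to time the read and the parse separately. This is a rough sketch; `time_read_vs_parse` is a hypothetical helper, and operating-system file caching means repeated runs will understate cold-disk read time.

```python
import time
import numpy as np

def time_read_vs_parse(path, ncols=8):
    """Return (read_seconds, parse_seconds, shape) for one file."""
    t0 = time.perf_counter()
    with open(path) as f:
        text = f.read()
    t1 = time.perf_counter()
    arr = np.array(text.split(), dtype=float).reshape(-1, ncols)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, arr.shape
```

For example, `time_read_vs_parse("hk_L1.ref")` would show whether reading or parsing dominates for a typical file.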

answered Sep 27 '22 16:09

Sven Marnach
