A friend of mine needs to to read a lot of data (about 18000 data sets) that is all formatted annoyingly. Specifically the data is supposed to be 8 columns and ~ 8000 rows of data, but instead the data is delivered as columns of 7 with the last entry spilling into the first column of the next row.
In addition every ~30 rows there is only 4 columns. This is because some upstream program is reshaping a 200 x 280 array into the 7x8120 array.
My question is this: How can we read the data into a 8x7000 array. My usual arsenal of np.loadtxt and np.genfromtxt fail when there is an uneven number of columns.
Keep in mind that performance is a factor since this has to be done for ~18000 datafiles.
Here is a link to a typical data file: http://users-phys.au.dk/hha07/hk_L1.ref
data files 1 Testing: Text file .data files may mostly exist as text files, and accessing files in Python is pretty simple. ... 2 Testing: Binary File The .data files could also be in the form of binary files. This means that the way we must access the file also needs to change. ... 3 Using Pandas to read . data files
#!/usr/bin/env python. # Define a filename. filename = "bestand.py". # Open the file as f. # The function readlines() reads the file. with open(filename) as f: content = f.read().splitlines() # Show the file contents line by line.
There are multiple ways of storing data in files and the above ones are some of the most used formats for storing numerical data. To achieve our required functionality numpy’s loadtxt () function will be used. Syntax: numpy.loadtxt (fname, dtype=’float’, comments=’#’, delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
Working with the.data file extension is pretty simple and is more or less identifying the way the data is sorted, and then using Python commands to access the file accordingly. What is a.data file?.data files were developed as a means to store data.
An even easier approach I just thought of:
with open("hk_L1.ref") as f:
data = numpy.array(f.read().split(), dtype=float).reshape(7000, 8)
This reads the data as a one-dimensional array first, completely ignoring all new-line characters, and then we reshape it to the desired shape.
While I think that the task will be I/O-bound anyway, this approach should use little processor time if it matters.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With