I have a .txt file whose rows have different lengths. Each row is a series of points representing one trajectory. Since every trajectory has its own length, the rows differ in length; that is, the number of columns varies from one row to another.
AFAIK, NumPy's genfromtxt() function requires every row to have the same number of columns:
>>> import numpy as np
>>>
>>> data=np.genfromtxt('deer_1995.txt', skip_header=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 1638, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #4 (got 2352 columns instead of 1824)
    Line #5 (got 2182 columns instead of 1824)
    Line #6 (got 1412 columns instead of 1824)
    Line #7 (got 1650 columns instead of 1824)
    Line #8 (got 1688 columns instead of 1824)
    Line #9 (got 1500 columns instead of 1824)
    Line #10 (got 1208 columns instead of 1824)
genfromtxt() can also fill in missing values through its filling_values argument. However, I think that adds unnecessary trouble, which I wish to avoid.
So what is the best (Pythonic) way of simply importing this data set without filling in the "missing values"?
numpy.genfromtxt does not handle variable-length rows, because it builds a regular NumPy array with a fixed number of rows and columns. You need to parse your data manually, for example:
The data (CSV-like, semicolon-separated):
0.613 ; 5.919
0.615 ; 5.349
0.615 ; 5.413
0.617 ; 6.674
0.617 ; 6.616
0.63 ; 7.418
0.642 ; 7.809 ; 5.919
0.648 ; 8.04
0.673 ; 8.789
0.695 ; 9.45
0.712 ; 9.825
0.734 ; 10.265
0.748 ; 10.516
0.764 ; 10.782
0.775 ; 10.979
0.783 ; 11.1
0.808 ; 11.479
0.849 ; 11.951
0.899 ; 12.295
0.951 ; 12.537
0.972 ; 12.675
1.038 ; 12.937
1.098 ; 13.173
1.162 ; 13.464
1.228 ; 13.789
1.294 ; 14.126
1.363 ; 14.518
1.441 ; 14.969
1.545 ; 15.538
1.64 ; 16.071
1.765 ; 16.7
1.904 ; 17.484
2.027 ; 18.36
2.123 ; 19.235
2.149 ; 19.655
2.172 ; 20.096
2.198 ; 20.528
2.221 ; 20.945
2.265 ; 21.352
2.312 ; 21.76
2.365 ; 22.228
2.401 ; 22.836
2.477 ; 23.804
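The parser:
import csv

data = []
# Open the file with a context manager so it is closed automatically
with open('i.csv', 'r') as datafile:
    # The values are separated by semicolons, so tell csv.reader to split on ';'
    datareader = csv.reader(datafile, delimiter=';')
    for row in datareader:
        if not row:  # skip blank lines
            continue
        # Cast every element to float; float() ignores the surrounding spaces
        data.append([float(elem) for elem in row])
print(data)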
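The question's own file (deer_1995.txt) looks whitespace-separated with two header lines (judging by skip_header=2 in the genfromtxt call), so the same idea can be adapted without the csv module. This is a minimal sketch, assuming the data lines contain only numbers separated by spaces or tabs; read_ragged is a hypothetical helper name. Each inner list is one trajectory, and if you need NumPy you can convert each one separately with np.asarray instead of forcing them into one rectangular array.

```python
import io
from typing import Iterable, List


def read_ragged(lines: Iterable[str], skip_header: int = 0) -> List[List[float]]:
    """Parse whitespace-separated rows that may have different lengths.

    Hypothetical helper for illustration: assumes every non-header,
    non-blank line holds only numbers separated by whitespace.
    """
    rows = []
    for lineno, line in enumerate(lines):
        if lineno < skip_header:
            continue  # skip the first skip_header lines (headers)
        fields = line.split()  # split on any run of spaces/tabs
        if not fields:
            continue  # ignore blank lines
        rows.append([float(x) for x in fields])
    return rows


# Usage with an in-memory stand-in for the real file
sample = io.StringIO("header 1\nheader 2\n1.0 2.0 3.0\n4.0 5.0\n\n6.0\n")
trajectories = read_ragged(sample, skip_header=2)
print([len(t) for t in trajectories])  # [3, 2, 1]
```

With the real file this would be used as `with open('deer_1995.txt') as f: trajectories = read_ragged(f, skip_header=2)`.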