I have a data set that I would like to store and be able to load in Octave:
18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"
15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320"
18.0 8 318.0 150.0 3436. 11.0 70 1 "plymouth satellite"
16.0 8 304.0 150.0 3433. 12.0 70 1 "amc rebel sst"
17.0 8 302.0 140.0 3449. 10.5 70 1 "ford torino"
15.0 8 429.0 198.0 4341. 10.0 70 1 "ford galaxie 500"
14.0 8 454.0 220.0 4354. 9.0 70 1 "chevrolet impala"
14.0 8 440.0 215.0 4312. 8.5 70 1 "plymouth fury iii"
14.0 8 455.0 225.0 4425. 10.0 70 1 "pontiac catalina"
15.0 8 390.0 190.0 3850. 8.5 70 1 "amc ambassador dpl"
It does not work immediately when I try to use:
data = load('auto.txt')
Is there a way to load from a text file with the given format, or do I need to convert it to e.g.
18.0,8,307.0,130.0,3504.0,12.0,70,1
...
EDIT: After deleting the last row and fixing the 'half' numbers (e.g. 3504. -> 3504.0), I used:
data = load('-ascii','autocleaned.txt');
This loaded the data into a matrix in Octave as wanted.
load is usually meant for loading Octave and MATLAB binary files, but it can also be used for loading textual data like yours. You can load your data using the "-ascii" option, but you'd have to reformat your file slightly before passing it to load, even with the "-ascii" option enabled: use a consistent column separator (i.e. just a tab or a comma), use full numbers (3850.0, not 3850.), and don't use strings.
Then you can do something like this to get it to work:
DATA = load("-ascii", "auto.txt");
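The reformatting described above can be scripted rather than done by hand. The following is a minimal Python sketch, not part of the original answer; the helper names clean_line and clean_file are hypothetical, and it assumes every line has 8 numeric fields followed by one quoted name. It drops the name, rewrites numbers such as 3504. as 3504.0, and joins the fields with commas.

```python
import shlex


def clean_line(line, n_numeric=8):
    """Return the first n_numeric fields of one line as comma-separated text."""
    # shlex.split keeps a quoted name such as "ford torino" as a single token
    fields = shlex.split(line)
    # float() accepts trailing-dot numbers like "3504."; repr() writes "3504.0"
    numbers = [float(f) for f in fields[:n_numeric]]
    return ",".join(repr(x) for x in numbers)


def clean_file(src, dst, n_numeric=8):
    """Write a cleaned copy of src to dst, skipping blank lines."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(clean_line(line, n_numeric) + "\n")


sample = '18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"'
print(clean_line(sample))
```

Calling clean_file('auto.txt', 'autocleaned.txt') would then produce a file of the kind the question's EDIT loads. Integer columns come out as 8.0 rather than 8, which makes no difference once the values are in a numeric matrix.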
If the final string field is removed from each line, the file can be read with:
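filename = 'stack25148040_1.txt';
fid = fopen(filename, 'r');
# 8 numeric fields per line; fscanf fills the matrix column by column,
# so read into 8 rows and transpose to get one row per line of the file
[x, count] = fscanf(fid, '%f', [8, Inf]);
fclose(fid);
x = x';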
Alternatively, the whole file could be read in as one column and reshaped.
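The reshape route depends on element order: Octave fills matrices column by column, so a flat vector of values read line by line has to be reshaped to 8 rows and then transposed (in Octave, something like reshape(x, 8, [])'). The sketch below illustrates the same idea in NumPy instead of Octave, which is an assumption made for illustration; order="F" mimics Octave's column-major reshape, and the values are the first two rows of the data set.

```python
import numpy as np

# Values as they would arrive when the file is read as one long column
flat = np.array([18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 1,
                 15.0, 8, 350.0, 165.0, 3693.0, 11.5, 70, 1])
n_cols = 8

# Column-major reshape to (8, n_rows), then transpose to (n_rows, 8)
table = flat.reshape((n_cols, -1), order="F").T

print(table.shape)
print(table[1])
```

Reshaping straight to (n_rows, 8) in column-major order would instead interleave values from different lines, which is why the transpose step is needed.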
I haven't figured out how to read both the numeric fields and the string field. For that I've had to fall back on Python with more general purpose file reading tools.
Here is a Python script that reads the file, creates a numpy structured array, and writes that to a .mat file, which Octave can then read:
import csv
import numpy as np
from scipy.io import savemat

data = []
# newline='' is what the csv module expects for files opened in text mode
with open('stack25148040.txt', newline='') as f:
    # csv handles quoted strings with white space
    r = csv.reader(f, delimiter=' ')
    for l in r:
        # remove empty strings from the split on ' '
        fields = [x for x in l if x]
        if fields:  # skip empty lines, e.g. a trailing one
            data.append(fields)
print(data[0])
for dd in data:
    # convert 8 of the strings (per line) to float
    dd[:] = [float(d) for d in dd[:8]] + dd[-1:]
print(data[0])
print()
# make a structured array, with numbers and a string
dt = np.dtype("f8,i4,f8,f8,f8,f8,i4,i4,|S25")
A = np.array([tuple(d) for d in data], dtype=dt)
print(A)
savemat('stack25148040.mat', {'A': A})
In Octave, this can be read with:
load stack25148040.mat
A
# A = 1x10 struct array containing the fields:
# f0 f1 ... f8
A.f8 # string field
A(1) # 1st row
# scalar structure containing the fields:
# f0 = 18
# f1 = 8
...
# f8 = chevrolet chevelle malibu
Newer Octave (3.8) has an importdata function. It handles the original data file without any extra arguments and returns a structure with 2 fields. x.data is a (10,11) matrix: x.data(:,1:8) is the desired numerical data, while x.data(:,9:11) is a mix of NA and random numbers. The NA values stand in for the words at the end of the lines. x.textdata is a (24,1) cell with those words. The quoted strings could be reassembled from those words, using the NA values and the quotes to determine how many words belong to which line.
To read the numeric data, it uses dlmread. Since the rest of importdata is written in Octave, it could be used as the starting point for a custom function that handles the string data properly. Any of the following reads just the numeric columns:
dlmread ('stack25148040.txt')(:,1:8)
importdata ('stack25148040.txt').data(:,1:8)
textread ('stack25148040.txt','')(:,1:8)
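The idea of reassembling the quoted strings from the separate words in x.textdata can be sketched as follows. This is a Python illustration, not Octave code, and the function name regroup_quoted is hypothetical. It assumes the words still carry their quote characters and that each name begins with a word starting with a double quote. Words that look numeric (such as 320) presumably land in x.data instead of x.textdata, and this sketch does not recover them.

```python
def regroup_quoted(words):
    """Group a flat list of words into names, starting a new name at each
    word that begins with a double quote."""
    groups = []
    for word in words:
        if word.startswith('"') or not groups:
            groups.append([])
        # Drop the quote characters from the stored word
        groups[-1].append(word.strip('"'))
    return [" ".join(g) for g in groups]


words = ['"chevrolet', 'chevelle', 'malibu"',
         '"buick', 'skylark',
         '"plymouth', 'satellite"']
print(regroup_quoted(words))
```

Keying on the opening quote rather than the closing one means a name whose last word went missing (as with a numeric last word) still ends where the next name begins.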