Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loading text data in Octave with specific format

I have a data set that I would like to store and be able to load in Octave

18.0   8   307.0      130.0      3504.      12.0   70  1    "chevrolet chevelle malibu"
15.0   8   350.0      165.0      3693.      11.5   70  1    "buick skylark 320"
18.0   8   318.0      150.0      3436.      11.0   70  1    "plymouth satellite"
16.0   8   304.0      150.0      3433.      12.0   70  1    "amc rebel sst"
17.0   8   302.0      140.0      3449.      10.5   70  1    "ford torino"
15.0   8   429.0      198.0      4341.      10.0   70  1    "ford galaxie 500"
14.0   8   454.0      220.0      4354.       9.0   70  1    "chevrolet impala"
14.0   8   440.0      215.0      4312.       8.5   70  1    "plymouth fury iii"
14.0   8   455.0      225.0      4425.      10.0   70  1    "pontiac catalina"
15.0   8   390.0      190.0      3850.       8.5   70  1    "amc ambassador dpl"

It does not work immediately when I try to use:

data = load('auto.txt')

Is there a way to load from a text files with the given format or do I need to convert it to e.g

18.0,8,307.0,130.0,3504.0,12.0,70,1
...

EDIT: Deleting the last row and fixing the 'half' number e.g. 3504. -> 3504.0 and then used:

data = load('-ascii','autocleaned.txt');

Loaded the data as wanted in to a matrix in Octave.

like image 325
user317706 Avatar asked Aug 05 '14 20:08

user317706


2 Answers

load is usually meant for loading octave and matlab binary files but can be used for loading textual data like yours. You can load your data using the "-ascii" option but you'd have to reformat your file slightly before putting it into load even with the "-ascii" option enabled. Use a consistent column separator ie. just a tab or a comma, use full numbers not 3850. and don't use strings.

Then you can do something like this to get it to work

DATA = load("-ascii", "auto.txt");
like image 145
ShaneQful Avatar answered Oct 10 '22 17:10

ShaneQful


If the final string field is removed from each line, the file can be read with:

filename='stack25148040_1.txt'
fid = fopen(filename, 'r');
[x, count] = fscanf(fid, '%f', [10, Inf])
endif
fclose(fid);

Alternatively the whole file could read in as one column and reshaped.

I haven't figured out how to read both the numeric fields and the string field. For that I've had to fall back on Python with more general purpose file reading tools.

Here is a Python script that reads the file, creates a numpy structured array, writes that to a .mat file, which Octave can then read:

import csv
import numpy as np

data=[]
with open('stack25148040.txt','rb') as f:
    r = csv.reader(f, delimiter=' ')
    # csv handles quoted strings with white space
    for l in r:
        # remove empty strings from the split on ' '
        data.append([x for x in l if x])
print data[0]
for dd in data:
    # convert 8 of the strings (per line) to float
    dd[:]=[float(d) for d in dd[:8]]+dd[-1:]

data=data[:-1]  # remove empty last line
print data[0]
print
# make a structured array, with numbers and a string
dt=np.dtype("f8,i4,f8,f8,f8,f8,i4,i4,|S25")
A=np.array([tuple(d) for d in data],dtype=dt)
print A
from scipy.io import savemat
savemat('stack25148040.mat',{'A':A})

In Octave this could read with

load stack25148040.mat
A
# A = 1x10 struct array containing the fields:
#    f0 f1 ... f8

A.f8  # string field
A(1)  # 1st row
#  scalar structure containing the fields:
#   f0 =  18
#   f1 = 8
...
#   f8 = chevrolet chevelle malibu

Newer Octave (3.8) has an importdata function. It handles the original data file without any extra arguments. It returns a structure with 2 fields

x.data is a (10,11) matrix. x.data(:,1:8) is the desire numerical data. x.data(:,9:11) is a mix of NA and random numbers. The NA stand in for the words at the end of the lines. x.textdata is a (24,1) cell with those words. The quoted string s could be reassembled from those words, using the NA and quotes to determine how many words belong to which line.

To read the numeric data it uses dlmread. Since the rest of importdata is written in Octave, it could be used as the starting point for a custom function that handles the string data properly.

dlmread ('stack25148040.txt')(:,1:8)
importread ('stack25148040.txt').data(:,1:8)
textread ('stack25148040.txt','')(:,1:8)
like image 41
hpaulj Avatar answered Oct 10 '22 16:10

hpaulj