I have a comma-separated file with 182 rows and 501 columns: the first 500 columns are numeric (features) and the last column contains strings (labels).
Example (182x501):
1,3,4,6,.........7, ABC
4,5,6,4,.........9, XYZ
3,4,5,3,.........2, ABC
How can I load this file into a data set with a matrix, B, containing the numbers as my features, and a vector, C, containing the strings as my labels?
d = dataset(B, C);
Build a format specifier for textscan based on the number and types of columns, and have it read the file for you.
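nNumberCols = 500;                              % number of numeric feature columns
fmt = [repmat('%f,', [1 nNumberCols]) '%s'];    % '%f,%f,...,%f,%s' (avoid naming it format, a built-in)
fid = fopen(file);                              % file: path to your CSV file
x = textscan(fid, fmt);                         % one cell per column
fclose(fid);
B = cat(2, x{1:nNumberCols});                   % 182x500 numeric matrix (features)
C = x{end};                                     % 182x1 cell array of strings (labels)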
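As a minimal sketch, assuming every line has the same layout (N numbers followed by one label), the number of numeric columns can be taken from the first line instead of being hard-coded: the N commas on a line separate the N numbers from the label. The small file written to `tempname` is made up purely so the snippet runs on its own (4 feature columns instead of 500); with real data, point `file` at your CSV instead.

```matlab
% Hypothetical sample file in the question's layout: 3 rows, 4 features + label.
file = [tempname '.csv'];
fid = fopen(file, 'w');
fprintf(fid, '1,3,4,6, ABC\n4,5,6,4, XYZ\n3,4,5,3, ABC\n');
fclose(fid);

fid = fopen(file, 'r');
firstLine = fgetl(fid);                          % read the first line as text
nNumberCols = numel(strfind(firstLine, ','));    % N commas -> N numeric columns
frewind(fid);                                    % back to the start of the file
fmt = [repmat('%f,', 1, nNumberCols) '%s'];      % same specifier as above
x = textscan(fid, fmt);
fclose(fid);
delete(file);                                    % remove the temporary sample file

B = cat(2, x{1:nNumberCols});   % numeric features, one row per line
C = x{end};                     % labels as a column cell array
```

The design choice here is simply to let the file describe itself, so the same code works if the number of feature columns changes.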
You could use the textscan function. For example:
fid = fopen('test.dat');
% Read each line as two whitespace-separated strings:
% the comma-separated numbers, then the label
data = textscan(fid, '%s %s');
fclose(fid);
% Then extract the numbers and strings into their own cell arrays
nums = data{1};
str = data{2};   % labels (C in the question)
% Convert each string of comma-separated numbers to a numeric row vector
for i = 1:numel(nums)
nums{i} = str2num(nums{i}); %#ok<ST2NM>
end
% Finally, stack the rows into a numeric matrix (B in the question)
nums = cell2mat(nums);
Note that I have made a number of assumptions here, based on the file format you have specified. For example, I assume that there are no spaces after the commas following a number, but that there is a space immediately preceding the string at the end of each line.
You can make the above code more flexible by using a more carefully chosen format specifier (the second argument to textscan). See the section Basic Conversion Specifiers in the textscan documentation.
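One possible sketch of such a specifier, assuming the labels contain no commas: drop the literal commas from the format, pass `','` as the `Delimiter` name-value argument so that spaces after a comma are skipped as leading whitespace, and set `CollectOutput` to `true` so the consecutive `%f` columns come back as a single matrix. The sample file is invented for illustration and uses 4 feature columns instead of 500.

```matlab
% Hypothetical sample file: spaces after some commas, label with or without a space.
file = [tempname '.csv'];
fid = fopen(file, 'w');
fprintf(fid, '1, 3,4,6, ABC\n4,5, 6,4,XYZ\n3,4,5,3, ABC\n');
fclose(fid);

nNumberCols = 4;                                % 500 for the real file
fmt = [repmat('%f', 1, nNumberCols) '%s'];      % no literal commas needed
fid = fopen(file, 'r');
x = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);
delete(file);                                   % remove the temporary sample file

B = x{1};   % all %f columns collected into one numeric matrix
C = x{2};   % labels as a column cell array
```

With `CollectOutput`, there is no need for the `cat` or `cell2mat` step, and the file no longer has to match the exact spacing assumed above.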