
Import CSV file with mixed data types

I've been working with MATLAB for a few days and I'm having difficulty importing a CSV file into a matrix.

My problem is that my CSV file contains mostly strings and only a few integer values, so csvread() doesn't work. csvread() only handles numeric values.

How can I store my strings in some kind of a 2-dimensional array to have free access to each element?

Here's a sample CSV for my needs:

04;abc;def;ghj;klm;;;;;
;;;;;Test;text;0xFF;;
;;;;;asdfhsdf;dsafdsag;0x0F0F;;

The main issues are the empty cells and the text within the cells. As you can see, the structure may vary.

poeschlorn asked Jan 20 '11 13:01



2 Answers

For the case when you know how many columns of data there will be in your CSV file, one simple call to textscan like Amro suggests will be your best solution.

However, if you don't know a priori how many columns are in your file, you can use a more general approach like I did in the following function. I first used the function fgetl to read each line of the file into a cell array. Then I used the function textscan to parse each line into separate strings using a predefined field delimiter and treating the integer fields as strings for now (they can be converted to numeric values later). Here is the resulting code, placed in a function read_mixed_csv:

function lineArray = read_mixed_csv(fileName, delimiter)

  fid = fopen(fileName, 'r');         % Open the file
  lineArray = cell(100, 1);           % Preallocate a cell array (ideally slightly
                                      %   larger than is needed)
  lineIndex = 1;                      % Index of cell to place the next line in
  nextLine = fgetl(fid);              % Read the first line from the file
  while ~isequal(nextLine, -1)        % Loop while not at the end of the file
    lineArray{lineIndex} = nextLine;  % Add the line to the cell array
    lineIndex = lineIndex+1;          % Increment the line index
    nextLine = fgetl(fid);            % Read the next line from the file
  end
  fclose(fid);                        % Close the file

  lineArray = lineArray(1:lineIndex-1);              % Remove empty cells, if needed
  for iLine = 1:lineIndex-1                          % Loop over lines
    lineData = textscan(lineArray{iLine}, '%s', ...  % Read strings
                        'Delimiter', delimiter);
    lineData = lineData{1};                          % Remove cell encapsulation
    if strcmp(lineArray{iLine}(end), delimiter)      % Account for when the line
      lineData{end+1} = '';                          %   ends with a delimiter
    end
    lineArray(iLine, 1:numel(lineData)) = lineData;  % Overwrite line data
  end

end

Running this function on the sample file content from the question gives this result:

>> data = read_mixed_csv('myfile.csv', ';')

data = 

  Columns 1 through 7

    '04'    'abc'    'def'    'ghj'    'klm'    ''            ''
    ''      ''       ''       ''       ''       'Test'        'text'
    ''      ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

  Columns 8 through 10

    ''          ''    ''
    '0xFF'      ''    ''
    '0x0F0F'    ''    ''

The result is a 3-by-10 cell array with one field per cell, where missing fields are represented by the empty string ''. Now you can access each cell or a combination of cells to format them as you like. For example, if you wanted to change the fields in the first column from strings to numeric values, you could use the function str2double as follows:

>> data(:, 1) = cellfun(@(s) {str2double(s)}, data(:, 1))

data = 

  Columns 1 through 7

    [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''
    [NaN]    ''       ''       ''       ''       'Test'        'text'
    [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

  Columns 8 through 10

    ''          ''    ''
    '0xFF'      ''    ''
    '0x0F0F'    ''    ''

Note that the empty fields result in NaN values.
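If the NaN placeholders are unwanted, one option is to convert only the fields that actually parse as numbers and leave everything else as text. The following is a minimal sketch of that idea, not part of the function above; the variable names `col`, `vals`, and `isNum` and the sample values are made up for illustration.

```matlab
% Hypothetical column of raw fields, like one column of the cell array above
col = {'04'; ''; 'abc'; '12.5'};

% str2double accepts a cell array of strings and returns a numeric array;
% anything it cannot parse (including the empty string) becomes NaN
vals = str2double(col);

% Overwrite only the cells that parsed successfully, keeping the rest as text
isNum = ~isnan(vals);
col(isNum) = num2cell(vals(isNum));
```

With this approach the empty and non-numeric fields stay as strings, so a later `ischar` or `isnumeric` check on each cell tells you which kind of value it holds. One caveat: a field containing the literal text NaN would also be left as a string.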

gnovice answered Sep 19 '22 15:09


Given the sample you posted, this simple code should do the job:

fid = fopen('file.csv','r');
C = textscan(fid, repmat('%s',1,10), 'delimiter',';', 'CollectOutput',true);
C = C{1};
fclose(fid);

Then you could format the columns according to their type. For example if the first column is all integers, we can format it as such:

C(:,1) = num2cell( str2double(C(:,1)) ) 

Similarly, if you wish to convert the 8th column from hexadecimal to decimal, you can use hex2dec:

C(:,8) = cellfun(@hex2dec, strrep(C(:,8),'0x',''), 'UniformOutput',false); 

The resulting cell array looks as follows:

C = 
    [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''                []    ''    ''
    [NaN]    ''       ''       ''       ''       'Test'        'text'        [ 255]    ''    ''
    [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    [3855]    ''    ''
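Note that the hex conversion turns the empty field in the first row into []. If you would rather leave empty fields untouched, a possible variant is to convert only the cells that start with the 0x prefix. This is a sketch under that assumption; the names `col` and `isHex` and the sample values are illustrative only.

```matlab
% Hypothetical column of raw fields, like the 8th column above before conversion
col = {''; '0xFF'; '0x0F0F'};

% strncmp returns false for strings shorter than two characters,
% so empty fields are skipped
isHex = strncmp(col, '0x', 2);

% Strip the '0x' prefix and convert only the matching cells
col(isHex) = cellfun(@(s) hex2dec(s(3:end)), col(isHex), 'UniformOutput', false);
```

Here the empty string stays an empty string, and only the prefixed fields become numbers.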
Amro answered Sep 17 '22 15:09