
Change default NaN representation of fprintf() in Matlab

Tags:

printf

matlab

I am trying to export data from Matlab in a format that another application can read. For that I need to change the NaN, Inf and -Inf strings (which Matlab prints by default for such values) to //m, //inf+ and //inf-.

In general I DO KNOW how to accomplish this. I am asking how (and whether it is possible) to exploit one particular thing in Matlab. The actual question is located in the last paragraph.

There are two approaches that I have attempted (code below).

  1. Use sprintf() on the data and strrep() the output, one line at a time to save memory. In the timings below this takes about 7 times as long as a plain fprintf(). The advantage is its low memory overhead.
  2. Same as option 1, but the translation is done on the whole data set at once. This is much faster (about 1.4 times the plain fprintf() time), but it is vulnerable to out-of-memory errors. My problem with this approach is that I do not want to duplicate the data unnecessarily.

Code:

rows = 50000  
cols = 40  
data = rand(rows, cols); % generate random matrix  
data([1 3 8]) = NaN; % insert some NaN values  
data([5 6 14]) = Inf; % insert some Inf values  
data([4 2 12]) = -Inf; % insert some -Inf values  

fid = fopen('data.txt', 'w'); %output file  

%% 0) Write data using default fprintf  
format = repmat('%g ', 1, cols);  

tic  
fprintf(fid, [format '\n'], data');  
toc  

%% 1) Using strrep, writing line by line  
fprintf(fid, '\n');  
tic  
for i = 1:rows  
    fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data(i, :)), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));  
end  
toc  

%% 2) Using strrep, writing all at once  
fprintf(fid, '\n');  
format = [format '\n'];  
tic  
fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data'), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
toc

fclose(fid); % close the output file

Output:

Elapsed time is 1.651089 seconds. % Regular fprintf()
Elapsed time is 11.529552 seconds. % Option 1
Elapsed time is 2.305582 seconds. % Option 2
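
A possible middle ground between options 1 and 2, which is not one of the approaches timed above, is to convert a block of rows per sprintf() call, so the temporary string stays bounded while the number of calls drops. The sketch below is untimed; `blockSize`, the smaller `rows` value and the file name `data_blocks.txt` are assumptions chosen for illustration.

```matlab
% Sketch only: blockSize is a hypothetical tuning parameter, not from the post.
rows = 1000;
cols = 40;
data = rand(rows, cols);
data([1 3 8]) = NaN;
data([5 6 14]) = Inf;
data([4 2 12]) = -Inf;

fmt = [repmat('%g ', 1, cols) '\n'];   % one output line per data row
blockSize = 250;                       % rows converted per sprintf call

fid = fopen('data_blocks.txt', 'w');
for r0 = 1:blockSize:rows
    r1 = min(r0 + blockSize - 1, rows);
    txt = sprintf(fmt, data(r0:r1, :)');   % only this block is held as text
    txt = strrep(txt, 'NaN', '//m');
    txt = strrep(txt, '-Inf', '//inf-');   % must run before the plain 'Inf' pass
    txt = strrep(txt, 'Inf', '//inf+');
    fprintf(fid, '%s', txt);               % '%s' writes txt verbatim
end
fclose(fid);
```

Larger blocks should behave more like option 2 and smaller ones more like option 1; a suitable value would have to be found by measurement.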

Now to the question...

I am not satisfied with the memory overhead and time lost using my solutions in comparison with simple fprintf().
My rationale is that the 'NaN', 'Inf' and '-Inf' strings are simple data saved in some variable inside the *printf() or *2str() implementation. Is there any way to change their value at runtime?
For example, in C# I would change System.Globalization.CultureInfo.NumberFormat.NaNSymbol, etc., as explained here.

Kupto asked Nov 10 '22 04:11


1 Answer

In the limited case mentioned in the comments, where some columns (unknown in advance and changing per data set) may be entirely NaN (or Inf, etc.) but there are no unwanted NaN values elsewhere, another possibility is to check the first row of data, assemble a format string that writes the //m strings directly, and use that while not passing fprintf the columns that contain NaN or other unwanted values.

y = ~isnan(data(1,:)); % find all non-NaN
format = sprintf('%d ',y); % print a 1/0 string
format = strrep(format,'1','%g'); 
format = strrep(format,'0','//m'); 

fid = fopen('data.txt', 'w'); 
fprintf(fid, [format '\n'], data(:,y)'); %pass only the non-NaN data
fclose(fid);

By my check with two columns of NaN, this fprintf takes pretty much the same time as your "regular" fprintf and is quicker than the loop, not counting the initialisation step of producing format. It would be fiddlier to build the format string automatically if you also have to take +/- Inf into account, but it is certainly possible. There is probably a cleaner way of producing format as well.

How it works:

You can pass in a subset of your data, and you can also insert any text you like into a format string. So if every row has the same desired "text" in the same spot (in this case the NaN columns and our replacement for "NaN"), we can put that text in that spot and simply not pass those parts of the data to fprintf in the first place. A simpler example to try on the command line:

x = magic(5);
x(:,3) = NaN;
sprintf('%d %d ihatethrees %d %d \n', x(:,[1,2,4,5])') % no semicolon, so the result is displayed
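
The +/- Inf case mentioned above could be handled along the following lines. This is only a sketch under the same assumption as the rest of this answer, namely that each special value fills an entire column so that the first row is representative; the names `tokens`, `keep` and `fmt` are made up for the example, and strjoin needs a reasonably recent Matlab release.

```matlab
% Sketch: assumes NaN / Inf / -Inf each occupy whole columns,
% so inspecting row 1 is enough to decide what every row prints.
data = rand(5, 4);
data(:, 2) = NaN;
data(:, 4) = -Inf;

firstRow = data(1, :);
tokens = repmat({'%g'}, 1, numel(firstRow));  % default: print the number
tokens(isnan(firstRow))   = {'//m'};          % literal text replaces NaN columns
tokens(firstRow ==  Inf)  = {'//inf+'};
tokens(firstRow == -Inf)  = {'//inf-'};
fmt = [strjoin(tokens, ' ') '\n'];            % here: '%g //m %g //inf-\n'

keep = isfinite(firstRow);                    % columns that still consume data
out = sprintf(fmt, data(:, keep)');           % use fprintf(fid, fmt, ...) to write a file
```

Building the format from a cell array avoids the 1/0 string and the strrep calls. It does not help if special values are scattered through otherwise finite columns, and it needs at least one finite column to drive the repetition of the format.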
nkjt answered Nov 15 '22 07:11