I am trying to export data from Matlab in a format that would be understood by another application... For that I need to change the NaN, Inf and -Inf strings (that Matlab prints by default for such values) to //m, //inf+ and //inf-.
In general I DO KNOW how to accomplish this. I am asking how (and whether it is possible) to exploit one particular thing in Matlab. The actual question is located in the last paragraph.
There are two approaches that I have attempted (code below).
The first uses sprintf() on the data and strrep() on the output. This is done in line-by-line fashion in order to save memory. This solution takes almost 10 times more time than a simple fprintf(). The advantage is that it has low memory overhead.

rows = 50000;
cols = 40;
data = rand(rows, cols); % generate random matrix
data([1 3 8]) = NaN; % insert some NaN values
data([5 6 14]) = Inf; % insert some Inf values
data([4 2 12]) = -Inf; % insert some -Inf values
fid = fopen('data.txt', 'w'); % output file

%% 0) Write data using default fprintf
format = repmat('%g ', 1, cols);
tic
fprintf(fid, [format '\n'], data');
toc

%% 1) Using strrep, writing line by line
fprintf(fid, '\n');
tic
for i = 1:rows
    fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data(i, :)), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
end
toc

%% 2) Using strrep, writing all at once
fprintf(fid, '\n');
format = [format '\n'];
tic
fprintf(fid, '%s\n', strrep(strrep(strrep(sprintf(format, data'), 'NaN', '//m'), '-Inf', '//inf-'), 'Inf', '//inf+'));
toc

fclose(fid); % close the output file
Elapsed time is 1.651089 seconds. % Regular fprintf()
Elapsed time is 11.529552 seconds. % Option 1
Elapsed time is 2.305582 seconds. % Option 2
Now to the question...
I am not satisfied with the memory overhead and time lost using my solutions in comparison with simple fprintf().
My rationale is that the 'NaN', 'Inf' and '-Inf' strings are simple data saved in some variable inside the *printf() or *2str() implementation. Is there any way to change their value at runtime?
For example, in C# I would change the System.Globalization.CultureInfo.NumberFormat.NaNSymbol, etc. as explained here.
In the limited case mentioned in comments, where a number of (unknown, changing per data set) columns may be entirely NaN (or Inf, etc.) but there are no unwanted NaN values otherwise, another possibility is to check the first row of data, assemble a format string which writes the //m strings directly, and use that while telling fprintf to ignore the columns that contain NaN or other unwanted values.
y = ~isnan(data(1,:)); % find all non-NaN
format = sprintf('%d ',y); % print a 1/0 string
format = strrep(format,'1','%g');
format = strrep(format,'0','//m');
fid = fopen('data.txt', 'w');
fprintf(fid, [format '\n'], data(:,y)'); %pass only the non-NaN data
fclose(fid);
In my check with two columns of NaN, this fprintf takes pretty much the same time as your "regular" fprintf and is quicker than the loop, not taking into account the initialisation step of producing format. It would be fiddlier to set it up to automatically produce the format string if you also have to take +/- Inf into account, but certainly possible. There is probably a cleaner way of producing format as well.
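As a rough sketch of how the +/- Inf case might be handled, the format string can be assembled from a cell array of tokens, one per column. This assumes, as above, that a column flagged in the first row is NaN, Inf or -Inf in every row. The demo data and the names fmt, tokens and keep are made up for illustration, and I have not timed this variant.

```matlab
% Hypothetical demo data: column 2 is all NaN, column 4 all Inf, column 5 all -Inf
data = rand(6, 5);
data(:, 2) = NaN;
data(:, 4) = Inf;
data(:, 5) = -Inf;

first = data(1, :);                        % only the first row is inspected
tokens = repmat({'%g'}, 1, numel(first));  % default: print the number
tokens(isnan(first))  = {'//m'};           % literal text for all-NaN columns
tokens(first == Inf)  = {'//inf+'};        % literal text for all-Inf columns
tokens(first == -Inf) = {'//inf-'};        % literal text for all -Inf columns
keep = isfinite(first);                    % columns that are actually passed on

% Tokens are passed as arguments, so the '%g' entries are copied verbatim
fmt = [sprintf('%s ', tokens{:}) '\n'];

% Same call shape as before; use fprintf(fid, fmt, ...) to write to a file
out = sprintf(fmt, data(:, keep)');
fprintf('%s', out);
```

Each printed row then has the form "number //m number //inf+ //inf- ". If a flagged column contains an ordinary number further down, that number is silently dropped, so this only fits data where the assumption holds.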
How it works:
You can pass in a subset of your data, and you can also insert any text you like into a format string, so if every row has the same desired "text" in the same spot (in this case NaN columns and our desired replacement for "NaN"), we can put the text we want in that spot and then just not pass those parts of the data to fprintf in the first place. A simpler example for trying out on the command line:
x = magic(5);
x(:,3)=NaN;
sprintf('%d %d ihatethrees %d %d \n',x(:,[1,2,4,5])')