I am using Mathematica 7 to process a large data set. The data set is a three-dimensional array of signed integers. The three levels may be thought of as corresponding to X points per shot, Y shots per scan, and Z scans per set.
I also have a "zeroing" shot (containing X points, each a signed fraction given as a ratio of integers), which I would like to subtract from every shot in the data set. Afterwards, I will never again need the original data set.
How can I perform this transformation without creating new copies of the data set, or parts of it, in the process? Conceptually, the data set is located in memory, and I would like to scan through each element, and change it at that location in memory, without permanently copying it to some other memory location.
The following self-contained code captures all the aspects of what I am trying to do:
(* Create some offset data and a zero data set. *)
myData = Table[Table[Table[RandomInteger[{1, 100}], {k, 500}], {j, 400}], {i, 200}];
myZero = Table[RandomInteger[{1, 9}]/RandomInteger[{1, 9}] + 50, {i, 500}];
(* Method 1 *)
myData = Table[
   f1 = myData[[i]];
   Table[
    f2 = f1[[j]];
    f2 - myZero, {j, 400}], {i, 200}];
(* Method 2 *)
Do[
 Do[
  myData[[i]][[j]] = myData[[i]][[j]] - myZero, {j, 400}], {i, 200}]
(* Method 3 *)
Attributes[Zeroing] = {HoldFirst};
Zeroing[x_] := Module[{},
   Do[
    Do[
     x[[i]][[j]] = x[[i]][[j]] - myZero, {j, Length[x[[1]]]}],
    {i, Length[x]}]
   ];
(Note: Hat tip to Aaron Honecker for Method #3.)
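For reference, Method #3 is invoked as follows; because Zeroing holds its first argument, the Do loops assign back into myData itself rather than into a copy:

(* Apply Method 3; HoldFirst makes the assignments act on myData in place. *)
Zeroing[myData];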
On my machine (Intel Core2 Duo CPU 3.17 GHz, 4 GB RAM, 32-bit Windows 7), all three methods use roughly 1.25 GB of memory, with #2 and #3 faring slightly better.
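These figures can be checked with the kernel's built-in diagnostics; a minimal sketch using MemoryInUse, MaxMemoryUsed, and ByteCount:

MemoryInUse[]      (* bytes currently in use by the kernel *)
MaxMemoryUsed[]    (* peak bytes used so far in this session *)
ByteCount[myData]  (* bytes occupied by the expression myData *)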
If I don't mind losing precision, wrapping N[] around the innards of myData and myZero when they're being created increases their size in memory by 150 MB initially, but reduces the amount of memory required for zeroing (by methods #1-#3 above) from 1.25 GB down to just 300 MB! That's my working solution, but it would be great to know the best way of handling this problem.
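Concretely, the N[] wrapping in the creation step looks like this (a sketch of that working solution):

(* Same data as before, but coerced to machine reals at creation time. *)
myData = Table[N[RandomInteger[{1, 100}]], {i, 200}, {j, 400}, {k, 500}];
myZero = Table[N[RandomInteger[{1, 9}]/RandomInteger[{1, 9}] + 50], {i, 500}];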
Unfortunately I have little time now, so I must be concise ...
When working with large data, you need to be aware that Mathematica has a different storage format called packed arrays, which is much more compact and much faster than the regular one but only works for machine-precision reals or integers.
Please evaluate ?Developer`*Packed* to see what functions are available for converting to and from packed arrays directly, in case this does not happen automatically.
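For example, here is a quick round trip; ByteCount makes the storage difference visible (the sizes in the comments are approximate):

(* Pack a list of machine reals, then deliberately unpack it again. *)
packed = Developer`ToPackedArray[N@Range[10^6]];
Developer`PackedArrayQ[packed]    (* --> True *)
ByteCount[packed]                 (* roughly 8 bytes per element *)
unpacked = Developer`FromPackedArray[packed];
Developer`PackedArrayQ[unpacked]  (* --> False *)
ByteCount[unpacked]               (* several times larger *)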
So the brief explanation for why my solution is fast and memory efficient is that it uses packed arrays. I verified with Developer`PackedArrayQ that my arrays never get unpacked, and I used machine reals (I applied N[] to everything).
In[1]:= myData = N@RandomInteger[{1, 100}, {200, 400, 500}];
In[2]:= myZero =
Developer`ToPackedArray@
N@Table[RandomInteger[{1, 9}]/RandomInteger[{1, 9}] + 50, {i, 500}];
In[3]:= myData = Map[# - myZero &, myData, {2}]; // Timing
Out[3]= {1.516, Null}
Also, the operation you were asking for ("I would like to scan through each element, and change it at that location in memory") is called mapping (see Map[] or /@).
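As a tiny illustration of the level specification that makes In[3] work, here f stands for an arbitrary function:

f /@ {{1, 2}, {3, 4}}            (* --> {f[{1, 2}], f[{3, 4}]} *)
Map[f, {{1, 2}, {3, 4}}, {2}]    (* --> {{f[1], f[2]}, {f[3], f[4]}} *)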