Suppose a boxed matrix containing various types:
matrix =: ('abc';'defgh';23),:('foo';'bar';45)
matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+
And a column descriptor:
columnTypes =: 'string';'string';'num'
I want to apply verbs on this matrix by column according to types. I'll be using verbs DoString and DoNum:
chain =: (('string';'num') i. columnTypes) { DoString`DoNum
EDIT: The column descriptors are important: the decision on which verb to use is based on them, not on the type itself. In reality, I could have several types of strings, numerics, and even dates (which would be numeric in J).
How do I apply chain to each row of matrix? The verbs themselves can take care of whether the passed value is boxed or not, that's fine. Also, I'd rather avoid transposing the matrix (|:) as it could be quite large.
The standard method for doing this is:
(1) Convert your row (cell)-oriented structure to a column-oriented structure
(2) Apply the correct verb to each column (just once)
Step (1) is easy. Step (2) is also easy, but not as obvious. There's a little trick that helps.
The trick is knowing that a number of primitive operators accept a gerund as a left argument and produce a function which cycles over the gerund, applying each verb in turn. IMO, the most useful operator in this category is cut (;.). Here's an example implementation using it:
Step (0), inputs:
matrix =: ('abc';'defgh';23),:('foo';'bar';45)
columnTypes =: 'string';'string';'num'
DoString =: toupper
DoNum =: 0&j.
matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+
Step (1), columnify data:
columnify =: <@:>"1@:|: :. rowify =: <"_1&>
columnify matrix
+---+-----+-----+
|abc|defgh|23 45|
|foo|bar  |     |
+---+-----+-----+
Note that columnify is provided with an inverse which will re-"rowify" the data, though you shouldn't do that: see below.
Step (2), apply the correct verb to each column (exactly once), using the verb-cycling feature of cut (;.):
homogenize =: ({. foo&.>@:{.`'') [^:('foo'-:])L:0~ ]
chain =: DoString`DoNum`] homogenize@{~ ('string';'num')&i.
Note that the default transformation for unknown column-types is the identity function, ].
The verb homogenize normalizes the input and output of each column-processor (i.e. it abstracts out the pre- and post-processing so that the user only has to provide the dynamic "core" of the transformation). The verb chain takes a list of column-types as input and derives a gerund appropriate for use as a left-hand argument to ;. (or a similar operator).
Thus:
1 (chain columnTypes);.1 columnify matrix
+---+-----+---------+
|ABC|DEFGH|0j23 0j45|
|FOO|BAR  |         |
+---+-----+---------+
Or, if you really must have an NxM table of boxed cells, apply the cut "under" columnify:
1 (chain columnTypes);.1&.columnify matrix
+-----+-----+
|ABC  |FOO  |
+-----+-----+
|DEFGH|BAR  |
+-----+-----+
|0j23 |0j45 |
+-----+-----+
But note it is much more appropriate, in a J context, to keep the table as a list of homogeneous columns, for both performance and notational reasons.
J works best when processing arrays "in toto"; the rule of thumb is that you should let each primitive or user-defined name see as much data as possible at each application. That's the major benefit of this "columnification" approach: if you store your data as a list of homogeneous columns, it will be faster and easier to manipulate later.
However, if your use-case really demands that you keep the data as an NxM table of boxed cells, then converting your data to and from column-normal form is an expensive no-op. In that case, you should stick with your original solution,
1 chain\"1 matrix
which (because you asked) actually works on the same premise as the ;. approach. In particular, \ is another of those primitive operators which accepts a gerund argument and applies each verb in succession (i.e. each to a new window of data, cyclically).
In effect, what 1 chain\"1 matrix does is break the matrix into rows ("1), and for each row, it creates a 1-wide moving window (1 f\ matrix), applying the verbs of chain to each of those 1-wide windows cyclically (i.e. f changes with every 1-wide data window of each row of the matrix).
Since the moving 1-window of a row (a rank-1 vector) is the atoms of the row, in order, and the verbs of chain are given in the same order, in effect you're applying those verbs to the columns of the matrix, one. atom. at. a. time.
In short: 1 chain\"1 matrix is analogous to foo"0 matrix, except foo changes for each atom. And it should be avoided for the same reason foo"0 matrix should be avoided in general: because applying functions at small rank works against the grain of J, incurring a performance penalty.
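To see the cycling behaviour on its own, here is a minimal sketch on a plain numeric list. The verbs double and negate and the names y, viaInfix and viaCut are made up for illustration; they are not part of the question. The frets for ;.1 are spelled out in full so that every atom becomes its own interval.

```j
NB. Hypothetical verbs and data, for illustration only
double =: +:
negate =: -
y      =: 1 2 3 4

NB. 1-wide infixes: each atom is handed to the next verb of the gerund,
NB. wrapping around when the gerund runs out.
NB. Each result is a 1-atom list, so , ravels the 4-by-1 table to a list.
viaInfix =: , 1 (double`negate)\ y

NB. Cut with a fret at every position: the same cyclic application,
NB. one interval per atom.
viaCut =: , 1 1 1 1 (double`negate);.1 y
```

Both names should hold 2 _2 6 _4: double applied to the first and third atoms, negate to the second and fourth.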
In general, it's better to apply functions at higher ranks whenever you can, which in this case calls for converting matrix to column-normal form (and keeping it that way).
In other words, here, ;. is to "1 as \ is to "0. If you find the whole columnify/homogenize thing too lengthy or bulky (compared to 1 chain\"1 matrix), you can import the script provided at [1], which packages up those definitions as re-usable utilities, with extensions. See the page for examples and instructions.
[1] Related utility script: http://www.jsoftware.com/jwiki/DanBron/Snippets/DOOG
If these calculations depend only on the data inside individual boxes (and, perhaps, global values), it is possible to use Agenda (@.) with Under Open (aka Each). An application of this technique is shown below:
doCells =: (doNum`doString @. isLiteral)&.>
isLiteral=: 2 -: 3!:0
doNum =: +: NB. Double
doString =: toupper
doCells matrix
┌───┬─────┬──┐
│ABC│DEFGH│46│
├───┼─────┼──┤
│FOO│BAR  │90│
└───┴─────┴──┘
(In this example I've put in arbitrary meanings for doNum and doString to help make the viability plain.)
The version of isLiteral used here may well suffice, but it will fail if either sparse literal or unicode values are involved.
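If those cases matter, one possible variant is to test the type code against a list instead of a single value. This is only a sketch: isText is a hypothetical name, and the list of codes assumes the standard 3!:0 values (2 literal, 2048 sparse literal, 131072 unicode, 262144 unicode4).

```j
NB. isText is a hypothetical replacement for isLiteral.
NB. Fork: (type of y) e. (constant list of character type codes)
isText =: (3!:0) e. 2 2048 131072 262144"_

onString =: isText 'abc'   NB. a plain literal
onNumber =: isText 23      NB. an integer
```

It could stand in for isLiteral in doCells, since it also returns 0 or 1 for use with @. as before.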
If the calculations need to involve more of the matrix than a single box, this won't be the answer to your question. If calculation needs to occur by line, instead, the solution may involve applying a verb at rank _1 (i.e. to each item of the highest axis.)
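As a rough illustration of the rank _1 idea, the sketch below hands each whole row of the question's matrix to a verb. doRow is a hypothetical placeholder (here it merely reverses the cells of a row); a real per-row verb would go in its place.

```j
matrix =: ('abc';'defgh';23),:('foo';'bar';45)

NB. doRow is a made-up per-row verb: it sees one row (a list of 3 boxes)
NB. at a time and reverses the order of its cells.
doRow =: |.

NB. "_1 applies doRow to each item of matrix, i.e. to each row
byRow =: doRow"_1 matrix
```

Because doRow receives a full row per call, it can combine values from several cells, which the per-box approach above cannot do.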