Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Chain verbs in J

Tags:

j

Suppose a boxed matrix containing various types:

matrix =: ('abc';'defgh';23),:('foo';'bar';45)
matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+

And a column descriptor:

columnTypes =: 'string';'string';'num'

I want to apply verbs on this matrix by column according to types. I'll be using verbs DoString and DoNum:

chain =: (('string';'num') i. columnTypes) { DoString`DoNum

EDIT: The column descriptors are important, the decision on which verb to use is based on them, not on the type itself. In reality, I could have several types of strings, numerics, and even dates (which would be numeric in J).

How do I apply chain to each row of matrix? The verbs themselves can take care of whether the passed value is boxed or not, that's fine. Also, I'd rather avoid transposing the matrix (|:) as it could be quite large.

like image 778
MPelletier Avatar asked Aug 01 '11 22:08

MPelletier


2 Answers

The standard method for doing this is:

  1. Convert your row (cell)-oriented structure to a column-oriented structure

  2. Apply the correct verb to each column (just once)

Step (1) is easy. Step (2) is also easy, but not as obvious. There's a little trick that helps.

The trick is knowing that a number of primitive operators accept a gerund as a left argument and produce a function which cycles over the gerund, applying each verb in turn. IMO, the most useful operator in this category is ;. . Here's an example implementation using it:

Step (0), inputs:

   matrix      =:  ('abc';'defgh';23),:('foo';'bar';45)

   columnTypes =:  'string';'string';'num'

   DoString    =:  toupper
   DoNum       =:  0&j.

   matrix
+---+-----+--+
|abc|defgh|23|
+---+-----+--+
|foo|bar  |45|
+---+-----+--+

Step (1), columify data:

   columnify   =:  <@:>"1@:|: :. rowify =: <"_1&>
   columnify matrix
+---+-----+-----+
|abc|defgh|23 45|
|foo|bar  |     |
+---+-----+-----+

Note that the columnify is provided with an inverse which will re-"rowify" data, though you shouldn't do that: see below.

Step (2), apply the correct verb to each column (exactly once), using the verb-cycling feature of ;.:

   homogenize  =:  ({. foo&.>@:{.`'') [^:('foo'-:])L:0~ ]
   chain       =:  DoString`DoNum`] homogenize@{~  ('string';'num')&i.  

Note that the default transformation for unknown column-types is the identity function, ].

The verb homogenize normalizes the input & output of each column-processor (i.e abstracts out the pre- and post-processing so that the user only has to provide with the dynamic "core" of the transformation). The verb chain takes a list of column-types as an input and derives a gerund appropriate for use a left-hand argument to ;. (or a similar operator).

Thus:

   1 (chain columnTypes);.1  columnify matrix
+---+-----+---------+
|ABC|DEFGH|0j23 0j45|
|FOO|BAR  |         |
+---+-----+---------+

Or, if you really must have an NxM table of boxed cells, apply the cut "under" columnify:

   1 (chain columnTypes);.1&.columnify matrix
+-----+-----+
|ABC  |FOO  |
+-----+-----+
|DEFGH|BAR  |
+-----+-----+
|0j23 |0j45 |
+-----+-----+

But note it is much more appropriate, in a J context, to keep the table as a list of homogeneous columns, for both performance and notational reasons.

J works best when processing arrays "in toto"; the rule of thumb is you should let primitive or user-defined name see as much data as possible at each application. That's the major benefit of this "columificaton" approach: if you store your data as a list of homogeneous columns, it will be faster and easier to manipulate later.

However, if your use-case really demands you keep the data as a NxM table of boxed cells, then converting your data to- and from- column normal form is an expensive no-op. In that case, you should stick with your original solution,

   1 chain\"1 matrix

which (because you asked) actually works on the same premise as the ;. approach. In particular, \ is another of those primitive operators which accepts a gerund argument, and applies each verb in succession (i.e. to each a new window of data, cyclically).

In effect, what 1 chain\"1 matrix does is break the matrix into rows ("1), and for each row, it creates a 1-wide moving window, (1 f\ matrix), applying the verbs of chain to each of those 1-wide windows cylically (i.e. f changes with every 1-wide data window of each row of the matrix).

Since the moving 1-window of a row (a rank-1 vector) is the atoms of the row, in order, and the verbs of chain are given in the same order, in effect you're applying those verbs to the columns of the matrix, one. atom. at. a. time.

In short: 1 chain\"1 matrix is analogous to foo"0 matrix, except foo changes for each atom. And it should be avoided for the same reason foo"0 matrix should be avoided in general: because applying functions at small rank works against the grain of J, incurring a performance penalty.

In general, it's better to use apply functions at higher ranks whenever you can, which in this case calls for converting (and maintaining) matrix to column-normal form.

In other words, here, ;. is to "1 as \ is to "0. If you find the whole columnify/homogenize thing too lengthy or bulky (compared to 1 chain\"1 matrix), you can import the script provided at [1], which packages up those definitions as re-usable utilities, with extensions. See the page for examples and instructions.

[1] Related utility script:
http://www.jsoftware.com/jwiki/DanBron/Snippets/DOOG

like image 166
Dan Bron Avatar answered Oct 23 '22 19:10

Dan Bron


If these calculations depend only on the data inside individual boxes (and, perhaps, global values,) it is possible to use Agenda with Under Open (aka Each). An application of this technique is shown below:

   doCells  =: (doNum`doString @. isLiteral)&.>
   isLiteral=: 2 -: 3!:0

   doNum    =: +:   NB. Double
   doString =: toupper

   doCells matrix
┌───┬─────┬──┐
│ABC│DEFGH│46│
├───┼─────┼──┤
│FOO│BAR  │90│
└───┴─────┴──┘

(In this example I've put in arbitrary meanings for doNum and doString to help make the viability plain.)

The version of isLiteral used here may well suffice, but it will fail if either sparse literal or unicode values will be involved.

If the calculations need to involve more of the matrix than a single box, this won't be the answer to your question. If calculation needs to occur by line, instead, the solution may involve applying a verb at rank _1 (i.e. to each item of the highest axis.)

like image 34
kaleidic Avatar answered Oct 23 '22 18:10

kaleidic