
Mapping of elements by number of occurrences in J

Tags: j

Using the J language, I want to obtain a mapping from the elements of an array to their counts.

Specifically, I want to input a lowercase English word of two or more letters and get back each pair of adjacent letters in the word along with the count of its occurrences.

I need a verb that gives something like this, in whatever J structure you think is appropriate:

For 'cocoa':

co 2
oc 1
oa 1

For 'banana':

ba 1
an 2
na 2

For 'milk':

mi 1
il 1
lk 1

For 'to':

to 1

(For single letter words like 'a', the task is undefined and will not be attempted.)

(Order is not important, that's just how I happened to list them.)

I can easily obtain the successive pairs of letters in a word as a matrix or as a list of boxes:

   2(] ;._3)'cocoa'
co
oc
co
oa
   ]
   2(< ;._3)'cocoa'
┌──┬──┬──┬──┐
│co│oc│co│oa│
└──┴──┴──┴──┘

But I need help getting from there to a mapping of pairs to counts.

I am aware of ~. and ~: but I don't just want to return the unique elements or indexes of duplicates. I want a mapping of counts.

NuVoc's "Loopless" page indicates that / (or /\. or /\) is where I should be looking for accumulation problems. I am familiar with / for arithmetic operations on numeric arrays, but for u/y I don't know what u would have to be to accumulate counts over the list of letter pairs that makes up y.

(NB. I can already do this in "normal" languages like Java or Python without help. Similar questions on SO are for languages with very different syntax and semantics to J. I am interested in the idiomatic J approach to this sort of problem.)

dukereg asked Jan 03 '23 21:01


2 Answers

To get the list of 2-letter combinations I'd use dyadic infix (\):

   2 ]\ 'banana'
ba
an
na
an
na
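
For comparison with the subarray cut used in the question, here is a small sketch (the name pairs is only illustrative). It assumes a word of at least two letters and checks that infix builds the same matrix, then shows the boxed variant:

```j
pairs =: 2 ]\ 'cocoa'            NB. 4 by 2 character matrix: co, oc, co, oa
pairs -: 2 (] ;._3) 'cocoa'      NB. 1 when the two results match
2 <\ 'cocoa'                     NB. boxed variant, like 2 (< ;._3) 'cocoa'
```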

To count occurrences the primitive that immediately comes to mind is key (/.)

   #/.~ 2 ]\ 'banana'
1 2 2
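
Key forms its groups in order of first appearance, which is the same order that nub (~.) uses, so the counts line up with the distinct pairs. A short sketch, with pairs as an illustrative name:

```j
pairs =: 2 ]\ 'banana'
~. pairs        NB. distinct pairs in order of first appearance: ba, an, na
#/.~ pairs      NB. group sizes in that same order: 1 2 2
```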

If you want to match the counts to the letter combinations you can extend the verb to the following fork:

   ({. ; #)/.~ 2 ]\ 'banana'
┌──┬─┐
│ba│1│
├──┼─┤
│an│2│
├──┼─┤
│na│2│
└──┴─┘
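
If a reusable verb is wanted, the expression above could be wrapped as follows. This is only a sketch: paircounts is a hypothetical name, and it assumes y is a word of two or more letters, as the question requires.

```j
NB. paircounts is a hypothetical name; y is a word of two or more letters
paircounts =: 3 : '({. ; #)/.~ 2 ]\ y'

paircounts 'cocoa'     NB. rows: co 2, oc 1, oa 1
paircounts 'to'        NB. a single row: to 1
```

The result is a boxed matrix with one row per distinct pair, the pair in the first column and its count in the second.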
Tikkanz answered Feb 13 '23 00:02


I think that you are looking to map counts of unique items to the items. You can correct me if I am wrong.

Starting with

   [t=. 2(< ;._3)'cocoa'
┌──┬──┬──┬──┐
│co│oc│co│oa│
└──┴──┴──┴──┘

You can use ~. (Nub) to return the unique items in the list

   ~.t
┌──┬──┬──┐
│co│oc│oa│
└──┴──┴──┘

Then compare the boxed list against its nub with =/ (an equality table). The result is a matrix with one row per pair in your string and one column per item of the nub, where a 1 marks a pair that matches that nub item

   t =/ ~.t
1 0 0
0 1 0
1 0 0
0 0 1

Sum the columns of this matrix and you get the number of times each item of the nub shows up

   +/  t =/ ~.t
2 1 1
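
As a quick check, these column sums should agree with the counts that key (#/.~) produces for the same boxed list. A minimal sketch, redefining t with =: so that it also works when run from a script:

```j
t =: 2 (< ;._3) 'cocoa'
+/ t =/ ~. t                 NB. 2 1 1
(+/ t =/ ~. t) -: #/.~ t     NB. 1 when both approaches agree
```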

Then box each count so that the integers can be combined with the boxed character pairs

   <"0 +/  t =/ ~.t
┌─┬─┬─┐
│2│1│1│
└─┴─┴─┘

Combine them by stitching together the nub and the count using ,. (Stitch)

   (~.t) ,. <"0 +/  t =/ ~.t
┌──┬─┐
│co│2│
├──┼─┤
│oc│1│
├──┼─┤
│oa│1│
└──┴─┘
   [t=. 2(< ;._3)'banana'
┌──┬──┬──┬──┬──┐
│ba│an│na│an│na│
└──┴──┴──┴──┴──┘
   (~.t) ,. <"0 +/  t =/ ~.t
┌──┬─┐
│ba│1│
├──┼─┤
│an│2│
├──┼─┤
│na│2│
└──┴─┘
   [t=. 2(< ;._3)'milk'
┌──┬──┬──┐
│mi│il│lk│
└──┴──┴──┘
   (~.t) ,. <"0 +/  t =/ ~.t
┌──┬─┐
│mi│1│
├──┼─┤
│il│1│
├──┼─┤
│lk│1│
└──┴─┘
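
The steps above can be collected into one explicit verb so that t does not have to be reassigned for every word. This is a sketch only: nubcounts and the local name n are hypothetical, and y is assumed to be a word of two or more letters.

```j
NB. nubcounts is a hypothetical name wrapping the steps shown above
nubcounts =: 3 : 0
t =. 2 (< ;._3) y        NB. boxed successive pairs of letters
n =. ~. t                NB. distinct pairs
n ,. <"0 +/ t =/ n       NB. stitch each distinct pair to its boxed count
)

nubcounts 'cocoa'        NB. rows: co 2, oc 1, oa 1
```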

Hope this helps.

bob answered Feb 12 '23 23:02