Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to count the frequency of a element in APL or J without loops

Tags:

j

apl

Assume I have two lists, one is the text t, one is a list of characters c. I want to count how many times each character appears in the text.

This can be done easily with the following APL code.

+⌿t∘.=c

However it is slow. It take the outer product, then sum each column.

It is a O(nm) algorithm where n and m are the size of t and c.

Of course I can write a procedural program in APL that read t character by character and solve this problem in O(n+m) (assume perfect hashing).

Are there ways to do this faster in APL without loops(or conditional)? I also accept solutions in J.

Edit: Practically speaking, I'm doing this where the text is much shorter than the list of characters(the characters are non-ascii). I'm considering where text have length of 20 and character list have length in the thousands.

There is a simple optimization given n is smaller than m.

w  ← (∪t)∩c
f ←  +⌿t∘.=w
r ← (⍴c)⍴0
r[c⍳w] ← f
r

w contains only the characters in t, therefore the table size only depend on t and not c. This algorithm runs in O(n^2+m log m). Where m log m is the time for doing the intersection operation.

However, a sub-quadratic algorithm is still preferred just in case someone gave a huge text file.

like image 702
Chao Xu Avatar asked Aug 10 '11 08:08

Chao Xu


1 Answers

NB. Using "key" (/.) adverb w/tally (#) verb counts

   #/.~ 'abdaaa'
4 1 1

NB. the items counted are the nub of the string.

   ~. 'abdaaa'
abd

NB. So, if we count the target along with the string

   #/.~ 'abc','abdaaa'
5 2 1 1

NB. We get an extra one for each of the target items.

   countKey2=: 4 : '<:(#x){.#/.~ x,y'

NB. This subtracts 1 (<:) from each count of the xs.

   6!:2 '''1'' countKey2 10000000$''1234567890'''
0.0451088
   6!:2 '''1'' countKey2 1e7$''1234567890'''
0.0441849
   6!:2 '''1'' countKey2 1e8$''1234567890'''
0.466857

NB. A tacit version

   countKey=. [: <: ([: # [) {. [: #/.~ ,

NB. appears to be a little faster at first

   6!:2 '''1'' countKey 1e8$''1234567890'''
0.432938

NB. But repeating the timing 10 times shows they are the same.

   (10) 6!:2 '''1'' countKey 1e8$''1234567890'''
0.43914
   (10) 6!:2 '''1'' countKey2 1e8$''1234567890'''
0.43964
like image 188
DevonMcC Avatar answered Sep 30 '22 17:09

DevonMcC