Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

elegant way to count items

Tags:

emacs

lisp

elisp

I have a list shaped like this:

  '(("Alpha" .  1538)
    ("Beta"  .  8036)
    ("Gamma" .  8990)
    ("Beta"  .  10052)
    ("Alpha" .  12837)
    ("Beta"  .  13634)
    ("Beta"  .  14977)
    ("Beta"  .  15719)
    ("Alpha" .  17075)
    ("Rho"   .  18949)
    ("Gamma" .  21118)
    ("Gamma" .  26923)
    ("Alpha" .  31609))

How can I count the total number of occurrences of the terms in the car of each element in the list? Basically I want:

(("Alpha" . 4)
 ("Beta" . 5)
 ("Gamma" . 3)
 ("Rho" . 1))

No, this is not homework. I just don't have the "thinking in Lisp" thing quite yet.

In C#, I would use LINQ to do this. I can do it in lisp, too, using while loops and such but the way I am thinking of doing it seems overly complicated.


EDIT

This is what I have:

(defun count-uniq (list)
  "Returns an alist, each item is a cons cell where the car is
a unique element of LIST, and the cdr is the number of occurrences of that
unique element in the list. "
  (flet ((helper (list new)
                 (if (null list)
                     new
                   (let ((elt (assoc (car list) new)))
                     (helper (cdr list)
                             (if elt
                                 (progn (incf (cdr elt)) new)
                               (cons (cons (car list) 1) new)))))))
    (nreverse (helper list nil))))
like image 326
Cheeso Avatar asked May 18 '11 19:05

Cheeso


1 Answers

Every time you want to traverse a list and return some value afterwards, be it a new list or some aggregate result, you are thinking of a fold, also called "reduce" in Python and Lisps. Fold is a great abstraction, as it allows to write generic code, applicable for many use-cases just by tweaking some elements. What is similar between finding a sum of several numbers, finding a product, finding a minimum integer? They are all folds, because you run through the list and then return some result based on its content. In Emacs Lisp they would look like this:

(reduce '+ '(1 2 3 4 5)) ; 15
(reduce '* '(1 2 3 4 5)) ; 120
(reduce 'min '(1 2 3 4 5)) ; 1

But folds are even more general than this. What is similar between finding a sum, counting a number of even numbers in a list, removing every odd number, and building a list with every number increased by 5? Every such function can be implemented by taking some base value, successively transform it, until you get the result. You take this base value, metaphorical blob of clay, call it "accumulator", then take one element from a list and based on this element do something to this blob of clay, make it a draft of a magnificent sculpture. Then you take the next element from a list and do something new to your sculpture. You repeat that until the list is empty and you end up with a masterpiece. It's as if every element of a list is a single instruction in a large recipe. Just bear in mind, that you are completely free to do anything with the clay, you don't have to use the list elements in the result directly—technically, this means that the accumulator (and, thus, the result) may be of different type.

(reduce '+ '(1 2 3 4 5) :initial-value 0) ; 15
(reduce (lambda (acc x) (if (evenp x) (1+ acc) acc)) '(1 2 3 4 5) :initial-value 0) ; 2
(reduce (lambda (x acc) (if (oddp x) acc (cons x acc))) '(1 2 3 4 5) :initial-value '() :from-end t) ; (2 4)
(reduce (lambda (x acc) (cons (+ x 5) acc)) '(1 2 3 4 5) :initial-value '() :from-end t) ; (6 7 8 9 10)

Note about reducing from end: lists in Lisps are not smart arrays like in Python or Java, they are linked lists, therefore accessing or changing an element somewhere in a list is an O(n) operation, while "consing" to the beginning of a list is O(1). In other words, appending an element to the end of a list is expensive, therefore Lispers usually add elements to the beginning of a list and then finally reverse the list, which is called push/nreverse idiom. If we did the ordinary reduce in the last 2 functions, we would cons 1 to the accumulator and get (1), then cons 2 to accumulator and get (2 1), until we get the correct result but upside-down. We could use reverse function afterwards, but luckily Emacs's reduce supports :from-end keyword argument, so it conses 5, then 4, then 3, and so on.

It's clear now, that your operation is a fold, traverse the original alist and count occurrences of each key. Before writing our fold, let's talk about alists first. Alist in Lisp is a poor man's hash-table. You don't usually tinker with a programming language's hash-table implementation details, do you? You work with an API. In Python this API looks like square bracket syntax (d['a'] = 1) and dict methods (d.keys()). For alists API contains function assoc, which returns an item provided a key.

(assoc 'a '((a . 1) (b . 2))) ; (a . 1)

Why do I talk about implementation details? Because you work via assoc and you don't care how exactly this alist looks like, you abstract that away. Another piece of API is that if you want to add a new element or change an existing one, you simply cons a dotted pair to the alist. It's how you supposed to work with alists, regardless of their internal structure. Why does that work? For example, if I want to change value for key a to 10, I would simply run (cons '(a . 10) my-alist), and my-alist would end up being '((a . 10) (a . 1) (b . 2)). But it's no problem, because assoc returns only the first dotted pair and ignores the rest, so you can treat alist just like any other key-value data structure. With that in mind let's write our first serious fold.

(reduce (lambda (acc x)
          (let* ((key (car x))
                 (pair (assoc key acc))
                 (count (cdr pair)))
            (if pair
                (cons (cons key (1+ count)) acc)
              (cons (cons key 1) acc))))
        my-alist
        :initial-value '())

What happens here? We take your data and an empty list, which will soon become our desired result. At each step we take a pair from data and ask: does our result contain info about this pair? If not, then we add it to the result and put 1—we met this key for the first time. However, if we do find info about this pair in our result, then we must again add it to our result, but this time with a number increased by 1. Repeat that process for every item in your data, and you get:

(("Alpha" . 4) ("Gamma" . 3) ("Gamma" . 2) ("Rho" . 1) ("Alpha" . 3)
 ("Beta" . 5) ("Beta" . 4) ("Beta" . 3) ("Alpha" . 2) ("Beta" . 2)
 ("Gamma" . 1) ("Beta" . 1) ("Alpha" . 1))

Remember that assoc only cares about the first occurrence of a key? This alist would behave the same as if it was just (("Alpha" . 4) ("Gamma" . 3) ("Rho" . 1) ("Beta" . 5)), so we're good here. Still, could we change our fold as to get the latter, shorter result instead? Hold on, what's the need to over-complicate our fold, if we could just tweak the result afterwards? After all, what is computer programming, if not series of data transformations? There is no reason why you couldn't just remove all the "obsolete" pairs from your alist, just use cl-remove-duplicates with correct arguments, and you're done.

So we're proud of ourselves, we wrote a fold, a pillar of functional programming, yet careful examination exposes an inefficiency: we traverse the accumulator with assoc to find a pair and its value to increment. assoc takes O(n), reduce itself takes O(n), therefore our algorithm is O(n²) (read about order of growth, if you don't understand Big-O notation). It's clear that we should better work with a proper optimized hash-table instead, and convert it to an alist when we need. Rewrite our fold:

(reduce (lambda (acc x)
          (cl-incf (gethash (car x) acc 0))
          acc)
        my-alist
        :initial-value (make-hash-table :test 'equal))

(gethash k d 0) is equivalent to Python's d.get('k', 0), where the last argument is default. cl-incf (Common Lisp equivalent incf) is a smart macro that increments its argument in-place (read about setf to understand smart assignments). make-hash-table requires custom test function, because strings can't be compared with default eql function. To get an alist, just convert the result hash-table of our fold with ht->alist function, that we either take from Wilfred's ht.el library, or write ourselves:

(defun ht->alist (table)
  (let (alist)
    (maphash (lambda (k v)
               (push (cons k v) alist))
             table)
    alist))
like image 121
Mirzhan Irkegulov Avatar answered Oct 09 '22 11:10

Mirzhan Irkegulov