Clojure: Aggregate and Count in Maps

Tags: clojure, monger

I guess this question qualifies as an entry-level Clojure problem. I have trouble processing a Clojure collection of maps multiple times and extracting different kinds of data from it.

Given a collection of maps like this, I'm trying to count entries based on multiple keys:

[
  {
    "a": "X",
    "b": "M",
    "c": 188
  },
  {
    "a": "Y",
    "b": "M",
    "c": 165
  },
  {
    "a": "Y",
    "b": "M",
    "c": 313
  },
  {
    "a": "Y",
    "b": "P",
    "c": 188
  }
]

First, I want to group the entries by the a-key values:

{
  "X" : [
    {
      "b": "M",
      "c": 188
    }
  ],
  "Y" : [
    {
      "b": "M",
      "c": 165
    },
    {
      "b": "M",
      "c": 313
    },
    {
      "b": "P",
      "c": 188
    }
  ]
}

Second, I want to treat entries with the same b value as duplicates and ignore the remaining keys:

{
  "X" : [
    {
      "b": "M"
    }
  ],
  "Y" : [
    {
      "b": "M"
    },
    {
      "b": "P"
    }
  ]
}

Then, simply count the distinct b values in each group:

{
  "X" : 1,
  "Y" : 2
}

As I'm getting the data through monger, I defined:

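;; Assumes (:require [monger.core :as mg] [monger.collection :as mc])
;; and a db-name var defined elsewhere.
(defn db-query
  [coll-name]
  (with-open [conn (mg/connect)]
    (doall (mc/find-maps (mg/get-db conn db-name) coll-name))))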

and then I hit a roadblock:

(defn get-sums [request]
  (->> (db-query "data")
       (group-by :a)
       (into {})
       keys))

How could I continue from here?

frhd asked Mar 13 '23 20:03


2 Answers

This is a naive approach. I am sure there are better ways, but it might be what you need to figure it out.

(into {}
  (map       

    ; f       
    (fn [ [k vs] ] ;[k `unique count`]
      [k (count (into #{} (map #(get % "b") vs)))]) 

    ; coll
    (group-by #(get % "a") DATA))) ; "a"s as keys
;user=> {"X" 1, "Y" 2}

Explanation:

; I am using your literal data as DATA, with the commas and colons removed
(def DATA [{...

(group-by #(get % "a") DATA) ; groups by "a" as keys
; so I get a map {"X":[{},...] "Y":[{},{},{},...]}

; then I map over each [k v] pair where
; k is the map key and
; vs are the grouped maps in a vector
(fn [ [k vs] ] 
      ; here `k` is e.g. "Y" and `vs` are the maps {a _, b _, c _}

      ; now `(map #(get % "b") vs)` gets me all the b values
      ; `into` a set makes them unique
      ; `count` counts them
      ; finally I return a vector with the same key `k`,
      ;   but the value is the number of distinct `b`s
      [k (count (into #{} (map #(get % "b") vs)))]) 

; at the end I just put the result `[ ["Y" 2] ["X" 1] ]` `into` a map {}
; so you get a map
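
The same steps can also be written as one small function with the threading macro. This is only a sketch, assuming string keys as in the question; the names `data` and `distinct-b-counts` are made up for this illustration.

```clojure
;; Sample data from the question, with string keys.
(def data
  [{"a" "X", "b" "M", "c" 188}
   {"a" "Y", "b" "M", "c" 165}
   {"a" "Y", "b" "M", "c" 313}
   {"a" "Y", "b" "P", "c" 188}])

;; Hypothetical helper: group by "a", then count the distinct "b" values
;; within each group.
(defn distinct-b-counts [rows]
  (->> rows
       (group-by #(get % "a"))
       (map (fn [[k vs]]
              [k (count (set (map #(get % "b") vs)))]))
       (into {})))

(distinct-b-counts data)
;; => {"X" 1, "Y" 2}
```
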
birdspider answered Mar 19 '23 12:03


(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])
;; Borrowing data from @leetwinski

One thing you might want to consider if you're defining the data is to use keywords instead of strings as the keys. This comes with the benefit of being able to use keywords as functions to access things in the map, i.e. (get my-map "a") becomes (:a my-map).
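
A short sketch of what that looks like, assuming the sample data were rewritten with keyword keys (the name `kw-data` is only for illustration):

```clojure
;; The same sample data, rewritten with keyword keys (an assumption for
;; illustration; the question's literal data uses string keys).
(def kw-data
  [{:a "X", :b "M", :c 188}
   {:a "Y", :b "M", :c 165}
   {:a "Y", :b "M", :c 313}
   {:a "Y", :b "P", :c 188}])

;; A keyword acts as a lookup function on a map.
(:a (first kw-data))
;; => "X"

;; So it can be passed directly where a key function is expected.
(keys (group-by :a kw-data))
;; => ("X" "Y")
```
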

To get the data grouped by the "a" key:

(defn by-a-key [data] 
  (group-by #(get % "a") data))

I think you can skip your second step if it only exists to get you to the third step, since it isn't needed for that. On a second reading, I can't tell whether you want to keep only one element per distinct "b" value. I'm going to assume not, since you didn't specify how to pick which one to retain and the entries appear to be substantially different.

(reduce-kv 
  (fn [m k v] 
    (assoc m k 
      (count (filter #(contains? % "b") v)))) 
  {} 
  (by-a-key data))

You could also do the whole thing like so:

(frequencies (map #(get % "a") (filter #(contains? % "b") data)))

Since you can filter on the presence of the "b" key before grouping, you can rely on frequencies to do the grouping and counting for you.
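
For comparison, here is a sketch of what the one-liner returns on the sample data, next to a variant that drops repeated "a"/"b" pairs first. Because every entry with a "b" key is counted, "Y" comes out as 3 rather than the 2 shown in the question. The `distinct` step and the name `distinct-pairs` are additions for illustration and rest on an assumption about the intended result.

```clojure
(def data
  [{"a" "X", "b" "M", "c" 188}
   {"a" "Y", "b" "M", "c" 165}
   {"a" "Y", "b" "M", "c" 313}
   {"a" "Y", "b" "P", "c" 188}])

;; Counts every entry that has a "b" key, grouped by its "a" value.
(frequencies (map #(get % "a") (filter #(contains? % "b") data)))
;; => {"X" 1, "Y" 3}

;; Variant (assumed intent): keep only the "a"/"b" pair of each entry,
;; drop repeated pairs, then count per "a" value.
(def distinct-pairs
  (->> data
       (filter #(contains? % "b"))
       (map #(select-keys % ["a" "b"]))
       distinct))

(frequencies (map #(get % "a") distinct-pairs))
;; => {"X" 1, "Y" 2}
```
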

BWStearns answered Mar 19 '23 11:03