We've discovered some domain names tied to infections. Now we have a list of DNS names in a .json file, and I'd like to produce a summarized output showing a list of users, the unique domains each one visited, and the total count. Bonus points if I can also get a count per domain name.
Here is a sample of the file:
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
Ideally, I would like the output to be something like:
{"possible_victim01": "total": 3, {"evil.com": 2, "soevil.com": 1}}
{"possible_victim02": "total": 1, {"bad.com": 1}}
{"possible_victim03": "total": 1, {"soevil.com": 1}}
I would gladly settle for:
{"possible_victim01": "total": 3, ["evil.com", "soevil.com"]}
{"possible_victim02": "total": 1, ["bad.com"]}
{"possible_victim03": "total": 1, ["soevil.com"]}
I can get a total count of records per user, but I lose the list of domains:
cat sample.json | jq -s 'group_by(.machine) | map({machine:.[0].machine,domain:.[0].domain, count:length}) '
[{"machine": "possible_victim01", "domain": "evil.com", "count": 3},
{"machine": "possible_victim02", "domain": "bad.com", "count": 1},
{"machine": "possible_victim03", "domain": "soevil.com", "count": 1}]
The post "JQ Aggregations and Crosstabs" describes how to solve the second half of the problem. I haven't found anything yet that describes the first half, getting to:
{"machine": "possible_victim01", "domain": "evil.com", "count":2}
{"machine": "possible_victim01", "domain": "soevil.com", "count":1}
{"machine": "possible_victim02", "domain": "bad.com", "count":1}
{"machine": "possible_victim03", "domain": "soevil.com", "count":1}
You need to call group_by twice: once to group by machine name, and then again within each group to get the sub-counts for each domain.
jq query (run with -s so that the input lines are slurped into a single array):
group_by(.machine) | map({
    "machine": .[0].machine,
    "total": length,
    "domains": (group_by(.domain) | map({
        "key": .[0].domain,
        "value": length}) | from_entries
    )
})
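Example output (the query returns these objects as the elements of one array; append | .[] to emit them individually):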
{
"machine": "possible_victim01",
"total": 3,
"domains": {
"evil.com": 2,
"soevil.com": 1
}
}
{
"machine": "possible_victim02",
"total": 1,
"domains": {
"bad.com": 1
}
}
{
"machine": "possible_victim03",
"total": 1,
"domains": {
"soevil.com": 1
}
}
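The flat per-machine, per-domain records that the question calls the "first half" can be produced with a single group_by on a compound key. The following is a minimal sketch rather than part of the original answer: pair_counts is a hypothetical helper name, and it assumes jq 1.5 or later is available on the PATH.

```shell
# Hypothetical helper: prints one compact record per (machine, domain) pair.
# -s slurps all input lines into one array so that group_by can see them;
# -c prints each result on a single line.
pair_counts() {
  jq -s -c '
    group_by([.machine, .domain])[]
    | {machine: .[0].machine, domain: .[0].domain, count: length}
  ' "$@"
}
```

Calling pair_counts sample.json on the sample data should print one line per pair, starting with {"machine":"possible_victim01","domain":"evil.com","count":2}.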
Using group_by in the manner described is fine, but if you have a very large number of lines (i.e. JSON entities) to read, as the sample provided suggests, you may run into performance issues and/or memory limits, because group_by needs the entire input slurped into a single array first.
These issues can be resolved very effectively in any version of jq with the "inputs" builtin (e.g. jq 1.5rc1).
Note that when using "inputs" you must invoke jq with the -n option (otherwise jq consumes the first record itself before "inputs" can read it), like this:
jq -n -f program.jq data.json
Note also that it is preferable here to produce valid JSON output; the following seems to be close to what is wanted:
{"possible_victim01": { "total": 3, "evildoers": {"evil.com": 2, "soevil.com": 1} },
 "possible_victim02": ...}
The following program could be made more concise but the presentation here is intended to make the process transparent, assuming a basic understanding of jq. If there is magic here, it is that one does not have to make a special case of "null".
reduce inputs as $line
  ({};
   . as $in
   | ($line.machine) as $machine
   | ($line.domain) as $domain
   | ($in[$machine].evildoers) as $evildoers
   | . + { ($machine): {"total": (1 + $in[$machine]["total"]),
                        "evildoers": ($evildoers | (.[$domain] += 1)) }} )
Using the sample input provided, the output is:
{
"possible_victim01": {
"total": 3,
"evildoers": {
"evil.com": 2,
"soevil.com": 1
}
},
"possible_victim02": {
"total": 1,
"evildoers": {
"bad.com": 1
}
},
"possible_victim03": {
"total": 1,
"evildoers": {
"soevil.com": 1
}
}
}
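As an illustration of the remark that the program could be made more concise, here is one possible shorter form. It is a sketch, not the author's own version: summarize_stream is a hypothetical helper name, and it assumes a jq with the "inputs" builtin (1.5 or later). It relies on the same property as above, namely that adding 1 to a missing (null) counter yields 1, so no special case is needed for the first occurrence.

```shell
# Hypothetical helper: builds the per-machine summary one record at a time.
# -n stops jq from consuming the first record itself, so that `inputs`
# sees every line; no slurping of the whole file is required.
summarize_stream() {
  jq -n '
    reduce inputs as $line ({};
        .[$line.machine].total += 1
      | .[$line.machine].evildoers[$line.domain] += 1)
  ' "$@"
}
```

It can be run as summarize_stream data.json, or with the records piped in on standard input.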
Here is a solution using reduce, getpath and setpath:
reduce .[] as $o (
    {}
  ; [$o.machine, "total"] as $p1
  | [$o.machine, "domains", $o.domain] as $p2
  | setpath($p1; 1+getpath($p1))
  | setpath($p2; 1+getpath($p2))
)
If filter.jq contains this filter and data.json contains the sample data, then the command
$ jq -M -s -f filter.jq data.json
produces
{
"possible_victim01": {
"total": 3,
"domains": {
"evil.com": 2,
"soevil.com": 1
}
},
"possible_victim02": {
"total": 1,
"domains": {
"bad.com": 1
}
},
"possible_victim03": {
"total": 1,
"domains": {
"soevil.com": 1
}
}
}
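The question's fallback format (a total plus a plain list of unique domains per machine) can be derived from a summary of this shape. This is a rough sketch under stated assumptions: domain_lists is a hypothetical helper name, and the input is assumed to be a summary object with "total" and "domains" keys as shown above.

```shell
# Hypothetical helper: turns {"machine": {"total": N, "domains": {...}}}
# into one object per machine, replacing the per-domain counts with the
# sorted list of domain names (`keys` returns object keys in sorted order).
domain_lists() {
  jq -c '
    to_entries[]
    | {(.key): {total: .value.total, domains: (.value.domains | keys)}}
  ' "$@"
}
```

For example, jq -s -f filter.jq data.json | domain_lists should print one line per machine, such as {"possible_victim01":{"total":3,"domains":["evil.com","soevil.com"]}}.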