 

nested associative arrays in bash [duplicate]

Can one construct an associative array whose elements contain arrays in bash? For instance, suppose one has the following arrays:

a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

Can one create an associative array to access these variables? For instance,

declare -A letters
letters[a]=$a
letters[b]=$b
letters[c]=$c

and then access individual elements by a command such as

letter=${letters[a]}
echo ${letter[1]}

This mock syntax for creating and accessing elements of the associative array does not work. Do valid expressions accomplishing the same goals exist?

user001 asked Aug 09 '14 17:08


2 Answers

I think the more straightforward answer is "No, bash arrays cannot be nested." Anything that simulates nested arrays is really just a fancy mapping function over the keyspace of the (single-layered) arrays.
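A quick way to see the limitation: in the question's letters[a]=$a, the expansion $a is shorthand for ${a[0]}, so only the first element is stored. An element of an associative array holds exactly one string. A minimal sketch, assuming bash 4 or later:

```shell
#!/usr/bin/env bash
# An associative-array element holds exactly one string, never an array.
a=(a aa)
declare -A letters

letters[a]=$a              # $a means ${a[0]}, so only "a" is stored
echo "${letters[a]}"       # prints: a

letters[a]="${a[*]}"       # joins all elements into one space-separated string
echo "${letters[a]}"       # prints: a aa
```

Flattening with "${a[*]}" keeps the data but loses the element boundaries, which is why every answer below falls back to encoding the nesting in the keys instead.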

Not that that's bad: it may be exactly what you want. But doing it properly becomes harder, especially when you don't control the keys of your array. Although I like the solution given by @konsolebox of using a delimiter, it ultimately falls over if your keyspace includes keys like "p|q". It does have a nice benefit in that you can operate transparently on your keys, as in array[abc|def] to look up the key def in array[abc], which is very clear and readable. Because it relies on the delimiter never appearing in the keys, it is only a good approach when you know what the keyspace looks like now and in all future uses of the code, and that is only a safe assumption when you have strict control over the data.
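To make the failure mode concrete, here is a small sketch of the delimiter approach. The helper set_nested is hypothetical (it is not from either answer); it simply joins key and subkey with "|". Two different logical entries end up on the same flat key:

```shell
#!/usr/bin/env bash
# Sketch of the delimiter approach and its failure mode.
# set_nested is a hypothetical helper used only for illustration.
declare -A array

set_nested() {             # usage: set_nested KEY SUBKEY VALUE
    array["$1|$2"]=$3
}

set_nested p 'q|r' first   # key "p",   subkey "q|r"
set_nested 'p|q' r second  # key "p|q", subkey "r"

echo "${#array[@]}"        # prints: 1 (both writes used the flat key "p|q|r")
echo "${array["p|q|r"]}"   # prints: second
```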

If you need any kind of robustness, I would recommend concatenating hashes of your array keys. This simple technique makes conflicts very unlikely, although carefully crafted data could still produce them.

Borrowing from the way Git abbreviates hashes, let's use the first 8 characters of each key's sha512sum as its hashed key. Eight hex characters is only 32 bits, though, so accidental collisions become plausible once a script stores tens of thousands of keys. If that makes you nervous, use the whole sha512sum instead, since there are no known collisions for SHA-512. The full checksum keeps you safe, but it makes the keys much longer and a bit more burdensome to work with.

So, if I want the semantics of storing an element in array[abc][def], I store the value in array["$(keyhash "abc")$(keyhash "def")"], where keyhash looks like this:

function keyhash () {
    # printf is safer than echo for keys such as "-n" or "-e"
    printf '%s\n' "$1" | sha512sum | cut -c-8
}
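A minimal sketch of storing and reading back array[abc][def] with this scheme. The composite key is built once into a variable so the same string is used for both the write and the read; keyhash is repeated so the block runs on its own:

```shell
#!/usr/bin/env bash
# Emulate array[abc][def] by concatenating fixed-width key hashes.
declare -A array

function keyhash () {
    printf '%s\n' "$1" | sha512sum | cut -c-8
}

# Composite key for the logical path abc -> def (16 hex characters)
key="$(keyhash "abc")$(keyhash "def")"

array[$key]="value"        # semantics of array[abc][def]=value
echo "${array[$key]}"      # prints: value
```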

You can then pull out the elements of the associative array using the same keyhash function. You can also write a memoized version of keyhash that stores the hashes in an array, avoiding repeated calls to sha512sum, although the cache gets expensive in memory if the script handles many keys:

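declare -A keyhash_array
function keyhash () {
    # Caveat: "$(keyhash ...)" runs in a subshell, so the cache update
    # below is lost when that subshell exits; the cache only persists
    # for calls made directly in the current shell.
    if [[ -z "${keyhash_array["$1"]}" ]];
    then
        keyhash_array["$1"]="$(printf '%s\n' "$1" | sha512sum | cut -c-8)"
    fi
    printf '%s\n' "${keyhash_array["$1"]}"
}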
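Because the usual calling pattern $(keyhash "abc") runs the function in a subshell, the memoized version above never actually fills its cache. A sketch of a variant that runs in the current shell and hands back its result through a global variable instead; keyhash_to, keyhash_cache, and KEYHASH are hypothetical names chosen for this example:

```shell
#!/usr/bin/env bash
# Memoized key hashing that runs in the current shell, so the cache persists.
# keyhash_to, keyhash_cache and KEYHASH are hypothetical names.
declare -A keyhash_cache

keyhash_to() {             # usage: keyhash_to KEY; result is left in $KEYHASH
    if [[ -z "${keyhash_cache["$1"]+set}" ]]; then
        # Cache miss: the only place sha512sum is invoked
        keyhash_cache["$1"]="$(printf '%s\n' "$1" | sha512sum | cut -c-8)"
    fi
    KEYHASH=${keyhash_cache["$1"]}
}

keyhash_to abc; k1=$KEYHASH
keyhash_to def; k2=$KEYHASH
keyhash_to abc                 # served from the cache, no new sha512sum process
echo "${#keyhash_cache[@]}"    # prints: 2
```

Returning through a variable is clumsier than $(...), but it is the simplest way for a bash function to both update shared state and produce a value.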

Checking a key's length tells me how many layers deep it reaches into the array, since that's just len/8, and I can see the subkeys of a "nested array" by listing all keys and keeping only those with the correct prefix. So if I want all of the keys in array[abc], what I should really do is this:

prefix="$(keyhash "abc")"
for key in "${!array[@]}"
do
    # Matches array[abc] itself and every key nested under it, at any depth
    if [[ "$key" == "$prefix"* ]];
    then
        # do stuff with "$key" since it's a key directly into the array
        :
    fi
done
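Since every hash is exactly 8 characters, the prefix test can be combined with the len/8 depth check to list only the direct children of array[abc], skipping deeper descendants. A sketch under that assumption, with keyhash repeated so the block stands alone:

```shell
#!/usr/bin/env bash
# List keys exactly one level below array[abc], skipping deeper descendants.
declare -A array

function keyhash () {
    printf '%s\n' "$1" | sha512sum | cut -c-8
}

abc=$(keyhash abc); def=$(keyhash def); ghi=$(keyhash ghi)
array[$abc]="top"                  # array[abc]
array[$abc$def]="child"            # array[abc][def]
array[$abc$def$ghi]="grandchild"   # array[abc][def][ghi]

children=0
for key in "${!array[@]}"; do
    depth=$(( ${#key} / 8 ))       # 8 hex characters per level
    if [[ "$key" == "$abc"* && $depth -eq 2 ]]; then
        echo "direct child: ${array[$key]}"   # prints: direct child: child
        children=$(( children + 1 ))
    fi
done
```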

Interestingly, this also means that first-level keys are valid and can hold values. So, array["$(keyhash "abc")"] is completely valid, which means this "nested array" construction can have some interesting semantics.

In one form or another, any solution for nested arrays in Bash pulls this exact same trick: produce a (hopefully injective) mapping function f(key,subkey) whose output strings can be used as array keys. This can always be applied further as f(f(key,subkey),subsubkey). With the keyhash function above, I prefer to define a single-argument f(key) and apply it to subkeys by concatenation, as concat(f(key),f(subkey)) and concat(f(key),f(subkey),f(subsubkey)); because every f(key) has the same fixed width, the concatenation stays unambiguous. Combined with memoization of f, this is a lot more efficient. The delimiter solution, of course, needs the nested applications of f.

Given all that, the best solution that I know of is to take a short hash of the key and subkey values.
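The concat(f(key), f(subkey), ...) form generalizes to any depth with a helper that hashes each argument and joins the results. nested_key is a hypothetical name for this sketch; keyhash is repeated so the block runs on its own:

```shell
#!/usr/bin/env bash
# Build a flat array key from any number of nested keys:
# nested_key k1 k2 k3  ==  concat(f(k1), f(k2), f(k3))
declare -A array

function keyhash () {
    printf '%s\n' "$1" | sha512sum | cut -c-8
}

nested_key() {             # hypothetical helper
    local part out=""
    for part in "$@"; do
        out+=$(keyhash "$part")
    done
    printf '%s\n' "$out"
}

k=$(nested_key abc def ghi)
array[$k]="deep value"     # semantics of array[abc][def][ghi]
echo "${array[$k]}"        # prints: deep value
echo $(( ${#k} / 8 ))      # prints: 3 (depth)
```

Every prefix of such a key is itself a valid key for an ancestor, so nested_key abc def produces exactly the first 16 characters of k.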


I recognize that there's a general dislike for answers of the type "You're doing it wrong, use this other tool!", but associative arrays in bash are messy on numerous levels. They also run you into trouble when you try to port code to a platform that (for some silly reason or another) doesn't have bash on it, or has an ancient (pre-4.0) version without associative arrays. If you are willing to look into another language for your scripting needs, I'd recommend picking up some awk.

It provides the simplicity of shell scripting with the flexibility that comes with more feature-rich languages. There are a few reasons I think this is a good idea:

  • GNU awk (gawk, one of the most widespread variants) has, since version 4.0, fully fledged arrays of arrays that nest properly, with the intuitive syntax array[key][subkey]
  • You can embed awk in shell scripts, so you still get the tools of the shell when you really need them
  • awk is stupidly simple at times, which puts it in stark contrast with other shell replacement languages like Perl and Python
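A sketch of gawk's arrays of arrays embedded in a shell script, storing the question's data. This assumes GNU awk 4.0 or later is installed; other awks reject the arr[k][j] syntax, so the script skips the demo when gawk is absent:

```shell
#!/usr/bin/env bash
# Sketch: gawk (4.0+) arrays of arrays, embedded in a shell script.
# Other awks reject arr[k][j]; the demo is skipped if gawk is absent.
if command -v gawk >/dev/null 2>&1; then
    out=$(gawk 'BEGIN {
        letters["a"][1] = "a"; letters["a"][2] = "aa"
        letters["b"][1] = "b"; letters["b"][2] = "bb"; letters["b"][3] = "bbb"
        print letters["b"][2]        # a single element of a subarray
        print length(letters["b"])   # number of elements in the subarray
    }')
    printf '%s\n' "$out"             # prints: bb, then 3
else
    echo "gawk not found; skipping" >&2
fi
```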

That's not to say that awk is without its failings. It can be hard to understand when you're first learning it because it's heavily oriented towards stream processing (a lot like sed), but it's a great tool for a lot of tasks that are just barely outside of the scope of the shell.

Note that above I said that GNU awk (gawk) has true nested arrays. Other awks (nawk, mawk, and POSIX awk generally) instead play the same trick described for bash: array[key,subkey] is stored under a single key made by joining key and subkey with the separator string SUBSEP (commonly "\034"). You could do this yourself, as with the array[a|b] solution in bash, but nawk has the feature built in, and it's still a bit more fluid and clear than bash's array syntax.
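A sketch of the SUBSEP form, which should run under any POSIX awk (nawk, mawk, or gawk). It stores the compound keys, tests membership with the (key, subkey) in array form, and uses split() with SUBSEP to recover the parts of each flat key:

```shell
#!/usr/bin/env bash
# POSIX awk: arr[key, subkey] is one flat key, key SUBSEP subkey.
out=$(awk 'BEGIN {
    letters["a", 1] = "a"; letters["a", 2] = "aa"
    letters["b", 1] = "b"; letters["b", 2] = "bb"; letters["b", 3] = "bbb"
    print letters["b", 2]                    # prints: bb
    if (("a", 2) in letters) print "found"   # membership test on a compound key
    # Recover the parts of each compound key and count those under "b"
    for (k in letters) {
        split(k, parts, SUBSEP)
        if (parts[1] == "b") n++
    }
    print n                                  # prints: 3
}')
printf '%s\n' "$out"
```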

sirosen answered Nov 20 '22 12:11


This is the best non-hacky way to do it, but you're limited to accessing single elements. Using indirect variable references is another option, but you'd still have to store each set of elements in its own array. If you want something like anonymous arrays, you'd need a generator of unique variable names; if you don't use a generated name for an array, there's no point referencing it from an associative array. And of course I wouldn't want to rely on external tools to generate random variable names; whoever does that is taking a rather odd approach.

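#!/bin/bash

a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

declare -A letters

# store_array ARRAY_NAME BASE_KEY VALUES...
# Stores each value under the flat key "BASE_KEY|INDEX" in ARRAY_NAME.
function store_array {
    local var=$1 base_key=$2 values=("${@:3}") i
    for i in "${!values[@]}"; do
        # eval runs e.g.: letters[$base_key|0]=${values[i]}
        # ARRAY_NAME must be a trusted variable name, since eval executes it
        eval "$var[\$base_key|$i]=\${values[i]}"
    done
}

store_array letters a "${a[@]}"
store_array letters b "${b[@]}"
store_array letters c "${c[@]}"

echo "${letters[a|1]}"    # prints: aa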
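The indirect-reference alternative mentioned above can be sketched by storing array names, rather than their contents, as the values of the associative array. This assumes bash 4.3 or later for declare -n; the ${!ref} form also works in older bash 4 releases:

```shell
#!/usr/bin/env bash
a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

# Each value is the *name* of an indexed array, not its contents
declare -A letters=([a]=a [b]=b [c]=c)

# Option 1: nameref (bash 4.3+); letter becomes an alias for the array b
declare -n letter="${letters[b]}"
echo "${letter[2]}"        # prints: bbb
echo "${#letter[@]}"       # prints: 3 (the whole array is reachable)

# Option 2: indirect expansion; the reference string includes the index
ref="${letters[c]}[3]"
echo "${!ref}"             # prints: cccc
```

This is the closest valid form of the question's letter=${letters[a]}; echo ${letter[1]}, but as noted above, each set of elements still needs its own named array.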

konsolebox answered Nov 20 '22 13:11