I need to do an array search many times in a bash process, and I need to know the quickest and most efficient way to do it. I know how to do it; the point of the question is how to do it as fast as possible. Right now I'm doing it this way:
#!/bin/bash
array_test=("text1" "text2" "text3" "text4" "text5")
text_to_search="text4"
START=$(date +%s.%N)
for item in "${array_test[@]}"; do
if [ ${item} = "${text_to_search}" ]; then
echo "found!!"
break
fi
done
END=$(date +%s.%N)
DIFF=$(echo "$END - $START" | bc)
echo $DIFF
With this code we can measure the time.
Imagine we have 300 or more items in the array. Is there a quicker way? I need to increase performance. Thank you.
EDIT: I'm using bash 4.2. The real array is declared across multiple lines:
array_test=(
"text1"
"text2"
"text3"
"text4"
"text5"
)
Use grep -qFx on "${array[*]}":
-q suppresses output and exits on the first match.
-F searches for fixed strings instead of regexes.
-x matches whole lines instead of substrings. This ensures that the search string "b" doesn't match the element "abc".
if ( IFS=$'\n'; echo "${array[*]}" ) | grep -qFx "$text_to_search"; then
echo "found!"
fi
Assumption: Neither the array nor the text to search contain linebreaks.
It's also possible to search texts with linebreaks, as long as there is a known unused character which can be stored inside a bash variable. I found \x1E
(the ASCII control character »Record Separator«) to work out quite well. The adapted version looks as follows:
export d=$'\x1E' # unused character, here "Record Separator"
if ( IFS="$d"; echo "$d${array[*]}$d" ) | grep -qF "$d$text_to_search$d"; then
echo "found!"
fi
In theory you might be able to speed this up even further by using multiple parallel grep
on slices of the array. However, this is so fast (see results below) that you probably won't ever encounter a scenario where the parallel search pays off.
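A rough sketch of that parallel idea, should you ever want to try it (the chunk size and variable names are illustrative, and it assumes no newlines in the elements):
# Split the array into chunks and run one grep per chunk in the background.
chunk=100000
pids=()
for (( start = 0; start < ${#array_test[@]}; start += chunk )); do
    ( IFS=$'\n'; echo "${array_test[*]:start:chunk}" ) \
        | grep -qFx "$text_to_search" &
    pids+=("$!")   # PID of the grep at the end of the backgrounded pipeline
done

# The search succeeds if any of the chunk searches matched.
found=0
for pid in "${pids[@]}"; do
    wait "$pid" && found=1
done
(( found )) && echo "found!"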
I used an array of size 1'000'000, which was generated as follows:
size=$((10 ** 6))
array_test=($(seq -f 'text%.0f' 1 "$size"))
(By the way: using {1..1000000} was orders of magnitude slower than seq.)
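For reference, the two generation methods can be timed side by side (a quick sketch; the brace-expansion variant shown is an assumed equivalent, and actual timings vary by machine):
time array_a=( $(seq -f 'text%.0f' 1 1000000) )   # seq-based generation
time array_b=( text{1..1000000} )                  # brace expansion, reported much slower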
The search pattern was the last entry of said array
text_to_search="text$size"
Three search methods were tested:
1. the for loop
2. printf '%s\n' "${array_test[@]}" | grep -qFx
3. (IFS=$'\n'; echo "${array_test[*]}") | grep -qFx
With the following results:
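The timing numbers themselves aren't reproduced here, but a small harness along these lines (the function names are illustrative) lets you repeat the comparison on your own data:
search_loop() {
    local item
    for item in "${array_test[@]}"; do
        [[ $item == "$text_to_search" ]] && return 0
    done
    return 1
}

search_printf_grep() {
    printf '%s\n' "${array_test[@]}" | grep -qFx "$text_to_search"
}

search_echo_grep() {
    ( IFS=$'\n'; echo "${array_test[*]}" ) | grep -qFx "$text_to_search"
}

time search_loop
time search_printf_grep
time search_echo_grep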
If you deeply care about performance, perhaps (a) Bash is not the best tool and (b) you should try different approaches and test them on your data. This being said, maybe associative arrays could help you.
Try this:
#!/bin/bash
declare -A array_test
array_test=(["text1"]="" ["text2"]="" ["text3"]="" ["text4"]="" ["text5"]="")
text_to_search="text4"
if [[ ${array_test[$text_to_search]+found} ]]
then
    echo "Found!"
fi
Please note that in this case I build the associative array with keys but empty values (there is no real use in setting the value to the same thing as the key and eating up more memory).
Another approach would be to sort your array, and use some kind of binary search. That would, of course, involve more code, and if Bash associative arrays are implemented efficiently, is likely to be slower. But, again, nothing beats testing on actual data to validate performance hypotheses.
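A minimal sketch of that binary-search idea, assuming the array is already sorted in the same lexicographic order that bash's < comparison uses inside [[ ]] (the names sorted_array and binary_search are just illustrative):
binary_search() {
    local needle=$1
    local lo=0 hi=$(( ${#sorted_array[@]} - 1 )) mid
    while (( lo <= hi )); do
        mid=$(( (lo + hi) / 2 ))
        if [[ ${sorted_array[mid]} == "$needle" ]]; then
            return 0
        elif [[ ${sorted_array[mid]} < "$needle" ]]; then
            lo=$(( mid + 1 ))
        else
            hi=$(( mid - 1 ))
        fi
    done
    return 1
}

sorted_array=("text1" "text2" "text3" "text4" "text5")
binary_search "text4" && echo "Found!"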
If you have an associative array with information in the keys, you can use an expansion to iterate over the keys the same way you would over values:
for key in "${!array[@]}"
do
    do_something_with "$key"
done
Also, you can build your array with a loop, which would be useful if you read your elements from a file or from the output of a command. As an example:
declare -A array_test=()
while IFS= read -r key
do
array_test[$key]=
done < <(command_with_output)
It is useful to note that an element set to a null (empty) value is not the same as an unset element. You can easily see that with the following expansion:
declare -A array_test=()
array_test[existing_key]=
echo ${array_test[existing_key]+found} # Echoes "found"
echo ${array_test[missing_key]+found} # Echoes nothing
The "${var+value}"
expansion use useful because it expands to nothing if the variable is unset, and to value
if it is set. It does not produce an error when using set -u
that catches attempts to expand unset variables.
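For example, a small demonstration (variable and key names are arbitrary):
set -u                                      # abort on expansion of unset variables
declare -A array_test=()
array_test[existing_key]=

# Both expansions are safe under set -u thanks to the + operator:
echo "${array_test[existing_key]+found}"    # prints "found"
echo "${array_test[missing_key]+found}"     # prints an empty line, no error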
If none of your array elements contain whitespace, you can use pattern matching
array_test=("text1" "text2" "text3" "text4" "text5")
text_to_search="text4"
[[ " ${array_test[*]} " == *" $text_to_search "*]] && echo found || echo not found
The quotes and spaces are very important there.
In bash, within [[ ]] double brackets, the == operator is a pattern-matching operator, not just string equality.
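To see why the surrounding spaces matter, here is a quick illustration (values made up for the purpose):
array_test=("text1" "text2" "text3" "text40")
text_to_search="text4"

# Without the surrounding spaces, "text4" would match inside "text40":
[[ "${array_test[*]}" == *"$text_to_search"* ]] && echo "false positive"

# With the spaces, only a whole element can match:
[[ " ${array_test[*]} " == *" $text_to_search "* ]] && echo found || echo "not found"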
A concrete example, based on @Fred's useful answer, using an associative array:
script.sh
#!/bin/bash
read -a array_test <<< $(seq 1 10000)
text_to_search="9999"
function from_associative_array {
    declare -A array
    for constant in "${array_test[@]}"
    do
        array[$constant]=1
    done
    [[ ${array[$text_to_search]} ]] && echo "Found in associative array"
}
function from_indexed_array {
    for item in "${array_test[@]}"; do
        if [ "${item}" = "${text_to_search}" ]; then
            echo "Found in indexed array"
            break
        fi
    done
}
time from_indexed_array
time from_associative_array
$ bash script.sh
Found in indexed array
real 0m0.611s
user 0m0.604s
sys 0m0.004s
Found in associative array
real 0m0.297s
user 0m0.296s
sys 0m0.000s