Working with arrays in bash

Programmers regularly use bash to solve all sorts of software-development problems. At the same time, bash arrays are often considered one of the most obscure features of this shell (probably second only to regular expressions in that respect). The author of the article whose translation we are publishing today invites everyone into the wonderful world of bash arrays, which, once you get used to their unusual syntax, can be very useful.

A real problem where bash arrays come in handy

Writing about bash is a tricky business: articles about bash often turn into user manuals devoted to the syntactic quirks of the commands in question. This article takes a different approach, and we hope you will not find it to be yet another “user manual”.

With that in mind, let's imagine a real-world scenario for using arrays in bash. Suppose you have been asked to evaluate and optimize a utility from a new internal toolkit used at your company. As a first step, you need to run it with different sets of parameters; specifically, you want to study how the toolkit behaves with different numbers of threads. For simplicity, assume the toolkit is a “black box” compiled from C++ code, and the only parameter we can influence is the number of threads reserved for data processing. Calling it from the command line looks like this:

./pipeline --threads 4

The basics

First, we declare an array containing the values of the --threads parameter with which we want to test the system. This array looks like this:

 allThreads=(1 2 4 8 16 32 64 128)

In this example all elements are numbers, but a bash array can hold numbers and strings at the same time (strictly speaking, bash stores every element as a string). For example, the following declaration is perfectly valid:

 myArray=(1 2 "three" 4 "five")

As with other bash variables, there must be no spaces around the = sign. Otherwise, bash will treat the variable name as the name of a program to execute, and = as its first argument!

Now that we have initialized the array, let's extract some elements from it. You may notice, for example, that echo $allThreads prints only the first element of the array.

To understand why, let's step away from arrays for a moment and recall how variables work in bash. Consider the following example:

 type="article"
 echo "Found 42 $type"

Suppose the $type variable contains a singular noun, and we want to add the letter s after it. We cannot simply append the letter to the variable name, because that turns the reference into $types, which is a completely different variable. One workaround is a construction like echo "Found 42 "$type"s". The best solution, however, is curly braces: echo "Found 42 ${type}s", which tells bash exactly where the variable name begins and ends (interestingly, JavaScript ES6 uses the same ${...} syntax to embed expressions in template literals).
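Here is a minimal sketch of the three variants, using the same type variable as above. The names unbraced, quoted and braced are only for illustration, and the sketch assumes no variable named types happens to be set in your environment.

```bash
#!/usr/bin/env bash
type="article"

# $types is parsed as a different (unset) variable, so nothing is appended
unbraced="Found 42 $types"
# Closing and reopening the quotes works, but is easy to misread
quoted="Found 42 "$type"s"
# Braces mark exactly where the variable name ends
braced="Found 42 ${type}s"

echo "$unbraced"   # prints "Found 42 " ($types is empty)
echo "$quoted"     # prints Found 42 articles
echo "$braced"     # prints Found 42 articles
```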

Now back to arrays. Although curly braces are usually optional when working with plain variables, they are required when working with arrays, because they let you specify an index. For example, echo ${allThreads[1]} prints the second element of the array. If you leave the braces out, as in echo $allThreads[1], bash expands $allThreads to the first element and treats [1] as literal text, so the output is typically 1[1].
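A quick sketch to check both claims: that a bare $allThreads yields only the first element, and what happens without braces. The intermediate variable names are only for illustration.

```bash
#!/usr/bin/env bash
allThreads=(1 2 4 8 16 32 64 128)

# Without an index, $allThreads is the same as ${allThreads[0]}
first="$allThreads"
# Braces plus an index select the second element (indices start at 0)
second="${allThreads[1]}"
# Without braces, bash expands $allThreads and then appends the literal text [1]
# (quoted here so that [1] is not treated as a filename glob pattern)
noBraces="$allThreads[1]"

echo "$first"      # prints 1
echo "$second"     # prints 2
echo "$noBraces"   # prints 1[1]
```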

As you can see, bash arrays have a strange syntax, but at least their elements are numbered from zero, just like arrays in many other programming languages.

Ways to access array elements

In the example above we used explicit integer literals as indices. Now let's look at two more ways to access array elements.

The first method applies when we need the $i-th element of the array, where $i is a variable holding the index of the desired element. You can extract that element with a construct of the form echo ${allThreads[$i]}.

The second method outputs all elements of the array. It consists of replacing the numeric index with the @ symbol (you can read it as "all elements"): echo ${allThreads[@]}.
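The sketch below exercises both methods. The names array and its contents are hypothetical, added to show why "${allThreads[@]}" is usually written in double quotes: unquoted, each element is split again on whitespace.

```bash
#!/usr/bin/env bash
allThreads=(1 2 4 8 16 32 64 128)

# Access an element through an index stored in a variable
i=3
echo "${allThreads[$i]}"   # prints 8

# "${arr[@]}" expands to every element, each as a separate word
echo "${allThreads[@]}"    # prints 1 2 4 8 16 32 64 128

# Quoting matters once elements contain spaces (hypothetical example data)
names=("first file" "second file")
quotedCount=0
for n in "${names[@]}"; do quotedCount=$((quotedCount + 1)); done
unquotedCount=0
for n in ${names[@]}; do unquotedCount=$((unquotedCount + 1)); done
echo "$quotedCount $unquotedCount"   # prints 2 4
```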

Iterating over array elements in loops

These ways of accessing array elements are exactly what we need to iterate over an array. In our case, that means running the pipeline command under investigation once for each thread count stored in the array. It looks like this:

 for t in "${allThreads[@]}"; do
   ./pipeline --threads "$t"
 done

Iterating over array indices in loops

Now let's consider a slightly different way of iterating over an array. Instead of looping over the elements, we can loop over the array's indices:

 for i in "${!allThreads[@]}"; do
   ./pipeline --threads "${allThreads[$i]}"
 done

Let's break down what is happening here. As we have already seen, ${allThreads[@]} expands to all elements of the array. Adding an exclamation mark turns it into ${!allThreads[@]}, which returns the array's indices instead (0 through 7 in our case).

In other words, the for loop iterates over all array indices, storing each one in the variable $i, and the loop body uses ${allThreads[$i]} to fetch the element that is passed as the --threads parameter.

This code is harder to read than the previous example, so you may wonder why we bother. The reason is that in some situations a loop needs both the index and the value of each element. For example, if the first element of the array must be skipped, iterating over indices saves us from creating an extra counter variable and incrementing it inside the loop.
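To make the skipping example concrete, here is a minimal sketch that walks the indices and skips index 0. echo stands in for actually running ./pipeline so the sketch runs without the toolkit, and the skipped array exists only so the result can be checked.

```bash
#!/usr/bin/env bash
allThreads=(1 2 4 8 16 32 64 128)
skipped=()

for i in "${!allThreads[@]}"; do
  # The index itself tells us we are at the first element; no extra counter
  if (( i == 0 )); then
    continue
  fi
  # echo stands in for actually running ./pipeline
  echo "./pipeline --threads ${allThreads[$i]}"
  skipped+=( "${allThreads[$i]}" )
done

echo "${skipped[@]}"   # prints 2 4 8 16 32 64 128
```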

Filling arrays

So far, we have studied the system by running the pipeline command with every --threads value of interest. Now suppose that the command prints how long a certain process took, in seconds. We would like to capture that output on each iteration and save it in another array, so that we can work with the collected data after all the tests have finished.

Useful syntax

Before we talk about adding data to arrays, let's look at some useful syntax. First, we need a way to obtain the output of bash commands. To capture a command's output, use the following construction:

 output=$( ./my_script.sh )

After this command runs, whatever my_script.sh printed to standard output is stored in the $output variable.
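A minimal sketch of command substitution. Here my_script is a hypothetical shell function standing in for ./my_script.sh, so the example runs on its own. Note that bash strips trailing newlines from the captured output.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for ./my_script.sh: prints two lines plus a blank one
my_script() {
  printf 'line one\nline two\n\n'
}

# $( ... ) captures everything the command writes to standard output
output=$( my_script )

# Trailing newlines were removed; the newline between the lines is kept
echo "$output"
```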

The second construction, which will come in handy very soon, lets us append new elements to an array. It looks like this:

 myArray+=( "newElement1" "newElement2" )
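A short sketch of appending with +=. It also shows a common slip, relying on bash's rule that an array referenced without a subscript means element 0: leaving out the parentheses appends text to the first element rather than adding a new one.

```bash
#!/usr/bin/env bash
myArray=()

# Append one element, then two more; each += adds to the end of the array
myArray+=( "first" )
myArray+=( "newElement1" "newElement2" )

# Careful: without parentheses, += appends text to element 0 instead
myArray+="!"

echo "${#myArray[@]}"   # prints 3
echo "${myArray[0]}"    # prints first!
```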

Solving the problem

If we now put together everything we have just learned, we can write a script to test the system: it runs the command with each parameter value from the array and stores the command's output in another array.

 allThreads=(1 2 4 8 16 32 64 128)
 allRuntimes=()
 for t in "${allThreads[@]}"; do
   runtime=$(./pipeline --threads "$t")
   allRuntimes+=( "$runtime" )
 done
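Since ./pipeline is the article's black box, the sketch below replaces it with a hypothetical pipeline function that prints a made-up number (256 divided by the thread count). These values are placeholders, not measurements. Because both arrays are filled in the same order, index i in allThreads corresponds to index i in allRuntimes, which lets us print them side by side.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for ./pipeline so the sketch runs anywhere:
# $1 is "--threads", $2 is the count; it prints a fake runtime of 256 / threads
pipeline() {
  echo $(( 256 / $2 ))
}

allThreads=(1 2 4 8 16 32 64 128)
allRuntimes=()

# Run once per thread count and store each captured "runtime"
for t in "${allThreads[@]}"; do
  runtime=$(pipeline --threads "$t")
  allRuntimes+=( "$runtime" )
done

# Both arrays share indices, so element i of one matches element i of the other
for i in "${!allThreads[@]}"; do
  echo "${allThreads[$i]} threads: ${allRuntimes[$i]} s"
done
```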

What's next?

We have just seen how to use bash arrays to sweep over the parameters of a program and to save the data the program returns. However, arrays are useful well beyond this scenario. Here are a couple more examples.

Problem Alerts

In this scenario, we have an application that is split into modules, each with its own log file. We can write a script for a cron job that checks each log file and, if it finds problems, emails the person responsible for that module:

 # Parallel lists: each log file and the person responsible for it
 logPaths=("api.log" "auth.log" "jenkins.log" "data.log")
 logEmails=("jay@email" "emma@email" "jon@email" "sophia@email")
 # Check each log for signs of trouble
 for i in "${!logPaths[@]}"; do
   log=${logPaths[$i]}
   stakeholder=${logEmails[$i]}
   numErrors=$( tail -n 100 "$log" | grep "ERROR" | wc -l )
   # Warn the stakeholder if the last 100 lines contain more than 5 errors
   if [[ "$numErrors" -gt 5 ]]; then
     emailRecipient="$stakeholder"
     emailSubject="WARNING: ${log} showing unusual levels of errors"
     emailBody="${numErrors} errors found in log ${log}"
     echo "$emailBody" | mailx -s "$emailSubject" "$emailRecipient"
   fi
 done
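To try the alerting logic without a mail server, here is a dry-run sketch under stated assumptions: two hypothetical log files are created in a temporary directory, and the mailx call is replaced with echo. Recipients are collected in an alerts array only so the result can be checked.

```bash
#!/usr/bin/env bash
# Two hypothetical logs in a temporary directory: one noisy, one quiet
logDir=$(mktemp -d)
for _ in {1..7}; do echo "ERROR something failed"; done > "$logDir/api.log"
echo "INFO all good" > "$logDir/auth.log"

logPaths=("$logDir/api.log" "$logDir/auth.log")
logEmails=("jay@email" "emma@email")
alerts=()

for i in "${!logPaths[@]}"; do
  log=${logPaths[$i]}
  stakeholder=${logEmails[$i]}
  numErrors=$( tail -n 100 "$log" | grep "ERROR" | wc -l )
  if [[ "$numErrors" -gt 5 ]]; then
    # Dry run: print the message instead of piping it to mailx
    echo "would email $stakeholder: ${numErrors} errors found in log ${log}"
    alerts+=( "$stakeholder" )
  fi
done

# Clean up the temporary logs
rm -r "$logDir"
```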

API Requests

Suppose you want to find out which users comment on your posts on Medium. Since we have no direct access to the site's database, SQL queries are out of the question; however, APIs can be used to access this kind of data.

To avoid a long detour into authentication and tokens, we will use JSONPlaceholder, an open-source fake API for testing, as the endpoint. After fetching the comments on each post from the service and extracting the commenters' email addresses, we can put them into an array:

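 endpoint="https://jsonplaceholder.typicode.com/comments"
 allEmails=()
 # Query the first 10 posts
 for postId in {1..10}; do
   # Call the API to fetch the comments on this post (-s hides the progress meter)
   response=$(curl -s "${endpoint}?postId=${postId}")
   # Use jq to extract the commenters' email addresses (-r drops the JSON quotes)
   allEmails+=( $( jq -r '.[].email' <<< "$response" ) )
 done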

Note that this example uses jq, a tool for parsing JSON on the command line. We won't go into the details of jq here; if it interests you, see its documentation.
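One caveat about allEmails+=( $( ... ) ): the command output is word-split (and unquoted words are also subject to filename globbing), so a value containing spaces would become several elements. For email addresses this is harmless, but for other fields the mapfile -t builtin (bash 4 and later), which stores one element per line, is a safer option. In the sketch below, a hypothetical values function prints names in place of jq's output, so neither jq nor network access is needed.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for jq's one-value-per-line output
values() {
  printf '%s\n' "Jay Smith" "Emma Jones" "Jon Doe"
}

# Word splitting: every space-separated word becomes its own element
split=( $(values) )

# mapfile -t: one element per line, with the trailing newline stripped
mapfile -t lines < <(values)

echo "${#split[@]}"   # prints 6
echo "${#lines[@]}"   # prints 3
echo "${lines[1]}"    # prints Emma Jones
```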

Bash or Python?

Arrays are useful, and they are available not only in bash. Anyone who writes command-line scripts may reasonably ask when it is worth using bash and when, say, Python.

In my opinion, the answer depends on which technologies the task is already tied to. For example, if the task can be solved directly on the command line, nothing prevents you from using bash. However, if the script you need is part of a project written in Python, it may well make sense to use Python.

For example, the problem discussed here could be solved with a Python script, but that script would essentially be a Python wrapper around shell commands:

 import subprocess
 all_threads = [1, 2, 4, 8, 16, 32, 64, 128]
 all_runtimes = []
 # Launch the pipeline with each thread count
 for t in all_threads:
     cmd = './pipeline --threads {}'.format(t)
     # Use subprocess to run the command and capture its standard output
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
     output = p.communicate()[0]
     all_runtimes.append(output)

Solving this problem in plain bash is arguably shorter and clearer, and Python is not really needed here.

Summary

In this article we have covered many constructs for working with arrays. The table below summarizes what we have reviewed, along with a few new items.

Syntax	Description
`arr=()`	Create an empty array
`arr=(1 2 3)`	Initialize an array
`${arr[2]}`	Get the third element of the array
`${arr[@]}`	Get all elements of the array
`${!arr[@]}`	Get the array's indices
`${#arr[@]}`	Get the number of elements in the array
`arr[0]=3`	Overwrite the first element of the array
`arr+=(4)`	Append a value to the array
`str=$(ls)`	Save the output of `ls` as a string
`arr=( $(ls) )`	Save the output of `ls` as an array of file names (names containing spaces get split)
`${arr[@]:s:n}`	Get `n` elements starting at index `s`, i.e. indices `s` through `s+n-1`
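A compact sketch exercising several of the table entries: the element count, overwriting by index, appending and slicing. The array contents are arbitrary.

```bash
#!/usr/bin/env bash
arr=(1 2 3)

# ${#arr[@]} is the number of elements
echo "${#arr[@]}"      # prints 3

# Assigning to an index overwrites that element; += with parentheses appends
arr[0]=3
arr+=(4)
echo "${arr[@]}"       # prints 3 2 3 4

# ${arr[@]:s:n} yields n elements starting at index s
slice=( "${arr[@]:1:2}" )
echo "${slice[@]}"     # prints 2 3
```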

At first glance, bash arrays may seem rather strange, but what they let you do is worth putting up with their oddities. We believe that once you have mastered bash arrays, you will use them quite often; it is easy to imagine countless scenarios in which they come in handy.

Dear readers! If you have interesting examples of using arrays in bash scripts, please share them.