Why bash?
Bash has arrays and a safe mode, which, when used correctly, make it almost compliant with safe coding practices. Fish is harder to make mistakes in, but it has no safe mode. A good strategy is therefore to prototype in fish and then translate from fish to bash, if you know how to do that correctly.
Foreword
This guide accompanies ShellHarden, but the author also recommends ShellCheck, so that ShellHarden's rules do not diverge from ShellCheck's.
Bash is not a language where the most correct way to solve a problem is also the simplest. If there were an exam in safe bash programming, its first rule, straight out of BashPitfalls, would be: always use quotes.
The main thing you need to know about programming in bash
Quote like a maniac! An unquoted variable should be treated as an armed bomb: it explodes on contact with whitespace. Yes, it "explodes" in the sense of splitting a string into an array. Specifically, variable expansions like $var and command substitutions like $(cmd) undergo word splitting: the contained string expands into an array by splitting on the $IFS special variable, which holds whitespace by default. This is mostly invisible, because the result is usually a one-element array, indistinguishable from the expected string.
Not only that, but wildcards (*?) are expanded too. This happens after word splitting, so that when a resulting word contains any wildcard characters, that word becomes a glob pattern, matched against any fitting file paths. So this feature actually reaches out to the filesystem!
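To make the invisible visible, here is a small sketch of both effects; the helper function and file names are made up for illustration:

```shell
#!/usr/bin/env bash

# Helper: report how many arguments it received.
count_words() { echo "$#"; }

# Word splitting: the unquoted expansion becomes three words, not one.
var='a b c'
unquoted=$(count_words $var)      # → 3
quoted=$(count_words "$var")      # → 1

# Wildcard expansion: after word splitting, words containing * or ?
# are matched against the filesystem.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch 'file one.txt' 'file two.txt'
pattern='*.txt'
globbed=$(count_words $pattern)   # → 2: the two matching file names
literal=$(count_words "$pattern") # → 1: quoting keeps the pattern a string
echo "$unquoted $quoted $globbed $literal"    # → 3 1 2 1
```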
Quoting suppresses both word splitting and wildcard expansion, for variables and command substitutions.
Variable Expansion:
- Good:
"$my_var"
- Bad:
$my_var
Command substitution:
- Good:
"$(cmd)"
- Bad:
$(cmd)
There are exceptions where the quotes are optional, but quotes never hurt, and the general rule is to be scared of unquoted variables, so for your own sake, we won't go looking for the exceptions. It looks wrong anyway, and the wrong practice is common enough to raise suspicion: a lot of scripts out there break on file names containing spaces ...
The only exceptions ShellHarden grants are variables of numerical content, such as $?, $# and ${#array[@]}.
Should I use backticks?
Command substitutions can also have the following form:
- Good:
"`cmd`"
- Bad:
`cmd`
While it is possible to use this style correctly, it looks awkward in quotes and becomes less readable when nested. The consensus is pretty clear here: avoid it.
ShellHarden rewrites backticks into the dollar-parenthesis form.
Should I use curly braces?
Curly braces are for string interpolation, which means they are usually unnecessary:
- Bad:
some_command $arg1 $arg2 $arg3
- Bad and verbose:
some_command ${arg1} ${arg2} ${arg3}
- Good, but verbose:
some_command "${arg1}" "${arg2}" "${arg3}"
- Good:
some_command "$arg1" "$arg2" "$arg3"
In theory, always using curly braces does no harm, but in your author's experience, there is a strong negative correlation between unnecessary use of braces and correct use of quotes: almost everyone chooses the "bad and verbose" form over the "good but verbose" one!
Your author's theories:
- Fear of doing it wrong: instead of worrying about the real danger (missing quotes), newbies may worry that writing $prefix will spill over into the expansion of "$prefix_postfix", but that is not how it works.
- Cargo culting: code written under the contagious, misplaced fear that preceded it.
- Curly braces compete with quotes for the budget of tolerable verbosity.
Hence the verdict to ban unnecessary curly braces: ShellHarden rewrites these variants into the simplest good form.
And now about string interpolation, where braces are really useful:
- Bad (concatenation):
$var1"more string content"$var2
- Good (concatenation):
"$var1""more string content""$var2"
- Good (interpolation):
"${var1}more string content${var2}"
Concatenation and interpolation are equivalent in bash, even in arrays (hilariously).
Since ShellHarden is not a style formatter, it is not supposed to change correct code. This is true of the "good (interpolation)" option: from ShellHarden's point of view, that is the canonically correct form.
As of now, ShellHarden adds and removes braces as needed: in the bad example, var1 gets braces, while var2 is denied them even in the "good (interpolation)" case, since braces are never necessary at the end of a string. The latter requirement may well be lifted.
Gotcha: numbered arguments
Unlike variable names of normal identifier characters (in regex: [_a-zA-Z][_a-zA-Z0-9]*), numbered arguments require curly braces above 9, string interpolation or not. ShellCheck says:

```
echo "$10"
     ^-- SC1037: Braces are required for positionals over 9, e.g. ${10}.
```

ShellHarden refuses to fix this (it considers the difference too subtle). Since curly braces are required for numbered arguments above 9, ShellHarden permits them on all numbered arguments.
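A quick sketch of the difference (function names made up): $10 is parsed as ${1} followed by a literal 0, whereas ${10} is the actual tenth argument.

```shell
#!/usr/bin/env bash

first_then_zero() { echo "$10"; }   # ${1} followed by a literal "0"
tenth() { echo "${10}"; }           # the actual tenth argument

x=$(first_then_zero a b c d e f g h i j)   # → "a0"
y=$(tenth a b c d e f g h i j)             # → "j"
echo "$x $y"    # → a0 j
```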
Use arrays
To be able to quote all variables, you must use real arrays when that is what you need, not whitespace-separated pseudo-array strings. The syntax is verbose, but get over it. This bashism alone is a good enough reason to drop POSIX compatibility for most shell scripts.
Good:

```shell
array=( a b )
array+=(c)
if [ ${#array[@]} -gt 0 ]; then
    rm -- "${array[@]}"
fi
```

Bad:

```shell
pseudoarray=" \
    a \
    b \
"
pseudoarray="$pseudoarray c"
if ! [ "$pseudoarray" = '' ]; then
    rm -- $pseudoarray
fi
```
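A sketch of why the pseudo-array breaks, using a hypothetical file name with a space: the real array keeps each element intact, while the pseudo-array falls apart on expansion.

```shell
#!/usr/bin/env bash

count() { echo "$#"; }

# Real array: 2 elements stay 2 arguments, space and all.
array=( 'report.txt' 'my file.txt' )
real=$(count "${array[@]}")      # → 2

# Pseudo-array: "my file.txt" splits into two arguments.
pseudoarray='report.txt my file.txt'
fake=$(count $pseudoarray)       # → 3
echo "$real $fake"    # → 2 3
```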
Here is why arrays are such a fundamental feature of a shell: command arguments are fundamentally arrays (and a shell script is mostly commands and arguments). You could say that a shell that makes it artificially impossible to pass multiple arguments around would be comically unfit for purpose. Some widespread shells in this category: Dash and Busybox Ash. These are minimal POSIX-compatible shells, but what good is POSIX compatibility when the most important stuff is not in POSIX?
Those exceptional cases where you actually intend to split the string
An example with \v as the data separator (note the second occurrence):

```shell
IFS=$'\v' read -d '' -ra a < <(printf '%s\v' "$s") || true
```

This avoids wildcard expansion, and it works even if the data separator is \n. The second occurrence of the data separator protects the last element in case it is whitespace. For some reason, the -d option must come first, so it is tempting to join the options as -rad '', but that does not work. Since read returns a nonzero status in this case, it must be guarded against errexit (|| true) if that is enabled. Tested in bash 4.0, 4.1, 4.2, 4.3 and 4.4.
Alternative for bash 4.4:

```shell
readarray -td $'\v' a < <(printf '%s\v' "$s")
```
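Here is what the splitting looks like in use, on a made-up input string:

```shell
#!/usr/bin/env bash

s=$'one\vtwo three\vfour'

# Split $s on \v into the array a; the trailing \v added by printf
# terminates the last element, and || true guards against errexit.
IFS=$'\v' read -d '' -ra a < <(printf '%s\v' "$s") || true

echo "${#a[@]}"    # → 3
echo "${a[1]}"     # → two three (inner whitespace survives)
```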
How to start a bash script
From something like this:
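Based on the points below, the preamble can be sketched along these lines (the empty-array probe used for the set -u feature check is one possible formulation):

```shell
#!/usr/bin/env bash
if test "$BASH" = "" || "$BASH" -uc 'a=(); true "${a[@]}"' 2>/dev/null; then
    # Bash 4.4+, Zsh, or not bash at all: empty arrays work under set -u
    set -euo pipefail
else
    # Bash 4.3 and earlier chokes on empty arrays with set -u
    set -eo pipefail
fi
shopt -s nullglob globstar
```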
It includes:
- Shebang:
- Portability issues: The absolute path to
env
probably better for portability than the absolute path to bash
. You can look at the example of NixOS . POSIX requires an env , but not bash. - Security issues: for no language, options like
-euo pipefail
will be favorably accepted! This becomes impossible when using the env
redirect, but even if your shebang starts with #!/bin/bash
, this is not the place for parameters that affect the value of the script, because they can be redefined, making it possible for the script to run incorrectly. However, as a bonus, options that do not affect the script value, such as set -x
, if used, can be made redefined.
- What we need from Bash's unofficial strict mode, with a feature check for set -u. We don't need all of the unofficial strict mode, because shellcheck/shellharden compliance means quoting everything, which is already much stricter. Furthermore, set -u must not be used in Bash 4.3 and earlier: because that option treats empty arrays as unset in those versions, arrays are unusable with it. Since using arrays is the second most important advice in this guide (after quoting), and the only reason we sacrifice POSIX compatibility, that is unacceptable: either don't use set -u, or use Bash 4.4 or another sane shell like Zsh. That is easier said than done, because there is a chance that someone will run your script on an older Bash. Luckily, everything that works with set -u also works without it (the same can not be said of set -e), which is why the version check matters. Beware of assuming that testing and development happen on a shell compatible with Bash 4.4 (so that the set -u aspect actually gets tested). If that bothers you, the alternative is either to require it (fail the script when the version check fails) or to abandon set -u.
- shopt -s nullglob makes for f in *.txt work correctly when *.txt matches no files. The default behavior (aka passglob) passes the pattern through unchanged, which, on zero matches, is dangerous for several reasons. As for globstar, it enables recursive globbing. Globbing is easier to use correctly than find, so use it.
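A sketch of the difference in an empty directory: with nullglob, a loop over zero matches simply never runs; by default, it runs once with the literal pattern.

```shell
#!/usr/bin/env bash

tmpdir=$(mktemp -d)
cd "$tmpdir"

# Default behavior: the unmatched pattern is passed through unchanged.
default_first=''
for f in *.txt; do default_first=$f; break; done
echo "$default_first"    # → *.txt (the literal pattern!)

# With nullglob: the pattern expands to nothing, the loop body is skipped.
shopt -s nullglob
nullglob_runs=0
for f in *.txt; do nullglob_runs=$((nullglob_runs + 1)); done
echo "$nullglob_runs"    # → 0
```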
But not:

```shell
IFS=''
set -f
shopt -s failglob
```

- Setting the internal field separator to the empty string disables word splitting. Sounds like the perfect solution. Unfortunately, it is no complete replacement for quoting variables and command substitutions, and since you are going to use quotes anyway, it buys you nothing. The reason quotes are still required is that an unquoted empty string otherwise becomes an empty array (as in test $x = ""), and indirect wildcard expansion is still in effect. Moreover, messing with this variable also messes with commands that use it, such as read, which breaks constructs like:

```shell
cat /etc/fstab | while read -r dev mnt fs opt dump pass; do echo "$fs"; done
```

- Disabling wildcard expansion: not just the notorious indirect one, but also the perfectly fine direct one, which I just said you should use. That makes it a hard sell. Besides, it is completely unnecessary for a shellcheck/shellharden compliant script.
- Unlike nullglob, failglob fails on zero matches. While this makes sense for most commands, e.g. rm -- *.txt (because most commands are indeed not expected to run with zero arguments), it obviously only works when you don't expect zero matches. This means you can typically not use it for glob patterns in a command's arguments unless you make that assumption anyway. What you can always do, however, is use nullglob and let the pattern expand to zero arguments in constructs that can take zero arguments, such as a for loop or an array assignment (txt_files=(*.txt)).
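The nullglob-plus-array pattern can be sketched end to end (the .txt files are hypothetical): the glob expands straight into an array, possibly to zero elements, and the command only runs when there is something to pass.

```shell
#!/usr/bin/env bash
shopt -s nullglob

tmpdir=$(mktemp -d)
cd "$tmpdir"
touch a.txt b.txt

txt_files=(*.txt)              # the glob expands directly into the array
n_before=${#txt_files[@]}      # → 2
if [ "${#txt_files[@]}" -gt 0 ]; then
    rm -- "${txt_files[@]}"    # safe: never invoked with zero arguments
fi

txt_files=(*.txt)              # re-glob after deletion: now empty
n_after=${#txt_files[@]}       # → 0
echo "$n_before $n_after"      # → 2 0
```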
How to end a bash script
The exit status of the script is the status of the last command executed. Make sure it represents real success or failure.
The worst offender is ending the script with a condition in AND-list form: if the condition is false, the last command executed is the condition itself.
With errexit, AND-list conditions are never the right tool in the first place. Without errexit, consider error handling even for the last command, so that its exit status does not get masked if more code is later appended to the script.
Bad:

```shell
condition && extra_stuff
```

Good (errexit variant):

```shell
if condition; then
    extra_stuff
fi
```

Good (error handling variant):

```shell
if condition; then
    extra_stuff || exit
fi
exit 0
```
How to use errexit
Also known as set -e.
Program-level deferred cleanup
Insofar as errexit is working as intended, use it to set up any necessary cleanup to happen on exit:

```shell
tmpfile="$(mktemp -t myprogram-XXXXXX)"
cleanup() {
    rm -f "$tmpfile"
}
trap cleanup EXIT
```
Gotcha: errexit is ignored in command arguments
Here is a nasty fork bomb, the understanding of which cost me dearly: my build script worked fine on various developer machines, but brought the build server to its knees. The culprit was a failing command substitution used directly in a command's arguments, along these lines:

```shell
set -e
make -j"$(nproc)"
```

If nproc fails (say, because it is not installed), errexit is ignored inside the command arguments, and make -j runs with an unbounded number of jobs.

Correct (command substitution in an assignment):

```shell
set -e
jobs="$(nproc)"
make -j"$jobs"
```

Caveat: the local and export builtins are still commands, so this is still wrong:

```shell
set -e
local jobs="$(nproc)"
make -j"$jobs"
```

ShellCheck warns only about special commands like local in this case.

To use local, separate the declaration from the assignment:

```shell
set -e
local jobs
jobs="$(nproc)"
make -j"$jobs"
```
Gotcha: errexit is ignored depending on the caller's context
Sometimes, POSIX is awful. Errexit is ignored in functions, group commands and even subshells, if the caller is checking their success. These examples all print Unreachable and Great success, as strange as that may seem.
Subshell:

```shell
(
    set -e
    false
    echo Unreachable
) && echo Great success
```

Group command:

```shell
{
    set -e
    false
    echo Unreachable
} && echo Great success
```

Function:

```shell
f() {
    set -e
    false
    echo Unreachable
}
f && echo Great success
```
For this reason, bash with errexit is practically unsuitable for abstraction: yes, it is possible to wrap functions so that errexit works for them, but it is doubtful that the effort saved (on explicit error handling) is worth it. Consider splitting the work into wholly self-contained scripts instead.
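The self-contained-scripts alternative can be sketched like this (file names made up): a separate process has its own errexit, which the caller's exit-status check can not disable.

```shell
#!/usr/bin/env bash

tmpdir=$(mktemp -d)

# A wholly self-contained script with its own errexit:
cat > "$tmpdir/inner.sh" <<'EOF'
#!/usr/bin/env bash
set -e
false
echo Unreachable
EOF

# Unlike a function or subshell, checking the child's exit status
# does not disable errexit inside the child process.
if bash "$tmpdir/inner.sh"; then
    result='Great success'
else
    result='Failed as it should'
fi
echo "$result"    # → Failed as it should
```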
How to avoid invoking the shell with improper quoting
When invoking a command from other programming languages, the easy way to do it wrong is to invoke a shell implicitly. If that shell command is a static string, fine: it either works or it doesn't. But if your program performs any string processing to build the command, understand that you are generating a shell script! That is rarely what you want, and it is tedious to get right:
- quote every argument;
- escape the corresponding characters in the arguments.
Whichever programming language you are in, there are at least three ways to build the command correctly. In order of preference:
Plan A: Do Without a Shell
If it is just a command with arguments (i.e. no shell features like pipelines or redirection), use the array variant.
- Bad (python3):
subprocess.check_call('rm -rf ' + path)
- Good (python3):
subprocess.check_call(['rm', '-rf', path])
- Bad (C++):

```cpp
std::string cmd = "rm -rf ";
cmd += path;
system(cmd.c_str());
```

- Good (C/POSIX), minus error handling:

```c
char* const args[] = {"rm", "-rf", path, NULL};
pid_t child;
posix_spawnp(&child, args[0], NULL, NULL, args, NULL);
int status;
waitpid(child, &status, 0);
```
Plan B: static shell script
If a wrapper is required, let arguments be arguments. You might think it is cumbersome to maintain a dedicated shell script in its own file and call that, but that is before you have seen this trick:
Bad (python3):
subprocess.check_call('docker exec {} bash -ec "printf %s {} > {}"'.format(instance, content, path))
Good (python3):
subprocess.check_call(['docker', 'exec', instance, 'bash', '-ec', 'printf %s "$0" > "$1"', content, path])
Can you spot the shell script?
It is the printf command with the redirection. Note the correctly quoted numbered arguments. Embedding a static shell script this way is fine.
These examples run the command in Docker because they would not be as useful otherwise, but Docker is also a great example of a command that runs other commands based on its arguments. Unlike Ssh, as we shall see.
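The same trick works from the shell itself. A runnable sketch without Docker (the temporary file stands in for the real destination): the inline script is static, and the data travels as correctly quoted numbered arguments.

```shell
#!/usr/bin/env bash

content='hello world'
path=$(mktemp)

# Static inline script: $0 and $1 are the arguments that follow it.
bash -ec 'printf %s "$0" > "$1"' "$content" "$path"

cat "$path"    # → hello world
```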
Last resort: string processing
If it must be a string (for example, because it has to go over ssh), there is no way around it: every argument must be quoted, and any characters that would escape those quotes must themselves be escaped. The simplest approach is to go with single quotes, since they have the simplest escaping rule. There is only one: ' → '\''.
Typical filename in single quotes:
echo 'Don'\''t stop (12" dub mix).mp3'
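The escaping rule can be wrapped in a small helper function (the name shellquote is made up):

```shell
#!/usr/bin/env bash

# Wrap a string in single quotes, applying the one escaping rule: ' → '\''
shellquote() {
    local q="'\\''"             # the four characters '\''
    printf "'%s'" "${1//\'/$q}"
}

shellquote 'Don'\''t stop (12" dub mix).mp3'
# → 'Don'\''t stop (12" dub mix).mp3'
```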
So how do you use this trick to run ssh commands safely? You can't! Well, here is the "often correct" way:
- The “often correct” solution (python3):
subprocess.check_call(['ssh', 'user@host', "sha1sum '{}'".format(path.replace("'", "'\\''"))])
We have to join all the arguments into one string ourselves, so that Ssh doesn't do it wrong: if you try to pass multiple arguments to ssh, it will treacherously join them without quoting.
The reason this can not be done correctly in general is that the correct solution depends on the preference of the user at the other end, namely the remote shell, which can be anything. It could be your mom, in principle. It is often "correct" to assume that the remote shell is bash or another POSIX-compatible shell, but fish is incompatible at this point.