# How to debug /bin/sh shell scripts?



## Alain De Vos (May 19, 2021)

1. You can read the code.
2. You can start the script with "-v".
3. You can start the script with "-x".

Other ideas?
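
For options 2 and 3, a minimal sketch of what the two flags print. The throwaway script and the names `demo`, `verbose_out` and `trace_out` are made up for the illustration, and the exact trace formatting differs a little between shells.

```sh
# Create a throwaway script to inspect (hypothetical example).
demo=$(mktemp /tmp/demo.XXXXXX)
cat > "$demo" <<'EOF'
name=world
echo "hello $name"
EOF

# -v: echo each input line to stderr as the shell reads it, before expansion.
verbose_out=$(sh -v "$demo" 2>&1)

# -x: echo each command to stderr after expansion, prefixed with "+ ".
trace_out=$(sh -x "$demo" 2>&1)

printf '%s\n' "--- sh -v ---" "$verbose_out" "--- sh -x ---" "$trace_out"
rm -f "$demo"
```

Inside a script, `set -x` and `set +x` switch tracing on and off around just the suspect region.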


----------



## SirDice (May 19, 2021)

Lots of `echo "Debug: somevariable=${somevariable}"` thrown into the code to keep track of the state of variables. But usually `-x` does the trick.
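
A small sketch of how those echo lines can stay in the code without cluttering normal runs. The function name `debug` and the `DEBUG` variable are illustrative choices, not an established convention.

```sh
# Print a debug message to stderr, but only when DEBUG is set to a non-empty value.
# Using "if" (rather than "[ ... ] && echo") keeps the exit status 0 when DEBUG is off.
debug() {
    if [ -n "${DEBUG:-}" ]; then
        echo "Debug: $*" >&2
    fi
}

somevariable="some value"
debug "somevariable=${somevariable}"
```

Running the script as `DEBUG=1 sh script.sh` would turn the messages on; without `DEBUG` the calls are silent.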


----------



## zirias@ (May 19, 2021)

`echo`


----------



## _martin (May 19, 2021)

All above plus `set -u` helps with the typos in the variable names.
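
A quick sketch of the difference, using a deliberately misspelled variable (`logfile` / `logfiel` are made-up names):

```sh
# A snippet with a typo in the variable name.
snippet='logfile=/tmp/out.log; echo "writing to ${logfiel}"'

# Without -u the typo silently expands to an empty string.
sh -c "$snippet"

# With -u the shell reports the unset variable and stops with a non-zero status.
if sh -u -c "$snippet"; then
    echo "no error detected"
else
    echo "sh -u caught the typo"
fi
```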


----------



## ralphbsz (May 19, 2021)

That's one of the hardest debugging jobs. Personally, my answer is: don't. Use shell only for scripts that are so short that they obviously have no bugs. And that are so short-lived that, after they've done their job (about 3 minutes in), you delete them; that way they can't grow bugs in the future. Why would working code grow bugs? Usually because it gets used again, but for a slightly different problem, one it wasn't really designed for.

Failing that, debugging shell scripts requires discipline. And I've worked with systems that have a quarter million lines of sh code, with the single largest one (single file!) being 18K lines. What you do is to break things apart into tiny little bits. Each shell function is a dozen or three dozen lines long. It is really well documented, with comments that describe exactly what it does, what the legal inputs are, and what it will output (return, or do as a side effect). Then you build a test battery, which is typically 3x longer than the shell code you are writing, which exercises each little function: make sure it can accept a wide variety of valid inputs, and make sure it cleanly rejects invalid input. Make sure it has the correct output for valid inputs. Ideally, the test battery is written by an independent person, who is an adversary, and tries to find bugs. Typically, for every software engineer (script writer), you need to hire two testers / test engineers. Then you need some test automation. That can for example be a script that runs all the tests, one at a time, and makes sure none fail.
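
To make the idea of a test battery concrete, here is a deliberately tiny sketch. The function under test (`is_uint`) and the helper (`expect_status`) are hypothetical names invented for this example; a real battery would be far larger.

```sh
# Function under test: succeed only for non-negative decimal integers.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

failures=0

# expect_status WANT CMD [ARGS...]: run CMD and compare its exit status with WANT.
expect_status() {
    es_want=$1
    shift
    es_got=0
    "$@" || es_got=$?
    if [ "$es_got" -ne "$es_want" ]; then
        echo "FAIL: '$*' returned $es_got, expected $es_want" >&2
        failures=$((failures + 1))
    fi
    unset es_want es_got    # do not leave temporaries behind
}

# Valid inputs must be accepted ...
expect_status 0 is_uint 0
expect_status 0 is_uint 12345
# ... and invalid inputs must be cleanly rejected.
expect_status 1 is_uint ''
expect_status 1 is_uint -5
expect_status 1 is_uint 12a
expect_status 1 is_uint ' 7'

echo "$failures test(s) failed"
```

The test automation mentioned above could then be as simple as a script that runs every such file and stops at the first one reporting a non-zero failure count.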

One thing that really helps is being super consistent. For example, have one set of environment variables that turns debugging off and on, with relatively fine granularity. Make sure temporary files are in a consistent place, but with names that consistently are different, so different pieces of code can't step on each other's files. Have a clear naming convention for variables. Make sure no code pollutes global name spaces: if a function sets a temporary variable, it must unset it before exiting (`set | wc -l`, check before and after). Think of scripting as working on a workbench: After every operation, you must put the tools back where they belong, you must put all garbage in the trash, you must leave the work surface as clean as you found it, except perhaps with one more intermediate product stacked in the back.
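
A rough sketch of the workbench rule. The helper `to_upper`, its `tu_` variable prefix and the temp-file naming scheme are assumptions made up for this example, not a prescribed convention.

```sh
# Initialise the counters first, so they exist in both snapshots.
vars_before=0
vars_after=0

# Hypothetical helper: upper-case its arguments via a temporary file.
to_upper() {
    tu_tmp="${TMPDIR:-/tmp}/to_upper.$$"    # function name + PID: no clashes with other code
    printf '%s\n' "$*" > "$tu_tmp"
    tr '[:lower:]' '[:upper:]' < "$tu_tmp"
    rm -f "$tu_tmp"                         # garbage goes in the trash
    unset tu_tmp                            # the tool goes back where it belongs
}

# Count the lines of "set" output before and after the call.
vars_before=$(set | wc -l)
to_upper hello > /dev/null
vars_after=$(set | wc -l)

if [ "$vars_before" = "$vars_after" ]; then
    echo "to_upper left the namespace clean"
else
    echo "to_upper leaked variables" >&2
fi
```

Removing the `unset tu_tmp` line should make the check report a leak, which is the point of counting before and after.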

Make a very clear list of available tools: We will run on V7 Bourne shell, using SysV awk, and BSD sed and grep, and absolutely nothing else (or whatever tools you have available). Then make sure ONLY those tools are available on the path. If the job needs something else (like gawk), the job simply won't get done. Or you declare gawk to be the new standard.
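
One possible way to enforce such a list, sketched under assumptions: the tool list, the `toolbox` directory and the variable names are all illustrative, and the approved tools are assumed to be installed somewhere on the current path.

```sh
# Build a directory holding only the approved tools (the list here is illustrative).
toolbox=$(mktemp -d /tmp/toolbox.XXXXXX)
for tool in sh awk sed grep; do
    if tool_path=$(command -v "$tool"); then
        ln -s "$tool_path" "$toolbox/$tool"
    else
        echo "approved tool not installed: $tool" >&2
    fi
done

# Run a child shell whose PATH is just that directory.
# Shell builtins such as echo and command still work; external tools outside the list do not.
restricted_out=$(PATH="$toolbox" sh -c '
    command -v sed  > /dev/null && echo "sed: available"
    command -v gawk > /dev/null || echo "gawk: not available"
')
printf '%s\n' "$restricted_out"

rm -rf "$toolbox"
unset tool tool_path
```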

Now you have a battery of well tested small script parts. The rest is to assemble them into larger and larger towering edifices. That's doable by understanding the requirements of the job.


----------



## kpedersen (May 19, 2021)

`set -e`

Very handy, as it aborts as soon as something returns a non-zero exit code.
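
A minimal sketch of the effect, where `false` stands in for a failing build step:

```sh
# A three-step snippet whose second step fails.
steps='echo "step 1"; false; echo "step 3"'

# Without -e the failure is ignored: prints "step 1" and "step 3".
sh -c "$steps"

# With -e the shell stops at the failing step: prints "step 1", then aborts.
sh -e -c "$steps" || echo "aborted with status $?"
```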


----------



## gpw928 (May 20, 2021)

ralphbsz said:


> Use shell only for scripts that are so short that they obviously have no bugs.


One of my favourite quotes is by C.A.R. Hoare who said "There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors."

I agree with most of your sentiments, though suspect that very few people today really understand what V7 shell actually is. 

I also feel compelled to mention Edsger W. Dijkstra: "Program testing can best show the presence of errors but never their absence."


----------



## ralphbsz (May 20, 2021)

kpedersen said:


> set -e
> 
> Very handy so it aborts as soon as something returns non-zero exit code.



You have to be very careful with that one. It makes the following idiom not possible:

```
ls -l foo > /tmp/ls.foo.$$ 2> /dev/null # Example of doing something with a file
if [ $? -eq 0 ]
then
  echo Foo already exists.
  # Do something useful with the output of ls, like cover it with chocolate.
else
  echo Foo does not exist.
  # Do something else. Like create an empty foo, and proceed directly to wine, skipping the chocolate.
fi
```
Because when the `ls` command "fails" (in the sense of returning a non-zero exit code, here because foo does not exist), the whole script blows up. Sure, it's "easy" to code around this, but all these easy ways are either more complex or not idiomatic.

And note that "failures" of shell commands can happen in many places, for example buried in pipes. For example, one can use `grep -q > /dev/null` to check whether something contains a certain string ... but that requires expecting the command to fail.
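
For reference, the usual ways of coding around it look roughly like the sketch below. The function name `foo_check` is made up, and the subshell body `( ... )` is only there so that `set -e` and the variables stay confined to the example. Which branch of the first test prints depends on whether a file named foo exists in the current directory.

```sh
foo_check() (
    set -e
    out="${TMPDIR:-/tmp}/ls.foo.$$"

    # A command used directly as an "if" condition is exempt from -e.
    if ls -l foo > "$out" 2> /dev/null; then
        echo "Foo already exists."
    else
        echo "Foo does not exist."
    fi

    # grep -q as a condition is exempt in the same way.
    if printf 'alpha\nbeta\n' | grep -q gamma; then
        echo "found gamma"
    else
        echo "no gamma"
    fi

    # To keep the numeric status, capture it on the same line with "||".
    status=0
    ls -l foo > "$out" 2> /dev/null || status=$?
    echo "ls exit status: $status"

    rm -f "$out"
    echo "still running under set -e"
)

foo_check
```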


----------



## gpw928 (May 20, 2021)

kpedersen said:


> set -e


I have been programming in shell for nearly 40 years, and never used "set -e".  I could never find a good use for it.

Catching and dealing with errors in a timely and elegant fashion is one of the things that makes good programming difficult.

I think that "set -e" falls into the same category as Java programmers who view a stack trace as the answer to any run-time error condition.  Slovenly comes to mind.


----------



## kpedersen (May 20, 2021)

I use set -e mainly so I can treat my shell scripts like Makefiles. You can see a fairly recent usage of that here.

Generally they are only a list of build instructions (i.e. automating cmake, etc.), so checking the return code each time would just be added noise.

As ralphbsz demonstrated, it gets a little awkward if you have certain types of logic. Typically I can avoid extracting error codes like that (but I have been known to surround an occasional section with `set +e; ...; set -e` (ugly *HACK*)). Most importantly, the testing tools still work (e.g. `[ -f`), so I can ensure specific files / folders exist. That is usually the extent of my use. My scripts rarely run programs if I know they might fail (i.e. as tests).
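
A small sketch of both points. The function name `build_steps` and the path are invented for the example, and `false` stands in for a command that is allowed to fail.

```sh
# Subshell body "( ... )" keeps set -e confined to this example.
build_steps() (
    set -e

    # A false file test on the left of "||" does not trip -e.
    [ -f /nonexistent/CMakeLists.txt ] || echo "no CMakeLists.txt here"

    # The hack: switch -e off around a command that may fail, then back on.
    set +e
    false
    status=$?
    set -e

    echo "command returned $status, script continues"
)

build_steps
```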

Typically I do want errors to cause the script to stop hard and fast. If I need something more complex, I tend to use Awk and very much over-engineer it instead!


----------



## Hakaba (May 20, 2021)

I personally use two strategies.
The first one: I use functions and test each one before using it in the main loop.
The second one is multiple scripts with I/O that I can pipe together.
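
A tiny sketch of the second strategy. Functions stand in for the separate scripts here, and the names `strip_comments` and `count_lines` are made up.

```sh
# Two tiny stages; each reads stdin and writes stdout, so each can be tested alone.
strip_comments() { sed -e 's/#.*//' -e '/^[[:space:]]*$/d'; }
count_lines()    { wc -l | tr -d ' '; }

# Test one stage in isolation ...
printf 'a # note\n# only a comment\nb\n' | strip_comments

# ... then compose the stages.
printf 'a # note\n# only a comment\nb\n' | strip_comments | count_lines
```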


----------



## SirDice (May 20, 2021)

devel/hs-ShellCheck is good to use too. While not strictly a debugging tool, it does point out common mistakes and dubious code. Getting your script to conform usually _prevents_ errors from occurring. That's not guaranteed, of course, because it cannot and will not point out logic errors, for example. But clean code is typically easier to debug.
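
A sketch of a typical run, assuming the `shellcheck` binary from that port is on the path. The sloppy sample script is invented for the example, and the exact warnings depend on the ShellCheck version.

```sh
# A deliberately sloppy script: unquoted variables and a bash-only test under #!/bin/sh.
sloppy=$(mktemp /tmp/sloppy.XXXXXX)
cat > "$sloppy" <<'EOF'
#!/bin/sh
file=$1
if [[ -f $file ]]; then
    rm $file
fi
EOF

# --shell=sh asks for checks against POSIX sh rather than bash.
# shellcheck exits non-zero when it has findings, hence the "|| true".
if command -v shellcheck > /dev/null 2>&1; then
    shellcheck --shell=sh "$sloppy" || true
else
    echo "shellcheck is not installed" >&2
fi

rm -f "$sloppy"
```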


----------



## debguy (May 29, 2021)

Depends on which shell. tcsh has `-n`, and bash has `--debugger`, I think (I've never tried it).

I use `-e`.

The use of `{}` is better than "if ... then" and pretty portable.

I once wrote a script that reads one line of a shell script at a time, runs `sh -x` on it, and waits for a keypress, but it only worked if there was no wrapping (i.e. not even if-then). I never use it; I always think of a way not to use it.

Install the source for your shell and put a debug line in your shell, such as one that writes commands and line numbers to a file. What you want to see in that file is "so custom" that it shouldn't be a feature of the shell. Another hack might be to have bash wait for a keypress and print the line number. However, many scripts involve redirection, so you might not have a console. Debugging /etc/rc.d/rcS in Linux is fun that way: you get to reboot if you're wrong (it contained a lot of obstruction rules that stopped booting or shutting down based on no real need for obstruction).
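
A less invasive approximation of "commands and line numbers to a file", without patching the shell: set `PS4` so that the `-x` trace carries `$LINENO`, and redirect stderr to a log. This is only a sketch; the file names are made up, and how accurately `LINENO` is tracked varies between shells.

```sh
# Throwaway script (hypothetical) that labels every traced command with its line number.
script=$(mktemp /tmp/trace.XXXXXX)
cat > "$script" <<'EOF'
PS4='+ line ${LINENO}: '   # PS4 is expanded before each traced command
set -x
greeting=hello
echo "$greeting"
EOF

# The trace goes to stderr, so it can be logged even when stdout is redirected elsewhere.
sh "$script" > /dev/null 2> "$script.log"
cat "$script.log"

rm -f "$script"    # the log is left in place for inspection
```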


----------



## debguy (May 29, 2021)

I wrote a script called  check_balance

it calls
check_single_quote
check_double_quote
check_brace
check_bracket

it's not sophisticated but I do occasionally use it
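
The script itself is not shown here, so the following is only a guess at how such a checker might work: it naively counts characters and knows nothing about quoting, comments or `case` patterns, so false alarms are expected. `count_char`, `check_pair` and `check_even` are hypothetical names.

```sh
# count_char CHAR FILE: print how many times CHAR occurs in FILE.
count_char() {
    tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

# check_pair OPEN CLOSE FILE: succeed if OPEN and CLOSE occur equally often.
# The subshell body "( ... )" keeps the temporaries out of the caller's namespace.
check_pair() (
    opens=$(count_char "$1" "$3")
    closes=$(count_char "$2" "$3")
    [ "$opens" -eq "$closes" ] && exit 0
    echo "$3: $opens of '$1' but $closes of '$2'" >&2
    exit 1
)

# check_even CHAR FILE: succeed if CHAR occurs an even number of times (for quotes).
check_even() (
    n=$(count_char "$1" "$2")
    [ $((n % 2)) -eq 0 ] && exit 0
    echo "$2: odd number ($n) of $1" >&2
    exit 1
)

# Try it on a small broken sample.
sample=$(mktemp /tmp/balance.XXXXXX)
printf '%s\n' 'if [ -f "$1" ]; then {' '  echo "found' 'fi' > "$sample"

check_pair '{' '}' "$sample" || echo "braces look unbalanced"
check_pair '[' ']' "$sample" && echo "brackets look balanced"
check_even '"' "$sample"     || echo "double quotes look unbalanced"

rm -f "$sample"
```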


----------

