Write code that is easy to remove and debug.



Code that is easy to debug is code that does not fool you. Code is harder to debug when it has hidden behavior, poor error handling, ambiguity, too little or too much structure, or when it is in the middle of being changed. On any large enough project, you will eventually run into code you cannot understand.

If the project is old enough, you will come across code you have forgotten about; if it were not for the commit log, you would swear those lines were written by someone else. As a project grows, it becomes harder to remember what each piece of code does, and harder still when the code does not do what it appears to do. When you then need to change code you do not understand, you are forced to learn it the hard way: by debugging it.

The ability to write code that is easy to debug begins with accepting that you will remember nothing you wrote before.

Rule 0: Good code makes its faults obvious.


Purveyors of conventional wisdom insist that writing clear code means writing clean code. The problem is that "clean" is highly context-dependent. Clean code can be hard-wired into a system, while some dirty hack is written so that it is easy to turn off. Sometimes code is considered clean only because all the dirt has been pushed somewhere else. Good code is not necessarily clean code.

Cleanliness says more about the pride (or shame) a developer feels about the code than about how easy it is to maintain or change. Instead of clean code, give us boring code whose changes are obvious: I have found that people are far more willing to improve a code base when the fruit hangs low and is easy to pick. The best code may be the code you can glance at and immediately understand.


Sometimes code is so nasty that any attempt to make it cleaner only makes things worse. Writing code without understanding the consequences of your actions is little more than a ritual for summoning maintainable code.

This is not to say that clean code is bad, but the pursuit of cleanliness sometimes looks more like sweeping the mess under the rug. Code that is convenient to debug is not necessarily clean, and code stuffed with checks and error handling is rarely pleasant to read.

Rule 1: The computer always has problems.


The computer has problems, and the program crashed the last time it ran.

Before attempting to do anything, a program must first make sure it is starting from a known, good, safe state. Sometimes there is simply no saved copy of the state, because the user deleted it or moved to a new machine. The program crashed the last time it ran and, paradoxically, the first time it ran as well.

For example, reading or writing state to a file can go wrong in many ways: the file may be missing, corrupt, truncated by an unfinished write, or left over from an older (or newer) version of the program.


These problems are nothing new; databases have faced them since the dawn of time (1970-01-01). Using something like SQLite will handle many of them, but if the program crashed during its last run, the code can still end up working with bad data, and/or working on it in the wrong way.
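As a sketch of one common defense against these failure modes (my illustration, not from the article): write state to a temporary file and atomically rename it into place, so a crash mid-write leaves the previous good copy intact; on startup, fall back to a known default if the file is missing or unreadable. The JSON format and function names here are assumptions for the example.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state atomically: a crash mid-write leaves the old file intact."""
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem, which makes os.replace atomic on POSIX.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, path)  # readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_state(path, default):
    """Start from a known good state, even on first run or after deletion."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

The recovery path and the happy path share one entry point, so "the file does not exist yet" and "the file was destroyed" are handled identically.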

For example, with a program that runs on a schedule, at least one of the following will eventually happen: it runs twice at once, it does not run at all, it runs late, or it runs against the stale leftovers of a previous run.


Building robust software begins with writing software that assumes it crashed the last time it ran, and that crashes whenever it does not know what to do. The best part of throwing an exception and leaving a comment in the style of "this should never happen" is that when it inevitably does happen, you get a head start on debugging your code.

The program does not even have to recover from failure; it is enough to let it give up without making things worse. Small checks that raise exceptions can save weeks of trawling through logs, and a simple lock file can save hours of restoring from backup.

Code that is easy to debug, in short, is code that checks for problems up front, fails loudly, and makes it easy to get back to a known good state and try again.


Rule 2: Your program is at war with itself.


The largest DoS attack in Google's history came from ourselves (because our systems are really big). From time to time someone does try to test our strength, but we can still hurt ourselves far more than anyone else can.

This applies to all our systems.

— Astrid Atkinson, Engineering for the Long Game

The program always crashed the last time it ran; there is never enough CPU, memory, or disk space. All the workers are hammering an empty queue, everyone is retrying a failed and long-expired request, and all the servers pause for garbage collection at the same moment. The system is not just broken: it is constantly trying to break itself.

Even checking whether the system is working can be surprisingly hard.

Checking that a server is up is easy; checking that it is actually handling requests is not. If you do not measure how long the system has been running without failures, it is entirely possible for the program to crash between checks. Health checks can trigger bugs of their own: I have managed to write checks that crashed the very system they were supposed to protect. Twice, three months apart.
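One way to make a health check measure work rather than mere liveness is to track when real progress last happened. This sketch is my own illustration of that idea; the class name and idle threshold are invented for the example.

```python
import time


class Worker:
    """A worker whose health check verifies progress, not just liveness."""

    def __init__(self, max_idle_seconds=60):
        self.max_idle = max_idle_seconds
        self.last_progress = time.monotonic()

    def record_progress(self):
        # Called whenever a unit of real work completes.
        self.last_progress = time.monotonic()

    def healthy(self):
        # "The process answers" is not enough: if no work has finished
        # recently, report unhealthy, so a crash loop between checks
        # cannot hide behind a responsive endpoint.
        return time.monotonic() - self.last_progress < self.max_idle
```

Note `time.monotonic()` rather than wall-clock time, so a clock adjustment cannot make a stalled worker look healthy.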

Error-handling code inevitably leads to discovering even more errors that need handling, many of them caused by the error handling itself. Similarly, performance optimizations are often the cause of a system's bottlenecks. An application that is pleasant to use in one tab becomes a problem when launched in twenty copies.

Another example: one worker in a pipeline runs too fast and consumes all available memory before the next stage gets to it. It is like a traffic jam: jams grow out of increases in speed, and the congestion propagates backwards against the direction of travel. In the same way, optimizations can produce systems that fall over under high or heavy load, often in mysterious ways.
In other words: the faster the system, the greater the pressure on it, and if you do not let the system push back a little, do not be surprised when it cracks.
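The standard way to let a pipeline "push back" is a bounded queue: when the buffer fills, the fast producer blocks until the slow consumer catches up. A minimal sketch (the doubling "work" is a stand-in for a real slow stage):

```python
import queue
import threading


def producer(q, items):
    for item in items:
        # put() blocks when the queue is full: the fast stage is slowed
        # to the pace of the slow stage instead of exhausting memory.
        q.put(item)
    q.put(None)  # sentinel: no more work


def consumer(q, results):
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)  # stand-in for slow downstream work


q = queue.Queue(maxsize=8)  # the bound is the backpressure
results = []
t = threading.Thread(target=consumer, args=(q, results))
t.start()
producer(q, range(100))
t.join()
```

With `maxsize=8`, at most eight items are ever in flight, no matter how fast the producer is; with an unbounded queue, the same code would happily buffer everything in memory.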

Backpressure is one form of feedback within a system. A program that is easy to debug puts the user into the feedback loop and exposes every behavior inside the system: accidental and intentional, desired and undesired. Such code is easy to inspect, and it is easy to see and understand the changes happening to it.

Rule 3: Whatever you leave ambiguous now, you will have to debug later.


In other words, it should be easy to follow the variables in a program and understand what is happening. Nightmarish linear-algebra subroutines aside, you should strive to represent the program's state as clearly as possible. One consequence is that you must not change a variable's purpose in the middle of a program: using one variable for two different purposes is a mortal sin.

It also means carefully avoiding the semi-predicate problem: never use a single value ( count ) to represent a pair of values ( boolean , count ). Avoid returning a positive number for a result while also returning -1 when nothing matches. You can easily end up needing something like " 0, but true " (which is literally a feature of Perl 5), or with code that is hard to compose with the rest of the system ( -1 may not be an error for the next part of the program, but a perfectly valid input).

Along with using one variable for two purposes, it is also unwise to use two variables for one purpose, especially when they are booleans. I do not mean that storing a range as two numbers is bad, but using booleans to indicate what state a program is in is often a state machine in disguise.

When state does not flow from top to bottom, that is, when there is a loop involved, it is best to give the state its own variable and make the logic explicit. If you have a set of booleans inside an object, replace them with a variable called state and use an enum (or a string, if you need one somewhere). The if will look like if state == name , not if bad_name && !alternate_option .
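Here is a minimal sketch of that replacement in Python (the connection states and method names are invented for illustration): one enum variable instead of a tangle of flags like `is_open`, `is_closing`, `has_handshaked`.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTING = auto()
    READY = auto()
    CLOSING = auto()
    CLOSED = auto()


class Connection:
    """One named state instead of several booleans: every transition
    is checked, so an impossible combination cannot be represented."""

    def __init__(self):
        self.state = State.CONNECTING

    def on_handshake(self):
        if self.state != State.CONNECTING:
            # Fail loudly: an unexpected transition is a bug, not a no-op.
            raise RuntimeError(f"unexpected handshake in {self.state}")
        self.state = State.READY

    def close(self):
        if self.state in (State.CLOSING, State.CLOSED):
            return  # already shutting down; nothing to do
        self.state = State.CLOSING
```

With booleans, "closing but never handshaked and also open" is representable; with the enum it simply is not, which is the point of the next paragraph about enumerating valid states.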

Even with an explicit state machine there is room for confusion: sometimes code has two state machines hidden inside it. I once struggled to write an HTTP proxy until I made each machine explicit, tracking the connection state and the parsing state separately. When you merge two state machines into one, it becomes hard to add a new state, or to know exactly which state something should be in.

This is more about creating code that does not need debugging than code that is easy to debug. If you work out the list of valid states, it becomes far easier to reject the invalid ones without accidentally letting one or two slip through.

Rule 4: Accidental behavior is expected behavior.


When you are not sure what a data structure should do, users will fill the gaps for you: any behavior of your code, intentional or accidental, will eventually be relied upon by someone. Many popular programming languages have hash tables you can iterate over, which in many cases preserve insertion order.

In some languages the hash table behaves as most users expect, iterating over the keys in the order they were added. In others, the hash table returns the keys in a different order on every iteration, and there some users complain that the behavior is not random enough.
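Python is an example of the first camp: since version 3.7, `dict` is guaranteed to preserve insertion order, so code all over the ecosystem now relies on it. A small sketch of the lesson — if your program needs an order, ask for it explicitly rather than inheriting whatever the container happens to do:

```python
# dict preserves insertion order (a language guarantee since Python 3.7),
# so users inevitably come to depend on it.
counts = {"banana": 3, "apple": 5, "cherry": 1}
assert list(counts) == ["banana", "apple", "cherry"]  # insertion order

# An explicit order survives a change of container or of language:
ordered = [counts[key] for key in sorted(counts)]  # alphabetical by key
```

Code written against the explicit `sorted(counts)` keeps working if the data later arrives in a different insertion order; code written against `list(counts)` quietly depends on an accident of history.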

Unfortunately, any source of randomness in your program will eventually be used for statistical simulation, or worse, cryptography; and any source of ordering will eventually be used for sorting.

In databases, some identifiers carry a little more information than others. When creating a table, a developer can choose among several types of primary key. The right choice is a UUID, or something indistinguishable from one. The trouble with the other options is that they expose ordering as well as identity: not just a == b , but a <= b . The prime example is auto-increment keys.

With an auto-increment key, the database numbers each row as it is inserted, adding 1 each time. This introduces an ambiguity of ordering: people no longer know which part of the data is canonical. In other words, do you sort by key or by timestamp? As with the hash table, people will decide the right answer for themselves. The other problem is that users can easily guess the keys of adjacent records.

Any attempt to outsmart the UUID will end badly: we have already tried using postal codes, phone numbers, and IP addresses, and failed miserably every time. A UUID may not make your code easier to debug, but less accidental behavior means less trouble.
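In Python, the standard library gives you such a key directly; a minimal sketch (the function name is my own):

```python
import uuid


def new_row_id() -> str:
    """Generate a primary key that reveals nothing but identity.

    Unlike an auto-increment integer, a random UUID (version 4)
    supports only a == b: you cannot learn insertion order from it,
    and you cannot guess a neighboring record's key.
    """
    return str(uuid.uuid4())


a = new_row_id()
b = new_row_id()
assert a != b  # identity is all you get: no ordering, no adjacency
```

If you also need a canonical order, store a timestamp column for it explicitly; the key then stays a pure identifier instead of doubling as a hidden sort field.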

Ordering is not the only information keys can leak. If you build database keys from other fields, people will throw away the data and reconstruct it from the key. Then you have two problems: when the program's state lives in several places, the copies find it very easy to disagree with each other, and they are harder to keep in sync when you are not sure which one you need to change, or which one has changed.

Whatever you let your users do, they will do. Writing code that is easy to debug means thinking ahead about how it might be misused, and about how people will interact with it in general.

Rule 5: Debugging is a social task first, and a technical one second.


When a project is split into components and systems, finding bugs can become much harder. Once you understand how a problem arises, you can coordinate changes across different parts to fix the behavior. Fixing bugs in a large project is not so much about finding them as about convincing people that the bugs exist, or even that they could exist.
Software has bugs because nobody is quite sure who is responsible for what. In other words, code is harder to debug when nothing is written down and you have to ask everything in Slack, where nobody answers until the one expert shows up.

This can be remedied through planning, tools, processes, and documentation.

Planning is how you remove the stress of being permanently on call: an incident management structure. Plans let you keep customers informed, relieve people who have been on call for too long, and track problems and make changes to reduce future risk. Tools are a way to lower the skill required for a task, making it accessible to more developers. Process is a way to take control away from individual participants and hand it to the team.

The people and the ways they interact will change, but the processes and tools remain as the team transforms. It is not that one is more important than the other; rather, each exists to support change in the other. Process can also be used to take control away from a team. That is not inherently good or bad, but there is always some process, even when it is not written down, and the act of documenting it is the first step to letting other people change it.

Documentation means more than text files. Documentation is how you hand over responsibility, how you onboard people, how you communicate changes to those affected by them. Writing documentation requires more empathy than writing code, and more skill: there are no simple compiler flags or type checks, and it is easy to write a great many words without documenting anything.

Without documentation, you cannot expect people to make informed decisions, or even to consent to the consequences of using the software. Without documentation, tools, or processes, you cannot share the burden of maintenance, or even replace the people currently carrying it.

Making things easy to debug applies not only to the code itself but also to the processes around it: it helps you understand whose shoes you need to step into to get the code fixed.

Code that is easy to debug is easy to explain.


It is often said that if you explain a problem to someone while debugging it, you come to understand it yourself. You do not even need another person; what matters is forcing yourself to explain the situation from scratch, step by reproducible step. Often that alone is enough to arrive at the right answer.

Often, but not always. Sometimes when we ask for help, we do not ask for what we actually need. The phenomenon is common enough to have a name, The XY Problem: "How do I get the last three letters of a file name?" "Eh? Oh, I meant the extension."

We describe a problem in terms of the solution we understand, and we describe a solution in terms of the consequences we fear. Debugging is the hard work of comprehending unexpected consequences and weighing alternative solutions, and it demands the hardest thing of all from a programmer: admitting that they got something wrong.

And no, it turns out it was not a compiler bug.

Source: https://habr.com/ru/post/412693/

