On the issue of AVR and world records

Do it well; the bad will come out by itself.


The prompt for this post was a recent (recent when I started writing it; the draft then sat in the Unfinished folder for quite a while) publication on Habré about the details of implementing a software UART on AVR. The questions raised there are interesting in themselves, but the answers given are so strange that I considered it my duty to make the necessary clarifications. The topic has been stated; those who want to read about "kings, cabbages and shoes", that is, about the requirements of standards, the (correct) reading of technical documentation, and record-setting in AVR assembly programming, are welcome to read on.

Let us state the question more precisely: is it possible to implement the IRPS interface (the name I am used to; its maiden name is UART) on an AVR-type MC (specifically, it was a Tiny13) running from the internal oscillator? The point is that this oscillator is not particularly good at holding its frequency accurately, which is exactly what raises the question. I will note right away that it does not matter whether we consider a software implementation (as in the original post) or use the hardware blocks of the MC: the results of one approach (in terms of timing accuracy) carry over almost entirely to the other.

The crucial question is whether the internal oscillator can provide the required accuracy at all, since a negative answer makes any further investigation pointless. To compare two independent quantities we must know both of them, so we start by determining the required frequency-holding accuracy and the capabilities of this particular MC in that respect. An important remark to the previous sentence: not a specific specimen "given to us in sensations", but a specific type of MC, as represented by its technical description.

To begin with, let us find what (or so I thought) is easier to find: the requirements on the timing parameters of the interface. We open the RS-232 standard and read everything we need right away, yes? It turned out that "you can't just take it and...", because the standard is paid-for and all copies on the web are illegal. Fine, we take the domestic GOST for the C2 junction and find no timing parameters there at all, apart from the rise and fall times of the pulse. At first this caused a slight shock (how can that be?), but then came the realization that junction C2 describes only the electrical part of the IRPS, and the timing requirements must live in the protocol part. In principle this is logical; it is just unclear why GOST does not say so explicitly. Well, sometimes one can think for oneself, although all the same "it doesn't come out neatly".

Of course, knowing the transmission protocol, one can derive from general considerations the maximum permissible mismatch between transmitter and receiver clocks (0.5 / 9.5 ≈ 5.2%), but that would be a study of a spherical horse in a vacuum, because:

  1. the requirements of the standard may well be (and should be) stricter than such a theoretical estimate of the maximum permissible mismatch;
  2. knowing only the total permissible mismatch does not give us the error budget split between transmitter and receiver.
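The 0.5 / 9.5 estimate above can be checked with a tiny calculation. This is a sketch from general considerations only (the function name and the mid-bit sampling assumption are mine, not taken from any standard): the receiver samples each bit in its middle, so the last data bit of an 8-bit frame is sampled 8 + 1.5 bit times after the start edge, and that sample point may drift by at most half a bit interval.

```python
def max_clock_mismatch(data_bits=8):
    """Worst-case relative clock mismatch for an asynchronous frame
    sampled mid-bit: the last data bit is sampled (data_bits + 1.5) bit
    times after the start edge, and that point may drift by at most
    half a bit interval before landing in the wrong bit."""
    last_sample = data_bits + 1.5   # 9.5 bit times for an 8-bit frame
    return 0.5 / last_sample

# the figure quoted above: 0.5 / 9.5, i.e. roughly 5.2 %
theoretical = max_clock_mismatch() * 100
```

For comparison, the Atmel figure discussed below (2% total with an equal split) is several times tighter than this theoretical ceiling, which is exactly point 1 of the list above.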

Wandering across the expanses of the Internet turned up an Atmel AppNote (well, we are using this company's MCs after all), which states explicitly that the permissible mismatch is 2% with an equal split, which translates into a requirement to hold the transmitter frequency to 1%. Let us trust the respected company, assume they have access to the secret materials and that this figure is correct, especially since it looks plausible. I understand how vulnerable this position is, but frankly, I got tired of hunting for the exact answer to such a simple question, and I cannot wait to move on to the next part.

The second half of the answer lies inside the MC and is determined by its technical documentation. First, a little about the structure of the internal oscillator, especially since it is more or less described. The oscillator uses an RC chain as its timing element, and since forming an accurate capacitor and an accurate resistor on-chip is highly nontrivial, the resulting frequency varies from one specimen of the MC to another. To make this parameter more predictable, the manufacturer added a hardware node controlled via a calibration byte. This node allows the oscillator frequency to be varied over a wide range and, accordingly, the desired value to be obtained with much higher accuracy.

It would be interesting to know exactly how this control is implemented in hardware; I can see either controlling the capacitor charging voltage through a DAC, or controlling the comparison voltage at the comparator. Both options, however, lead to significant nonlinearity of the regulation characteristic, although neither is hard to implement. But reverse-engineering the oscillator's internals is not our task; we are interested in its external parameters.

So we open the documentation (you can open the file in a viewer; I happen to have a typographically printed description from the manufacturer itself, yes, such things used to exist) and look for the relevant section. We are interested in the parameters in the section "Calibrated Internal RC Oscillator", following the links where necessary. And here the first disappointment awaited me (I cannot speak for you): I have worked with Atmel products for a long time (about 15 years) and always believed they had good MC documentation. As psychiatrists say, "there are no healthy people, only under-examined ones", and a close reading of the relevant section confirmed this truth; how could I have overlooked such failures in the documentation before? In my defense I can only say that:

  1. I have never used the internal oscillator in these MCs, so I did not study it particularly carefully;
  2. when I started working with these MCs (well over 10 years ago) I was young (certainly younger than now) and foolish, and did not understand the need for good (understandable, comprehensive and unambiguous) documentation;
  3. I am ready to forgive myself a lot, simply because I forgive myself a lot, and all my flaws are not fatal (the last argument is especially convincing, isn't it?).

So, having finished sprinkling ashes on my head, I will proceed to my complaints about the documentation, for which the manufacturer has no excuse. Open the aforementioned section and study it carefully, going to the relevant pages as needed (you can still click the links). Together we will look for the following parameters characterizing the oscillator's timing behavior: nominal accuracy, the influence of supply voltage, the influence of temperature, and aging. This is the minimum set needed to estimate the accuracy of any oscillator.

Part one of the Marlezon ballet: nominal accuracy.

We immediately find the needed parameter: the table of oscillator calibration accuracy, with two lines, "Factory Calibrated" with a stated value of ±10% and "Manual Calibration" with ±2%.

These data immediately raise a number of questions: what exactly do they mean, and how is this parameter measured? For the first line, the table itself gives the temperature (ambient or of the MC itself is unclear, but that is me being capricious) and the supply voltage; in addition, a note says (redundantly, in my view) that the measurement is taken at one specific point of external conditions. One can guess that in this case the factory-recorded calibration value is used, although it would have been better to state this explicitly in the note. Everything here is more or less clear and interpreted almost unambiguously (although in the spirit of strict documentation review one should say that everything is unclear and admits varying interpretations, and that this is unacceptable; but then the subject of further discussion would simply vanish and there would be nothing to write about, so show some indulgence).

Things are worse with the second line: it gives the limits of temperature and supply voltage variation and claims that some magic calibration procedure can achieve a result significantly better than the factory one across the entire range. An immediate question: if this can be achieved everywhere (at any point of temperature and supply), and the manufacturer knows how, why did it not do so itself during factory calibration at one specific point of conditions? We turn to the description of the calibration byte and see that it takes 128 values covering the range from 50% to 200% of nominal, which corresponds to 150/128 ≈ 1.17% of frequency change per unit of calibration value and should give an expected accuracy better than 1%. But we must further take into account that the adjustment characteristic is clearly nonlinear: in the region of large calibration values the step is about 60%/32 ≈ 2% (data taken from the graph; I have repeatedly expressed my attitude to this way of presenting technical parameters, and I repeat, it is an unacceptable method, although of course better than nothing), which gives an accuracy of about 1%. And if we also take into account the non-monotonicity of the adjustment characteristic (yes, exactly that is stated in the documentation: not drawn in the graph, but clearly stated in the text; I categorically refuse to understand how, and above all why, the company managed to produce such an adjustment law, but it did), we must consider an accuracy of 2% quite achievable. I do not like that I had to consult the graph, but strictly speaking it was not necessary; the tabular data suffice. In this part the documentation must be considered quite understandable and consistent; whether it is correct lies outside our competence.
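The step arithmetic above can be checked in a few lines. The 150/128 and 60/32 figures are the ones quoted in the text (the latter read off the datasheet graph); the half-step residual error assumes a monotone adjustment characteristic, which, as just noted, the documentation itself denies, so treat it as an optimistic bound:

```python
def step_percent(span_percent, steps):
    """Average frequency change per unit of the calibration byte."""
    return span_percent / steps

avg_step = step_percent(200 - 50, 128)   # ~1.17 % per step over the full range
upper_step = step_percent(60, 32)        # ~1.9 % per step in the coarse region

# with a monotone characteristic, the residual error after calibration
# is at most half a step
best_case = avg_step / 2                 # ~0.6 %, better than 1 %
worst_case = upper_step / 2              # ~0.94 %, still about 1 %
```

Both bounds land around the 1% figure the text derives, and the documented non-monotonicity is what pushes the realistic claim out to 2%.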

Part two of the Marlezon ballet: the influence of external conditions.

And here begins "thrash, waste and sodomy". Instead of tables of values we are invited to look at pictures (the documentation for some reason calls them graphs of typical values), and, as we know, "the main advantage of a graphical presentation of information is its clarity, and it has no other advantages". One could even make use of such information and read the boundary values off the graph ("though it hurts the team's pride"), had the graph not been placed in the "Typical Characteristics" section. I do not know about you, but personally I am deeply convinced that specifying typical values, whether as a graph or as a table, is no better than specifying nothing at all. They cannot guide a design, since for such parameters it is entirely unclear what deviation from the typical value is permissible, in contrast to minimum and maximum values, crossing which means the device is faulty.

All right, off we go: let us try to extract at least some information. We see that as the temperature changes from -40 to +80 °C, the oscillator frequency changes by ±4%. A similar picture with the supply voltage: again only typical graphs, and a resulting error of -6/+2% over 3.3 to 5.5 V. Data on oscillator aging are simply absent, which, in general, is logical: against the background of the figures already quoted, nobody cares about the one percent over 5 years that is characteristic for silicon.

Now we have all the data to answer the original question: with factory calibration the oscillator does not meet the accuracy requirements of the interface; with calibration for the specific conditions of use it meets the boundary requirements, but still does not guarantee the standard. Note also that while calibration for the supply voltage of a specific MC can be done when the device is manufactured (hoping that it does not change over time), temperature can only be compensated "on the fly", which requires an external time reference of appropriate accuracy. Since device development should follow the rule "in God we trust, all others must bring evidence", and we have not proven compliance, the correct answer is: an IRPS implementation meeting the requirements of the standard cannot be guaranteed on this MC with the internal oscillator. Note that we reached this conclusion by analyzing the documentation, and formulated it so as to emphasize that on a specific specimen of the MC everything may well work out if the stars align. That is, our conclusion contradicts the post mentioned earlier; how could that happen, given that everything works fine for its author? Let's figure it out.
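The verdict above can be summarized as a rough worst-case tally of the error terms collected so far, against the ~1% per-side budget from the Atmel app note. The percentages are the ones quoted in the text; treating them as simply additive is a pessimistic simplification of mine, but even the individual terms already bust the budget:

```python
BUDGET = 1.0            # % per side, from the Atmel app note

factory_cal = 10.0      # % accuracy with factory calibration
manual_cal = 2.0        # % best manual calibration at one operating point
temp_drift = 4.0        # % over -40..+80 C, from the typical graphs
supply_drift = 6.0      # % worst side of the -6/+2 % over 3.3..5.5 V

# even a device calibrated at one point still drifts with conditions
drift_after_point_cal = manual_cal + temp_drift + supply_drift
```

Factory calibration alone exceeds the budget tenfold; even the best manual calibration is at twice the budget before any drift is counted.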

Now for the promised criticism of that post. First, let us think about how one can verify that a device complies with the requirements of a specific interface. I can suggest the following ways:

  1. A good way: measure the critical parameters of the device's interface and compare them against the requirements of the standard. This can be done with universal instruments (in our case, an oscilloscope, measuring the length of a bit interval or of a whole packet), or with a specialized instrument certified for testing this interface.
  2. A so-so way: organize interaction with another device that implements the counterpart side of the interface and is known-good (meets the requirements of the standard). Of course, such a test is thoroughly insufficient; it is better suited to confirming that the device under test is faulty. But it does at least something.
  3. A bad way: implement the counterpart side of the interface yourself (in the same device or another) and interact with it. Since neither device is known-good, the value of such a test is very, very doubtful. A good example of this approach is an "echo" over the serial channel, which proves nothing except that the device is not broken outright and is capable of transmitting something; beyond the transmission speed it tells us little more than nothing.
  4. A terrible way: take a device that does not meet the requirements of the standard at all (better yet, contradicts them) and proceed as in the previous point.

It is the last method that is used in the post under consideration: a software receiver of the serial channel is implemented which, in contradiction to the requirements of the standard, adjusts its rate to the input signal (specifically, to the measured length of the start bit), which allows it to reliably receive a signal of poor quality in the sense of timing. It cannot be said that this should never be done; indeed, analog modems adopted exactly this kind of adjustment to the incoming speed, but there it was a discrete speed switch implemented by changing a divider, which is clearly not our case. And it is in this configuration that everything works out perfectly and information is transmitted reliably under any external conditions. So if we are talking about the possibility of transferring information between two MCs running from internal oscillators, over an interface that remotely resembles the IRPS, the answer is positive. If we are talking about interoperating with external devices that meet the requirements of the standard and nothing more, then many unpleasant surprises await us.

The general conclusion from the above:

  1. when designing devices, rely on the documentation (RTFM),
  2. study the documentation and interpret what you read correctly (RTFMF),
  3. keep in mind that nowadays documentation may contain omissions, inaccuracies and even outright errors, therefore
  4. verify the information obtained for consistency and plausibility, and
  5. use experimentally obtained information only to confirm conclusions drawn from analysis of the documentation, and
  6. choose the methods of hardware-testing experiments especially carefully, so as to obtain a reliable result.

Well, in conclusion, as promised, a little assembler. I have allowed myself to rewrite the code fragment given by the author in a normal form, since the assembler syntax built into GCC is nothing short of a mockery of the programmer. No, I do understand, of course, that the compiler developers were guided by weighty considerations, but the result painfully resembles the phrase "well, it works".

.equ delay = 15

TX_Byte:
        cli
;       ld   r18,Z+
;       cp   r18,r1
;       breq Exit_Transmit
;       dec  r1
        cbi  port,TX_line
Delay_TX:
        ldi  r16,delay
Do_Delay_TX:
        nop
        dec  r16
        brne Do_Delay_TX
TX_Bit:
        sbrc r18,0
        sbi  port,TX_line
        sbrs r18,0
        cbi  port,TX_line
        lsr  r18
        lsr  r17
        brcs Delay_TX
        sbi  port,TX_line
        ldi  r16,delay
Stop_Bit_TX:
        nop
        dec  r16
        brne Stop_Bit_TX
        sei

An error in the program immediately catches the eye: the commented-out comparison cp r18,r1 assumes that register r1 contains zero, but nothing in the routine assigns it explicitly. After the transmission cycle of one byte completes, this value is guaranteed, but not on the first pass. Therefore initialization must be added, which will increase the code size.

The second drawback is the way the output level is formed (the sbrc/sbi/sbrs/cbi sequence): the method the author chose for driving the next bit produces edge jitter of 2 clock cycles between the different transitions (0-1 and 1-0), which tightens the requirements on frequency-holding accuracy. Not that the effect is very strong, but if a flaw can be corrected without lengthening the program, then why not (see the epigraph). The original variant took 4 words and executed in 4 cycles; the new one takes 4 words and executes in the same 4 cycles. Yes, the corrected variant requires a deeper study of the MC architecture, but who said it would be easy? On the other hand, in the first variant the port modification is atomic and in the second it is not; in our case this does not matter (interrupts are explicitly disabled), but an aftertaste remains. If this MC had a real bit processor, as in the 8051 architecture, we could write an ideal fragment combining the advantages of both approaches (and even a bit shorter), but why dream of the unattainable...
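The 2-cycle jitter figure can be verified by simple cycle counting, assuming the classic AVR timings (sbrc/sbrs: 1 cycle, 2 when skipping; sbi/cbi: 2 cycles; in/bld/out: 1 cycle each). The sketch below counts cycles from the start of each sequence to the cycle on which the port pin actually changes:

```python
def edge_delay_skip_chain(bit):
    """sbrc r,0 / sbi port,line / sbrs r,0 / cbi port,line:
    cycles from entry until the instruction that writes the pin."""
    if bit:
        return 1 + 2        # sbrc falls through (1), sbi writes (2)
    return 2 + 1 + 2        # sbrc skips sbi (2), sbrs falls through (1), cbi (2)

def edge_delay_in_bld_out(bit):
    """in r,port / bld r,line / out port,r:
    the out lands on the same cycle regardless of the bit value."""
    return 1 + 1 + 1

jitter = abs(edge_delay_skip_chain(1) - edge_delay_skip_chain(0))
```

The skip chain places the edge 3 or 5 cycles in depending on the bit value (hence the 2-cycle jitter), while the in/bld/out variant always writes the port on the same cycle.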

The third remark is more a matter of style. I have repeatedly expressed my attitude toward magic constants like the one we see in the preamble of this program. Let me emphasize once more: just because the author defines the constant in the preamble rather than directly in the instruction, the "ordinary street magic" does not go anywhere. The point is that we must show the reader explicitly how a specific value is formed, not create a synonym for a value obtained in some unknown way. One can, of course, attach a comment with the calculation formula to the line holding the value, but it is better to use the formula itself when forming the constant; then the comment is simply unnecessary (provided, of course, the constants involved have speaking names). This is done in the listing below; note that we convert to an integer only at the last moment and round correctly, which avoids losing accuracy in the result.

There is one more error: the duration of the start bit differs somewhat from the bit interval of the data. The deviation is not large (3 clock cycles), but at high transmission speeds, where the bit interval is about 90 clocks, this is already an error of a few percent, which is unacceptable. The error is easy to correct by adding extra delay instructions, but that would lengthen the program, so for now we merely record its presence, and later make sure that a correct program architecture (yes, even a program this short has an architecture) eliminates it automatically.
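How bad are those 3 cycles? A quick calculation using the figures quoted above (~3 extra cycles against a bit interval of roughly 90 cycles at high baud rates) shows the start bit alone already eats several times the 1% transmitter budget:

```python
def start_bit_error_percent(extra_cycles=3, bit_cycles=90):
    """Relative length error of a start bit that runs a few cycles long."""
    return extra_cycles / bit_cycles * 100

error = start_bit_error_percent()   # ~3.3 % at the figures quoted above
```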

Well, now that the errors are corrected (except the last one), let us try to improve the program by the main criterion (for setting a record, in this particular case): code length. The first thing that catches the eye is the presence of two delay loops, which is bad both because it violates the DRY principle (a general requirement) and because it increases the code size (our specific requirement). One could factor this fragment into a subroutine and still win on length, since we would add 3 words (1 for each of the two calls and 1 for the return) and save 4; but there is a much more elegant way: a careful organization of the byte-transmission loop, which can be seen in the following listing.

.equ delay = 15

TX_Byte:
        cli
        sec                     ; carry = end-of-byte marker
        clt                     ; T = 0, start bit
TransBit:                       ; output the level stored in T
        in   r17,port
        bld  r17,Tx_line
        out  port,r17
Delay_TX:                       ; bit-interval delay
        ldi  r17,delay
Do_Delay_TX:
        nop
        dec  r17
        brne Do_Delay_TX
TX_Bit:
        bst  r16,0
        ror  r16
        clc
        brne TransBit           ; data bits remain
        brcs TransBit           ; stop bit
Exit_Transmit:
        sei

Note how we use the transmitted byte together with the carry bit as the bit counter. A beautiful solution, but it has one drawback: the last data bit lasts slightly (2 clocks) longer than the others, due to the branch delay. If this were the stop bit, we could "spit and forget", since we have not specified a minimum interval between transmissions; but this is a significant bit, and we have just criticized the original program for exactly this behavior. Let us not be like the biblical character from the parable of the mote in another's eye, and take steps to eliminate it. The phenomenon could easily be compensated by inserting a 2-cycle delay, but the code length would grow, and that is our key parameter. So let us go the classic way and trade time for memory: use a separate register to count the transmitted bits, and obtain exactly equal bit intervals at the same code size.
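The marker trick can be modeled in a few lines. This is a model of the idea only, not of the AVR instructions: the ninth "carry" bit is represented by seeding a 1 above the MSB of the data, and the loop runs until only that marker remains:

```python
def frame_bits(byte):
    """8N1 frame for one byte, LSB first, using the data register itself
    as the loop terminator: a marker bit sits above the MSB, and the
    loop ends when only the marker is left."""
    bits = [0]                    # start bit, line low
    reg = (1 << 8) | byte         # marker (the 'carry') above the data
    while reg != 1:               # only the marker left -> byte done
        bits.append(reg & 1)
        reg >>= 1
    bits.append(1)                # stop bit, line high
    return bits
```

No separate bit counter is needed: the register going empty (down to the marker) is itself the termination condition, which is exactly what saves the words of code.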

The next improvement concerns the formation of the bit interval, which in the source program is done with a 4-cycle loop. If we make it a 3-cycle loop (the minimum possible on this MC), we save one word of code and can potentially improve the timing accuracy, since the delay granularity becomes smaller (with proper rounding the deviation does not exceed half a step). Bear in mind, though, that in a particular case we may also lose accuracy; it all depends on the source numbers. Another circumstance that could influence the choice of loop duration: the maximum delay with a byte counter is 256 steps, so the 4-cycle variant allows speeds of 9600 baud and above, but with the 3-cycle delay 9600 is no longer reachable. It would be very good to reflect this circumstance (the minimum port speed) in the comments to the program, and also to emit a warning message if the requirement is violated. Well, and to make the corresponding modifications to the expressions forming the delay parameter, not forgetting to use "speaking" names for the quantities involved.

.equ Freq      = 8000000
.equ BaudRate  = 115200
.equ PayLoad   = 9              ; cycles per bit spent outside the delay loop
.equ CycleTime = 3              ; cycles per delay-loop iteration
; doubled arithmetic so that the final division rounds to nearest
.equ delay = ((Freq*2/BaudRate - PayLoad*2) + CycleTime) / (CycleTime*2)

TX_Byte:
        cli
        ldi  r18,10             ; bit counter: start + 8 data + stop
        sec                     ; carry = stop-bit level
        clt                     ; T = 0, start bit
TransBit:
        in   r17,port
        bld  r17,Tx_line
        out  port,r17
Delay_TX:
        ldi  r17,delay
Do_Delay_TX:
        dec  r17
        brne Do_Delay_TX
TX_Bit:
        bst  r16,0
        ror  r16
        dec  r18
        brne TransBit
Exit_Transmit:
        sei
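The .equ arithmetic above can be sanity-checked with the same integer arithmetic the assembler uses. The constant names mirror the listing; the minimum-baud check reflects the 256-step limit of the byte counter discussed earlier:

```python
FREQ    = 8_000_000
BAUD    = 115_200
PAYLOAD = 9      # cycles per bit outside the delay loop
CYCLE   = 3      # cycles per delay-loop iteration

# everything doubled so that the final integer division rounds to nearest
delay = ((FREQ * 2 // BAUD - PAYLOAD * 2) + CYCLE) // (CYCLE * 2)

bit_cycles = PAYLOAD + delay * CYCLE        # the bit interval we actually get
ideal_cycles = FREQ / BAUD                  # ~69.44 cycles at these settings
error = abs(bit_cycles - ideal_cycles)      # stays within half a loop step

# slowest reachable speed: the loop counter is a byte, 256 steps at most
min_baud = FREQ / (PAYLOAD + 256 * CYCLE)   # ~10.3 kbaud, so 9600 is out
```

At 8 MHz and 115200 baud the formula yields delay = 20, a 69-cycle bit against the ideal 69.44, i.e. an error well under the granularity limit; and the minimum-baud line confirms why 9600 baud is unreachable with the 3-cycle loop.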

Now let us look at the result: the code size decreased from 20 to 16 words (counting only the transmission proper, even more strikingly, from 18 to 14); the edge jitter is gone (of course, only the component of the jitter caused by the program itself; the rest is beyond our reach); the accuracy of the time intervals has improved; and the program has become clearer and easier to understand (thanks to the comments, since even a well-written assembly program is, as a rule, not self-documenting).

The conclusion from this last part: if we are going to set world records in assembly programming, we must study the architecture of the specific MC very deeply and apply that knowledge to obtain a perfect result, paying attention to every detail.

And finally: the task of writing minimum-size code looks a bit contrived nowadays, but quite unexpectedly it has received confirmation of its vitality. At the end of last year (2016; that is how long this post waited its turn) a new MC in the MSP430 family was announced which, along with a uniquely low price (26 cents; we await the appearance of Chinese devices based on it), has a uniquely small program memory: 512 bytes (no, I am not mistaken, there is no letter "k" after the number). So code size can be critical when using this device, and in general, writing such extreme programs requires a deep study of the MC, and "work in itself is a blessing".

Source: https://habr.com/ru/post/412959/

