Binary calculations for decimal arithmetic

Continuing to investigate the problem of the accuracy of decimal calculations using binary arithmetic, begun in previous posts [1,2,3,4], I was able to develop algorithms for calculating real numbers presented in the format of decimal floating-point numbers that give the same exact result as if calculations would be done manually.

These algorithms use binary arithmetic, regulated by the IEEE754 standard. To test the algorithms, a test program in C ++ was developed, which implements an 18-bit decimal calculator.
Since the volume of material exceeds the format of the post, I outlined the main points in the form of theses. We call this post "May theses": (.

So.

It is known that


The usual arithmetic for the user is decimal arithmetic.

There are also b-ary arithmetic, where b is a base of a number system that takes any non-zero value [5].

For displaying numbers at different scales, a floating-point entry is used as a product of a mantissa and some arbitrary base degree. This is the so-called exponential notation.

If the power of a number is fixed and the mantissa of a number is an integer, then this format is called a fixed-point format. A special case of a fixed-point format is a number in which the degree is zero. This format is an integer format.

If the mantissa is a fractional number in the b-ary number system with the integer part c ≠ 0 and c <b, then this number is called normalized.

Despite the fact that, by their physical nature, the numbers are approximate, for a computing device these are exact numbers and the device must operate on them with a user-defined accuracy.

By exact calculations in arithmetic imply obtaining a result with a given number of valid significant digits after the point [6].

All calculations in the computer are made in binary form. For them, the base b = 2.

Since the binary and decimal number systems are incommensurable, then when converting decimal real numbers into binary code, most often we get an approximate value of the binary equivalent of a decimal number. Therefore, when converting decimal numbers to binary, representation errors occur.

Decimal numbers that have the exact binary equivalent are called representable.
Decimal numbers that do not have the exact binary equivalent are called non-representable.

All integer decimal numbers are representable if the number of significant digits in their binary equivalent does not exceed the bit grid of the area of ​​the machine word in which they are written.

The greater the number of binary digits represented by the decimal equivalent of a number in binary form, the smaller the error in representation. This explains the desire to constantly increase the bit of the operational register of the processor.

Any decimal number, the binary equivalent of which contains the number of significant digits exceeding the digit grid of a machine word, can only be represented approximately.

Arithmetic operations, which result in a result that exceeds the digit capacity of the machine word mantissa, return an approximate number.

Approximate numbers may contain correct, questionable and incorrect numbers.
Incorrect figures in the calculations affect the accuracy and sometimes can lead to completely wrong results [3].

In accordance with the theory of approximate calculations, to obtain correct results, approximate numbers are rounded in such a way as to exclude incorrect numbers [6].

The accuracy that the user wants, or can get in the calculations, is determined by the number of correct digits that the computational algorithm provides.

Any binary number can be rounded to a specified number of binary digits, discarding extra digits.

Similarly, any decimal number can be rounded to the required number of decimal digits, discarding extra digits.

It is impossible simply to discard extra binary digits in a binary number to round its decimal equivalent to a predetermined number of decimal digits, since decreasing the bit depth of the binary equivalent of a decimal number leads to an increase in the number of incorrect digits in its decimal equivalent.

Any real number, expressed in decimal form, can be accurately represented in the format of a floating-point number (NPT), in which the mantissa is an integer. The exponent in the PTP will indicate the position of the point in this number.

If the number is represented in the format of a PCT with an integer mantissa, then the mantissa and the exponent of this number can be exactly converted to binary code.

New


The format in which the decimal NPT mantissa is represented by the binary equivalent of the decimal integer mantissa, and the exponent is the binary equivalent of the power of 10 (base b = 10) will be called the mixed decimal-binary format (SDDF).

SDDF differs from binary NPT format in that the exponent in SDFF is equal to the base degree b = 10, while the binary NPT exponent is equal to the binary base degree b = 2. Accordingly, for SDF the number will be represented as F=M210e, and for PCT, in the IEEE754 standard, as F=M22e.

SDPF differs from binary decimal format (DDF or BCD) NTP in the fact that in DDF the mantissa and exponent are whole decimal numbers in which each digit is written as a byte or tetrad, while in SDDF all decimal numbers are expressed their integer binary equivalents.

Thus, any decimal real number can be represented in SDF by a binary code with an accuracy of N significant decimal digits.

All arithmetic operations on decimal NTPs in SDDF are carried out according to the rules of ordinary arithmetic, where all arguments are integers.

Before each arithmetic operation a decimal number is presented in the format SDF: [S, M, z, e]. Where S is the sign code of the number (0 or 1). M is the binary integer equivalent of the mantissa of the number with the number of decimal digits N. Where N is the accuracy of the calculations. z is the sign of the exponent, e is the value of the exponent. Such a representation is a normalized representation of a decimal number.

For example, for calculations with an accuracy of N = 7 significant digits, the number 123,456 should be represented as 1234560 * 10 ^ -4.

The minimum value of the decimal mantissa of the number for N = 7 will be equal to M = 1,000,000.

The maximum value of the decimal mantissa of the number for N = 7 will be equal to M = 9999999.

The binary equivalent of the maximum 7-digit number 9999999 will be 100,110,001,001,011,001,111,111. It contains 24 binary digits. Therefore, a binary 24-bit register is required to represent decimal mantis in the range from 1,000,000 to 9,999,999.

If in a 32-bit binary machine word, in which 24 bits are taken under the mantissa, one bit is placed under the sign of the number S, one bit is under the sign of the exponent z, and also 6 bits is under the exponent, then such decimal numbers can be represented in such SDDF Accurate to N = 7 significant decimal numbers. The absolute values ​​of these numbers lie in the range from 1000000 * 10 ^ -64 to 9999999 * 10 ^ 64.

After each arithmetic operation, the decimal mantissa of the number should be normalized and rounded to the nearest integer. To do this, the resulting binary equivalent of the mantissa of the number, if necessary, must be multiplied or divided by the binary equivalent of 10 to such an extent that the number of digits of the decimal equivalent of the mantissa becomes equal to N. The resulting number must be rounded to the nearest integer.

Example.

Find up to N = 7 the result of the expression (9675.423 * 10 ^ 2-9.675421 * 10 ^ 5) * 10 ^ 6 - 199992
Calculated manually, or on a Windows calculator, this expression will be equal to the number of 8,000000
We write the operands in normalized form:

A=9,675423*10^5= 9675423*10^-1
B= 9675,421*10^2 = 9675421*10^-1
C=1000000 = 1000000*10^0
D = 1999920*10^-1


In SDDF, these operands will be represented as:

A=[0, 9675423,1, 1]
B=[0,9675421,1, 1]
C=[0, 1000000,0, 0]
D=[0, 1999920,1, 1]


Find the difference S = AB. Since the exponents of the operands A and B are the same, we find the difference between their mantissa:

9675423-9675421=2

To normalize the mantissa S, it is necessary to multiply it by 10 ^ 6, at the same time, the exponent must be reduced by 6. Then S = 2 * 1000000 = 2000000 * 10 ^ -7

Calculate the product P = D * C. To do this, multiply the mantissa factors and add the exponents:

M = 2,000,000 * 10 ^ -7 * 1,000,000 * 10 * 0 = 200,000,000,000,000 * 10 ^ -7
After the mantissa is normalized, we get P = 2000000 * 10 ^ -1.
The result of the R calculation will be equal to:
R = PD = 2000000 * 10 ^ -1-1999920 * 10 ^ -1 = 80 * 10 ^ -1
After normalization, we get R = 8000000 * 10 ^ -6.

For comparison, the calculation of this expression in Excel gives the result R = 8,0000698E + 00.

The author has developed a calculator algorithm in SDF, which performs the addition, subtraction, multiplication and division of decimal numbers with an accuracy of 18 significant digits. To confirm the correctness of the algorithm, a C ++ program was written. Since the author is not a professional programmer, the developed program is intended only for the study of the computing algorithm.

Below, for example, is a screenshot showing the calculation of the following expression:

1,23456789098765432*10^8 * 9,87654321234567891*10^(-9) - 1,2193263123914037*10^0≈ 3.0*10^(-17)



To check the speed, a cycle was used to multiply two 18-bit numbers. The program was launched on the computer Intel® Core (TM) i3-4330 CPU@3.50GHz 3.50 GHz. RAM 8.0 GB. System type: 64-bit. The rate turned out to be ≈ 2.4 * 10 ^ 6 multiplications per second.

I cannot yet compare it with the speed of Windows and Excel calculators; there is not enough education :(. As for the accuracy of calculations, it is the same as if calculations were performed manually.

Literature:

  1. Side View: IEEE754 Standard
  2. Is floating-point normalization needed?
  3. Fatal binary arithmetic errors when working with floating point numbers
  4. Again about floating point numbers
  5. Number systems
  6. Basic rules for approximate calculations

Source: https://habr.com/ru/post/412681/


All Articles