C - How many decimal places does the primitive float and double support Stack Overflow

How many decimal places does the primitive float and double support? [duplicate]

I have read that double stores 15 digits and float stores 7 digits. My question is: are these numbers the number of decimal places supported, or the total number of digits in a number?

Asked January 20, 2015 at 12:48

It would be more accurate to describe these as the number of decimal digits of precision the format carries, rather than as digits it "supports" or "stores". That wording makes it easy to think that floating point can represent every decimal value with up to that many digits, when in fact it cannot: most decimal values with that many digits (for example in [0, 1]) have no exact floating-point representation and are rounded when they are converted.

Commented January 20, 2015 at 13:02

4 Answers

If you are on an architecture that uses IEEE-754 floating point arithmetic (most architectures), then float corresponds to single precision and double corresponds to double precision, as described in the standard.

Let's get to the numbers:

Single precision:

32 bits are used to represent the number, of which 24 bits are used for the mantissa. This means that the least significant bit (LSB) has a relative value of 2^(-24) with respect to the MSB, which is the "hidden 1" and is not stored. Therefore, for a fixed exponent, the smallest representable step is 10^(-7.22) times the value of the exponent. In other words, in base-10 exponent notation (3.141592653589 E 25), only about 7.22 decimal digits are significant, which in practice means roughly 7 decimal digits can be relied on.

Double precision:

64 bits are used to represent the number, of which 53 bits are used for the mantissa. By the same reasoning, 2^(-53) is 10^(-15.95), which means that at least 15 decimal digits will be correct.

Answered January 20, 2015 at 14:10 by Samuel Navarro Lou

The LSB has a value of 2^(-23) relative to the MSB, not 2^(-24).

Commented January 20, 2015 at 14:38

Actually, the "real" MSB is hidden and not stored in the mantissa because, for a normalized floating-point number, it is always 1. As a result, the MSB that is stored in the mantissa has a value of 2^(-1) relative to the exponent, and the LSB has 2^(-24). This is the view that matters for the real decimal precision. Of course, you are right if you take the MSB that is stored in the mantissa as the reference.

Commented January 21, 2015 at 20:44

Those are the total number of "significant digits", so to speak, counting from left to right regardless of where the decimal point is. Beyond that number of digits, precision is not preserved.

The counts you listed are for the base 10 representation.

Answered January 20, 2015 at 12:50 by John Zwinck

There are macros for the number of decimal digits each type supports. The GCC documentation explains what they are and what they mean:

FLT_DIG

This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) of the representation, then the decimal precision q is the maximum number of decimal digits such that any floating-point number with q base-10 digits can be rounded to a floating-point number with p base-b digits and back again, without change to the q decimal digits.

The value of this macro is supposed to be at least 6, to satisfy ISO C.

DBL_DIG
LDBL_DIG

These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.

With GCC 4.2 and Clang 3.5.0, these macros yielded 6 and 15, respectively.

Answered January 20, 2015 at 12:52

are these numbers the number of decimal places supported or the total number of digits in a number?

They are the significant digits contained in every number (you may not need all of them, but they are still there). The mantissa of a given type always has the same number of bits, so every number contains the same number of valid "digits" when you think in decimal terms. You cannot store more digits than will fit in the mantissa.

The number of "supported" digits is much larger, however: float will usually support numbers with up to 38 decimal digits and double numbers with up to 308 decimal digits, but most of those digits are not significant (that is, they are "unknown").

Technically this is wrong, though, since float and double do not have a universally well-defined size as I assumed above (they are implementation-defined). Also, the storage size is not necessarily the same as the size of intermediate results.

The C++ standard is very reluctant to define any fundamental type precisely, and leaves almost everything to the implementation. Floating-point types are no exception:

3.9.1/8: There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

Of course, none of this is particularly helpful in practice.

In practice, floating point is (almost always) IEEE 754 compliant, with float being 32 bits wide and double 64 bits wide (as stored in memory; on some mainstream architectures the registers have higher precision).

This corresponds to 24 bits and 53 bits of mantissa, respectively, or 7 and 15 full decimal digits.


Elim Rim - Journalist, creative writer

Last modified 19.03.2025
