This version unifies double and extended precision and includes crude debugging hooks. This is the double precision version shorn of the switches and hooks and this is the extended precision version.

These two pass the crude provided tests while the `csqrtl` supplied by Apple produces only about 16 places of precision.
Today is 2007, Apr 15.
PS: As of 2007 July the Apple csqrtl is good.

This works with Apple’s gcc 4.0.1 using the -std=c99 option. I have not tested it with earlier versions.

The test of the sign of A is to avoid a region of the complex plane, near the negative real axis, where cancelation ruins accuracy. The effect may be seen in this modification which omits the test and yields:

0.00000000000000000e+00 2.00000000000000000e+00 2i 2.00000000000000000e+00 3.00000000000000000e+00 x 1.67414922803554012e+00 8.95977476129838135e-01 r 0.00000000000000000e+00 0.00000000000000000e+00 r*r-x -1.00000000000000000e+00 -1.00000000000000002e-03 x 4.99999937529384307e-04 -1.00000012494124713e+00 r 1.17427734203090495e-10 -2.16840434497100887e-19 r*r-x -1.00000000000000000e+00 0.00000000000000000e+00 x 0.00000000000000000e+00 nan r nan nan r*r-xwhich gets only about 10 digits of accuracy whereas with the test we get nearly 16 digits:

0.00000000000000000e+00 2.00000000000000000e+00 2i 2.00000000000000000e+00 3.00000000000000000e+00 x 1.67414922803554012e+00 8.95977476129838135e-01 r 0.00000000000000000e+00 0.00000000000000000e+00 r*r-x -1.00000000000000000e+00 -1.00000000000000002e-03 x 4.99999937500027257e-04 -1.00000012499996105e+00 r -2.22044604925031308e-16 0.00000000000000000e+00 r*r-x -1.00000000000000000e+00 0.00000000000000000e+00 x 0.00000000000000000e+00 1.00000000000000000e+00 r 0.00000000000000000e+00 0.00000000000000000e+00 r*r-xThe tests for very large and very small argument avoid exponent overflow and underflow in intermediate results.