FB II Compiler

PG PRO

Debugging

Memory

System

Mathematics

Resources

Disk I/O

Windows

Controls

Menus

Mouse

Keyboard

Text

Fonts

Drawing

Sound

Clipboard

Printing

Communication

ASM

Made with FB

MATHEMATICS

Use a high speed square root function


On 4 January 1999 I posted a fast integer square root method (FN SquareRoot&). A small bug has become apparent: some values returned are wrong in the last digit. The error is less than 1 part in 40000, but integer arithmetic should be exact and as a penance I offer FN BetterSquareRoot&. This has been explicitly tested for accuracy over the entire range of its argument (0-4294967295).

There was also a mistake in the timing I showed for the FB floating point square root (SQR). The new timings are shown below. FN BetterSquareRoot& is an easy winner.

Timing in ticks (on iMac) for roots of numbers 0 to N

N 100000 10000000
BetterSquareRoot& 10 996
SquareRoot& 13 1512
USR _sqRoot 17 2120
SQR 2582 (way too slow to test)

Robert

'----------------------------------------------------------------------
LOCAL FN BetterSquareRoot&(a&)
' Returns the largest integer r& such that r&*r& <= a&
' Replacement for USR _sqRoot(a&) which:
' 1. crashes for a&=_Maxlong/2
' 2. gives wrong answers for a&>_Maxlong/2
' 3. gives "high-by-one" answers in many cases, e.g. USR _sqRoot(15)=4
' This routine works for all values of a& unsigned (0-4294967295)
' and is about twice as fast as USR _sqRoot. It DOES NOT WORK on 68000 cpus.

  ` move.l d0,d3
  ` beq.s SqDoneN ; skip if 0
  ` move.l d3,d6
  ` move.l d3,d1

  ` cmpi.l #4095,d3
  ` bhi.s AccApprox ; bigger values need better approx

  `RoughApproxLoop ; fast halve number of bits in a&
  ` lsr.l #1,d3
  ` lsr.l #2,d1
  ` bne.s RoughApproxLoop
  ` addq.l #1,d3 ; force non-zero
  ` bra.s GotStartVal

  `AccApprox
  ` move.w #32,d7
  ' BFFFO D1,{$00:$00},D0
  ` dc.w &EDC1,&0000
  ` sub.w d0,d7 ; number of bits in a&

  ` lsr.w #1,d7 ; numBits/2
  ` bcs.s Nodd ; branch if numBits is odd

  ` lsr.l d7,d3 ; shift out half the bits
  ` subq.w #2,d7
  ` btst d7,d3
  ` bne.s GotStartVal ; branch if 2nd msb is 1
  ` subq.w #1,d7
  ` bset d7,d3 ; set the 3rd msb
  ` subq.w #1,d7
  ` bset d7,d3 ; set the 4rd msb
  ` bra.s GotStartVal

  `Nodd
  ` lsr.l d7,d3 ; shift out half the bits
  ` subq.w #1,d7
  ` bclr d7,d3 ; test and clear 2nd msb
  ` beq.s GotStartVal ; branch if 2nd msb was 0
  ` subq.w #1,d7
  ` bset d7,d3 ; set the 3rd msb

  `GotStartVal
  ` move.l d3,d0

' iterate root&=(a&/root&+root&)/2 twice
  ` clr.l d1 ; hi of a&=0
  ` move.l d6,d2 ; lo of a&
  ' DIVU.L d2.l= d1(hi) d2(lo)/d0.l; d1.l=remainder
  ` dc.w $4C40,$2401
  ` add.l d2,d0 ; a&/root&+root&
  ` lsr.l #1,d0 ; /2

  ` clr.l d1
  ` move.l d6,d2
  ` dc.w $4C40,$2401
  ` add.l d2,d0
  ` lsr.l #1,d0

  `TestForTooBig
  ` move.l d0,d2
  ` mulu d2,d2 ; root&*root&
  ` cmp.l d6,d2 ; compare with a&
  ` bls.s SqDoneN ; unsigned comparison
  ` subq.l #1,d0 ; adjust to prevent "hi-by-one" error
  ` bra.s TestForTooBig ; repeat until OK
  `SqDoneN
END FN