FB II COMPILER

Measuring floating-point and integer speed in FB^3


As an antidote to list traffic about teething problems in FB^3, consider the simple benchmark below. It is modified from "Examples:Neat Apps:FB II vs FB^3 Example" on the FB^3 CD. The results give cause for celebration (champagne and cigars for Staz'n'Andy), as well as food for thought.
'--------Simple floating point benchMark-------
register off
DIM &&,x#,y#,z#,t#
register on
dim i&,t&
x# = 12345678.90123456789
y# = 123: z# = .01: t# = 100
t& = FN TICKCOUNT
FOR i&=1 TO 10000000: x#=x#+y#*z#-y#/t#: NEXT
PRINT FN TICKCOUNT-t&;" ticks, x#=";x#
'-----------------------------------------------
Results from iMac (233 MHz G3):

               Time to nearest 10 ticks    MFLOPS
FB2                     11870                0.2
FB^3 68K (1)             3920                0.6
FB^3 PPC (2)             3870                0.6
PPC ASM (2)               710                3.4
FB^3 PPC (3)              230               10.4
FB^3 PPC (4)              230               10.4
PPC ASM (3)               100               24
PPC ASM (4)               100               24

1. Unaffected by alignment of variables
2. Variables aligned on 2-byte boundary but not 4
3. Variables aligned on 4-byte boundary but not 8
4. Variables aligned on 8-byte boundary
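
The MFLOPS column can be reproduced from the tick counts. The Python sketch below assumes that each pass through the loop counts as four floating-point operations (one multiply, one add, one divide, one subtract) and that a tick is 1/60 s. The function name `mflops` is hypothetical and is used for illustration only.

```python
TICKS_PER_SECOND = 60        # one Mac OS tick is 1/60 s
ITERATIONS = 10_000_000      # loop count used in the benchmark
FLOPS_PER_ITERATION = 4      # assumed: *, +, /, - in x# = x# + y#*z# - y#/t#


def mflops(ticks, iterations=ITERATIONS, flops_per_iteration=FLOPS_PER_ITERATION):
    """Convert an elapsed time in ticks to millions of floating-point operations per second."""
    seconds = ticks / TICKS_PER_SECOND
    return iterations * flops_per_iteration / seconds / 1e6


if __name__ == "__main__":
    # Tick counts taken from the results table.
    for label, ticks in [("FB2", 11870), ("FB^3 68K", 3920),
                         ("FB^3 PPC (3)", 230), ("PPC ASM (3)", 100)]:
        print(f"{label:14s} {ticks:6d} ticks  {mflops(ticks):5.1f} MFLOPS")
```

For example, 230 ticks is about 3.8 s, and 40 million operations in that time comes to roughly 10.4 MFLOPS.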

Explanation of Note (2)
Misalignment can be forced by the following DIM statement:
DIM &&,silly%,x#,y#,z#,t#

Explanation of Notes (3) and (4)
DIM &&,x#,y#,z#,t# sometimes fails to produce 8-byte alignment, through an anomaly reported to Staz. On some processors, though not the G3, the resulting 4-byte alignment could reduce performance.
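
To see why one extra 2-byte variable matters, the Python sketch below packs variables one after another from an 8-byte-aligned starting address and reports the alignment of each. It assumes that `DIM &&` places the next variable on an 8-byte boundary, that the following variables are stored back to back with no padding, and that `%` and `#` variables occupy 2 and 8 bytes. These are assumptions made for illustration, not a description of how the FB^3 compiler actually allocates storage, and the names `layout` and `alignment` are hypothetical.

```python
def layout(variables, start=0):
    """Return {name: offset} for variables packed back to back from `start`."""
    offsets = {}
    offset = start
    for name, size in variables:
        offsets[name] = offset
        offset += size
    return offsets


def alignment(offset):
    """Largest of 8, 4, 2 that divides the offset, or 1 if none does."""
    for boundary in (8, 4, 2):
        if offset % boundary == 0:
            return boundary
    return 1


doubles = [("x#", 8), ("y#", 8), ("z#", 8), ("t#", 8)]

aligned = layout(doubles)                     # models DIM &&,x#,y#,z#,t#
skewed = layout([("silly%", 2)] + doubles)    # models DIM &&,silly%,x#,y#,z#,t#

if __name__ == "__main__":
    for name, _ in doubles:
        print(name, "alignment:", alignment(aligned[name]),
              "bytes; with silly% first:", alignment(skewed[name]), "bytes")
```

Under these assumptions every double in the second layout sits on a 2-byte boundary that is not a 4-byte boundary, which is the situation described in Note (2).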

Assembly stuff, for bold explorers:
'-------------PPC ASM equivalent------
//FOR i&=1 TO 10000000: x#=x#+y#*z#-y#/t#: NEXT
countFP&=10000000
` lwz r3,^countFP&    ; r3=10000000
` lfd f1,^x#
` lfd f2,^y#
` lfd f3,^z#
` lfd f4,^y#
` lfd f5,^t#
`loopFP
` fdiv f0,f4,f5       ; f0=f4/f5
` addic. r3,r3,$FFFF  ; r3=r3-1 (subic. r3,r3,1)
` fmadd f1,f2,f3,f1   ; f1=f2*f3+f1
` fsub f1,f1,f0       ; f1=f1-f0
` stfd f1,^x#         ; save f1->x#
` bc 4,2,loopFP       ; bne loopFP
'--------------------------------------

To accompany the floating-point benchmark above, here are integer benchmark results (iMac 266 MHz G3), using a modification of the Sieve of Eratosthenes program in "Examples:Neat Apps:FB II vs FB^3 Example" on the FB^3 CD.

Even in 68K, FB^3 is measurably faster than FB2. PPC native code is 2-3 times faster again. Tuning (by careful application of REGISTER ON) gives a moderate but worthwhile improvement. Lastly, there is scope for the speed-hungry assembly programmer. Further congratulations to Staz'n'Andy seem in order.

The overall pattern is similar to that of the floating-point benchmark. In that benchmark, however, REGISTER had no effect and tuning was merely a matter of avoiding disastrous variable misalignment.

            Time in ticks (1/60 s)   Notes
FB2                 355
FB^3 68K            322              (1) All RAM variables
FB^3 68K            311              (2) Some REGISTER variables
FB^3 68K            276              (3) All REGISTER variables
FB^3 PPC            125              (1) All RAM variables
FB^3 PPC            116              (2) Some REGISTER variables
FB^3 PPC             93              (3) All REGISTER variables
PPC ASM              29
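
The speed ratios discussed above can be checked directly from the tick counts in the table. The Python fragment below is a convenience sketch; the dictionary keys and the helper name `speedup` are hypothetical.

```python
# Tick counts from the integer benchmark table (1 tick = 1/60 s).
TICKS = {
    "FB2": 355,
    "FB^3 68K, all RAM": 322,
    "FB^3 68K, some REGISTER": 311,
    "FB^3 68K, all REGISTER": 276,
    "FB^3 PPC, all RAM": 125,
    "FB^3 PPC, some REGISTER": 116,
    "FB^3 PPC, all REGISTER": 93,
    "PPC ASM": 29,
}


def speedup(slower, faster):
    """How many times faster the second entry is than the first."""
    return TICKS[slower] / TICKS[faster]


if __name__ == "__main__":
    for kind in ("all RAM", "some REGISTER", "all REGISTER"):
        ratio = speedup(f"FB^3 68K, {kind}", f"FB^3 PPC, {kind}")
        print(f"PPC vs 68K ({kind}): {ratio:.2f}x")
    print(f"REGISTER tuning on PPC: {speedup('FB^3 PPC, all RAM', 'FB^3 PPC, all REGISTER'):.2f}x")
    print(f"ASM vs best FB^3 PPC:   {speedup('FB^3 PPC, all REGISTER', 'PPC ASM'):.2f}x")
```

On these figures PPC code is about 2.6 to 3.0 times faster than 68K code, REGISTER tuning gains about a third on PPC, and the hand-written assembly is roughly three times faster than the best compiled result.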

Note (1). REGISTER ON changed to REGISTER OFF

Note (2). In the original program, two variables (i and k) were unskilfully DIMmed in such a way that they cannot be REGISTER:-

DIM &&,i,k///Align Vars to even address <-- misguided
DIM f(8191)
DIM t&,loops,c,p

In fact no alignment directive is needed at all; the compiler always uses an even address. Finally, the && alignment directive (supposedly 8-byte) is larger than needed for integer (2-byte) variables.

Note (3). As in listing below.
'-----------Integer BenchMark--------------
LOCAL FN doSieveFB
  REGISTER ON ' with compiler preferences register variables ON, too
  DIM i, k, loops, c, p, f(8191), t&
  t& = FN TICKCOUNT
  FOR loops = 1 TO 1000
    c = 0
    FOR i = 0 TO 8191
      f(i) = 1
    NEXT i
    FOR i = 0 TO 8191
      LONG IF f(i) <> 0
        p=i+i+3
        LONG IF i+p <= 8191
          FOR k = i+p TO 8191 STEP p
            f(k)=0
          NEXT k
        END IF
        c = c+1
      END IF
    NEXT i
  NEXT loops
  t&=FN TICKCOUNT-t&
  PRINT c;" primes "; t&;" ticks"
END FN
'-------------------------------------------
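
For readers without FB to hand, the same sieve can be sketched in Python to confirm what the listing computes. This is a single-pass port for checking the prime count, not a timing benchmark. The names `do_sieve` and `count_odd_primes` are hypothetical, and the trial-division function is included only as an independent check.

```python
def do_sieve(size=8191):
    """Count primes as the FB listing does: flag i stands for the odd number 2*i + 3."""
    flags = [1] * (size + 1)              # f(0) .. f(size), all set
    count = 0
    for i in range(size + 1):
        if flags[i]:
            p = i + i + 3                 # the odd number represented by flag i
            # range() is empty when i + p > size, so the listing's
            # LONG IF i+p <= 8191 guard is not needed here.
            for k in range(i + p, size + 1, p):
                flags[k] = 0              # strike out odd multiples of p
            count += 1
    return count


def count_odd_primes(limit):
    """Independent check: count odd primes from 3 to limit by trial division."""
    def is_prime(n):
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False
            d += 2
        return True
    return sum(1 for n in range(3, limit + 1, 2) if is_prime(n))


if __name__ == "__main__":
    print(do_sieve(), "primes")
```

With the default size the flags cover the odd numbers 3 to 16385, and the function returns 1899, the number of odd primes in that range.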

'-----------Assembly equivalent-------------
#IF cpuPPC
  LOCAL FN doSieveAssembler
    REGISTER OFF ' disable because we need addresses of variables
    DIM &, t&, fPtr&, i, k, loops, c, p, f(8191)
    REGISTER ON
    t& = FN TICKCOUNT
    fPtr&=@f(0)
    FOR loops = 1 TO 1000
      ` addi r6,0,0        ; c = 0
      ` lwz r9,^fPtr&      ; address of f(0)
      ` addi r4,0,2        ; 2
      ` subf r11,r4,r9     ; address-2 for sthux
      ` addi r5,0,8192     ; loop count
      ` mtspr ctr,r5       ; loop count
      ` addi r3,0,1        ; r3 = 1
      `iClearLoop
      ` sthux r3,r11,r4    ; f(k)=1
      ` bc 16,0,iClearLoop ; bdnz iClearLoop

      ` addi r10,0,0       ; i=0
      `iLoop
      ` add r4,r10,r10     ; i*2
      ` lhzx r3,r9,r4      ; f(i)
      ` cmpi cr0,0,r3,0    ; cmpwi r3,0
      ` bc 4,1,skip        ; ble skip
      ` addi r4,r4,3       ; p = i*2 + 3
      ` add r5,r4,r10      ; k = i+p
      ` cmpi cr0,0,r5,8192 ; cmpwi r5,8192
      ` bc 4,0,incrementC  ; bge incrementC
      ` add r8,r5,r5       ; k*2 index into INT array
      ` add r7,r4,r4       ; p*2
      ` addi r3,0,0        ; r3=0
      ` add r11,r9,r8
      ` subf r11,r7,r11    ; adjust index for sthux
      `kLoop               ; the inner loop
      ` add r5,r5,r4
      ` cmpi cr0,0,r5,8191 ; cmpwi r5,8191
      ` sthux r3,r11,r7    ; f(k)=0
      ` bc 4,1,kLoop       ; ble kLoop

      `incrementC
      ` addi r6,r6,1       ; c=c+1
      `skip
      ` addi r10,r10,1     ; i=i+1
      ` cmpi cr0,0,r10,8191 ; cmpwi r10,8191
      ` bc 4,1,iLoop       ; ble iLoop

      ` sth r6,^c          ; store c
    NEXT loops
    t&=FN TICKCOUNT-t&
    PRINT c;" primes "; t&;" ticks"
  END FN
#ENDIF
'----------------------------------------------
Robert Purves