SYSTEM

Convert numbers from PC to Mac


I need to parse some image files from the dark side, where the byte order of words (2 bytes) and longs (4 bytes) is the reverse of their Mac counterparts. What I've come up with is below. Is there a better or faster method?
CLEAR LOCAL
LOCAL FN intelWordToMacWord%(anIntelWord%)
  DIM aMacWord%
  DIM byte0%, byte1%

  byte0% = PEEK(@anIntelWord% + 0)
  byte1% = PEEK(@anIntelWord% + 1)

  POKE      (@aMacWord% + 0), byte1%
  POKE      (@aMacWord% + 1), byte0%

END FN = aMacWord%

CLEAR LOCAL
LOCAL FN intelLongToMacLong&(anIntelLong&)
  DIM aMacLong&
  DIM byte0%, byte1%, byte2%, byte3%

  byte0% = PEEK(@anIntelLong& + 0)
  byte1% = PEEK(@anIntelLong& + 1)
  byte2% = PEEK(@anIntelLong& + 2)
  byte3% = PEEK(@anIntelLong& + 3)

  POKE      (@aMacLong& + 0), byte3%
  POKE      (@aMacLong& + 1), byte2%
  POKE      (@aMacLong& + 2), byte1%
  POKE      (@aMacLong& + 3), byte0%

END FN = aMacLong&
Michael Evans

Well, for a start you should remove CLEAR LOCAL, which wastes time by setting your local variables to zero on every call. You don't need that.

Then you can gain a little more speed by removing the byte% variables altogether.
LOCAL FN NintelWordToMacWord%(anIntelWord%)
 DIM aMacWord%
 POKE @aMacWord%, PEEK(@anIntelWord%+1)
 POKE @aMacWord%+1,PEEK(@anIntelWord%)
END FN = aMacWord%

LOCAL FN NintelLongToMacLong&(anIntelLong&)
 DIM aMacLong&
 POKE @aMacLong&,   PEEK(@anIntelLong&+3)
 POKE @aMacLong&+1, PEEK(@anIntelLong&+2)
 POKE @aMacLong&+2, PEEK(@anIntelLong&+1)
 POKE @aMacLong&+3, PEEK(@anIntelLong&)
END FN = aMacLong&
In FB2, these new versions take about 2/3 the time of your functions; a worthwhile but not impressive speed-up. In FB^3 the speed-up is less.

You may like to consider making them "in-line" functions, by putting the code into your loop.
Assuming you have something like:
FOR j& = 1 TO gazillion&
 macNum&(j&)=FN intelLongToMacLong&(intelNum&(j&))
NEXT j&

Try this:
DIM aMacLong&, anIntelLong&
FOR j& = 1 TO gazillion&
 anIntelLong&=intelNum&(j&)
 POKE @aMacLong&,   PEEK(@anIntelLong&+3)
 POKE @aMacLong&+1, PEEK(@anIntelLong&+2)
 POKE @aMacLong&+2, PEEK(@anIntelLong&+1)
 POKE @aMacLong&+3, PEEK(@anIntelLong&)
 macNum&(j&)=aMacLong&
NEXT j&
The in-line version is more than twice as fast as calling your functions, because it does not incur the overhead of FN calls. The code is, however, ugly and hard to follow; your functions are not.
Robert Purves

Thanks for your improvements. I am so accustomed to using CLEAR LOCAL that I had quite forgotten about the overhead....

Taking your advice one step further, I suppose this would be even faster:
  FOR j& = 1 TO gazillion&
    POKE @macNum&(j&),   PEEK(@intelNum&(j&)+3)
    POKE @macNum&(j&)+1, PEEK(@intelNum&(j&)+2)
    POKE @macNum&(j&)+2, PEEK(@intelNum&(j&)+1)
    POKE @macNum&(j&)+3, PEEK(@intelNum&(j&))
  NEXT j&
Do you think that in FB^3 there would be any speed gains using PPC assembler? (Not that I have the foggiest idea how to do that!)
Michael Evans

Very little. A for-next loop with a bunch of POKEs & PEEKs compiles to pretty efficient code. If "use register variables" is on, and j& and gazillion& are in registers, most of your overhead is going to be in moving data to and from the array. At a guess, Andy's code to do that is going to be about as good as you'll be able to do in "raw" assembler. The next "big step" in speed would be to forget the array and use the addresses directly. Get @macNum&(0) and @intelNum&(0) and do all your PEEKs and POKEs to offsets from those addresses, instead of having to locate the address of each element of both arrays inside the loop.

Bill
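
Bill's last suggestion, working from the base addresses instead of indexing both arrays on every pass, might look something like the sketch below. It is an untested illustration, not code from the thread: src& and dst& are hypothetical names, the array size _n is arbitrary, and it assumes (as the PEEK/POKE loops above already do) that the elements of a LONG array sit contiguously in memory, 4 bytes apart. It starts at element 1 rather than element 0 only to match the FOR loops used earlier.

```basic
_n = 1000                        ' hypothetical array size
DIM intelNum&(_n), macNum&(_n)
DIM j&, src&, dst&

intelNum&(1) = &H01020304        ' sample value; fill the rest as needed

src& = @intelNum&(1)             ' address of the first element to convert
dst& = @macNum&(1)
FOR j& = 1 TO _n
  POKE dst&,     PEEK(src& + 3)  ' copy the four bytes in reverse order
  POKE dst& + 1, PEEK(src& + 2)
  POKE dst& + 2, PEEK(src& + 1)
  POKE dst& + 3, PEEK(src&)
  src& = src& + 4                ' step both addresses to the next LONG
  dst& = dst& + 4
NEXT j&
```

The array address calculation now happens once, outside the loop. Whether that is measurably faster than indexing the arrays directly would have to be timed on the target machine.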

As an old PPC assembly programmer, let me be the first to disagree with Bill! FB^3 provides a nice tool for experimentation, by allowing a mix of assembler and high level statements. Furthermore, and tactfully expressed, the PPC code produced by the compiler gives considerable scope for optimization.

To motivate others to explore PPC assembler, the code in the ready-to-run FB^3 program below turns out to be 20-40 times faster than Michael's originally posted method. On an iMac, it does 25-30 million byte-order reversals per second.

Some tricks of the trade are not easily available to a compiler writer. The loads are done with "indexed update" instructions lwzux or lhzux. For the LONG case, a special PPC instruction is used for the reversal:
` lwzux r3,r5,r6; load long int
` stwbrx r3,0,r4; store byte-reversed
The corresponding SHORT case has to be done less elegantly, because of an oversight in the assembler (reported to Staz).
'---------A complete FB^3 program-----------------
_nMax = 3000000' for BIG arrays. Reduce if out-of-memory
DIM gMacLongs&(_nMax), gMacWords%(_nMax)
END GLOBALS

#IF cpuPPC' 68K not allowed
REGISTER OFF
LOCAL FN LONGByteRev(srcPtr&, destPtr&, numLongs&)
'Reads numLongs& 4-byte chunks starting at address srcPtr&.
'Stores them, byte-reversed, starting at address destPtr&.
'If srcPtr&=destPtr& the reversal occurs in place,
' overwriting the original.
LONG IF numLongs&>0
` lwz r6,^numLongs&
` mtspr ctr,r6; loop count
` addi r6,0,4; li r6,4
` lwz r5,^srcPtr&
` addi r5,r5,$FFFC; subi r5,r5,4
` lwz r4,^destPtr&
`lLoop
` lwzux r3,r5,r6; load long int
` stwbrx r3,0,r4; store byte-reversed
` addi r4,r4,4
` bc 16,0,lLoop; bdnz lLoop
END IF
END FN

REGISTER OFF
LOCAL FN SHORTByteRev(srcPtr&, destPtr&, numWords&)
'Reads numWords& 2-byte chunks starting at address srcPtr&.
'Stores them, byte-reversed, starting at address destPtr&.
'If srcPtr&=destPtr& the reversal occurs in place,
' overwriting the original.
LONG IF numWords&>0
` lwz r6,^numWords&
` mtspr ctr,r6; loop count
` addi r6,0,2; li r6,2
` lwz r5,^srcPtr&
` addi r5,r5,$FFFE; subi r5,r5,2
` lwz r4,^destPtr&
`wLoop
` lhzux r3,r5,r6; load half (i.e. SHORT)
//` sthbrx r3,0,r4  ;<--BUG   assembles wrongly as sthbrx r0,r3,r4
` stb r3,(r4); therefore store..
` srawi r3,r3,8; ..as two..
` stb r3,1(r4); ..reversed bytes

` addi r4,r4,2
` bc 16,0,wLoop; bdnz wLoop
END IF
END FN
#ENDIF

DIM j&,now&
WINDOW 1
FOR j& = 0 to _nMax - 1' make _nMax example values
gMacLongs&(j&) = &H01020304
gMacWords%(j&) = &H0102
NEXT

now& = FN TICKCOUNT
FN LONGByteRev(@gMacLongs&(0), @gMacLongs&(0), _nMax)
now& = FN TICKCOUNT - now&
PRINT int(1e9*now&/60.0/_nMax) " ns each 4-byte variable"

now& = FN TICKCOUNT
FN SHORTByteRev(@gMacWords%(0), @gMacWords%(0), _nMax)
now& = FN TICKCOUNT - now&
PRINT int(1e9*now&/60.0/_nMax) " ns each 2-byte variable"

DO
HANDLEEVENTS
UNTIL FN BUTTON
'----------------------------------------------------
Robert Purves
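
Before trusting the assembler routines on real data, it may be worth reversing a single known value and comparing it with the expected result. The fragment below is an untested sketch, intended to be pasted into the program above just before its final DO loop. testLong& and testWord% are hypothetical names; the sample values are the ones the program already uses to fill its arrays.

```basic
DIM testLong&, testWord%

testLong& = &H01020304
FN LONGByteRev(@testLong&, @testLong&, 1)   ' reverse one LONG in place
LONG IF testLong& = &H04030201
  PRINT "LONG reversal OK"
XELSE
  PRINT "LONG reversal FAILED"
END IF

testWord% = &H0102
FN SHORTByteRev(@testWord%, @testWord%, 1)  ' reverse one SHORT in place
LONG IF testWord% = &H0201
  PRINT "SHORT reversal OK"
XELSE
  PRINT "SHORT reversal FAILED"
END IF
```

Passing the same address as source and destination exercises the in-place case described in the comments of both functions.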