FPUs compared — 1994 and 10 years later

A few days ago I pulled my NeuroControl Workbench (software I wrote for my M.Sc. project) off some dusty old backup CDs. Among the files I found this screenshot of a neural net training run.

Training ANN on Pentium 100MHz

ANN on Pentium 100MHz (47640 conn/sec)

I was curious how fast the same training would be on my current computer, which is already a few years old and has an Athlon 64 4000+ (San Diego core). Here are the results. (BTW, I was surprised that I could run an old MS-DOS program with graphics output and overlay memory management under Windows XP without much difficulty.)

Training ANN on Athlon 64 4000+

ANN on Athlon 64 4000+ (9172000 conn/sec)

So the difference in speed is 47640 connections/second vs. 9172000 connections/second: the new run is over 190 times faster.

The Pentium 100MHz, which as far as I remember is the CPU used for the old run, was released in March 1994. The Athlon 64 4000+ was released in October 2004. So we have a time difference of about 10 years and a speed difference of about 200. A factor of 200 is a little under eight doublings, so over those roughly 120 months FPU speed doubled on average about every 16 months.
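
As a quick check of that arithmetic, here is a minimal sketch in C++. The function names are made up for this illustration and are not part of NCWB. It just divides the elapsed time by the base-2 logarithm of the speedup.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many times faster the new run is.
double speedupFactor(double oldRate, double newRate)
{
    assert(oldRate > 0.0);
    return newRate / oldRate;
}

// Hypothetical helper: average doubling period in months, assuming the
// speedup was spread evenly (exponentially) over the whole time span.
// Number of doublings = log2(speedup), so period = months / log2(speedup).
double doublingPeriodMonths(double speedup, double months)
{
    assert(speedup > 1.0 && months > 0.0);
    return months / std::log2(speedup);
}
```

With the rounded figures, doublingPeriodMonths(200.0, 120.0) comes out between 15 and 16 months. With the measured ratio from speedupFactor(47640.0, 9172000.0) and the 127 months between the two release dates, it is closer to 17 months. Either way this is a rough estimate from a single benchmark.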

(The speed increase is certainly also due to larger on-CPU caches. The Pentium 100MHz has only 16KB of L1 cache, while the Athlon 64 4000+ has 128KB of L1 and 1MB of L2 cache. The whole NCWB application can fit into the L2 cache on the Athlon CPU.)

BTW, I initially wrote the software on a computer with a Cyrix 387 FPU coprocessor. The ANN training speed was about 8000 conn/sec. That was already after I had optimized most of the ANN computation by doing all multiplications in inline assembler. FPUs are, or at least the 387 was, very easy to program in assembler because of their stack architecture. Here is a sample of the inline assembler code that I used:

void Neurode::sumUpError(Neurode far *nextLayer)
{
  asm push   ds // save DS; it is overwritten below when nextLayer is loaded

  asm fldz   // err := 0
             // the sum is accumulated at the bottom of the FPU stack
  asm les    bx,this
  asm mov    cx,WORD PTR es:[bx].(Neurode)numOutgoing // iteration counter
 
  asm jcxz   finish // finish if count==0
 
  // calculate offset into nextLayer.ingoing array
  // corresponding to the connection of this node with the next layer
  asm mov    ax,WORD PTR es:[bx].(Neurode)posInLayer
  asm mov    si,sizeConnection // multiply AX (posInLayer) with sizeConnection
  asm mul    si
  asm mov    si,ax // SI is index into connection array
 
  asm lds    di, DWORD PTR nextLayer // get pointer to next layer
dosum:// loop over outgoing
  asm fld    DWORD PTR ds:[di].(Neurode)err // load nextLayer[i].err
  asm les    bx, DWORD PTR ds:[di].(Neurode)ingoing // get pointer to nextLayer[i].ingoing
  asm fmul   DWORD PTR es:[bx+si].(Connection)weight // multiply by connecting weight
  asm faddp  ST(1),ST(0) // add err*weight to error sum, pop it
  asm add    di,sizeNeurode // increment index into nextLayer array
  asm loop   dosum // repeat loop if not finished
 
finish:
  asm les    bx,this
  asm fstp   DWORD PTR es:[bx].(Neurode)err // store summed-up error in this->err
 
  asm pop    ds
}
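
For readers who do not read x87 assembler, here is roughly what that routine computes, as a sketch in plain C++. The struct layouts are my reconstruction from the field names used in the assembler (err, numOutgoing, posInLayer, ingoing, weight). The field types and order are assumptions, and this is not the original NCWB declaration. The sum is kept in a long double because the FPU stack registers hold 80-bit values, and the result is only rounded to a 32-bit float when it is stored.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: only the field used by the routine is shown.
struct Connection {
    float weight;
};

// Assumed layout: field names follow the assembler listing, types are guesses.
struct Neurode {
    float err;                // error term of this node
    std::size_t numOutgoing;  // number of nodes in the next layer
    std::size_t posInLayer;   // index of this node within its own layer
    Connection* ingoing;      // incoming connections, one per previous-layer node

    void sumUpError(const Neurode* nextLayer);
};

// err = sum over i of nextLayer[i].err * weight of the connection
// from this node to nextLayer[i].
void Neurode::sumUpError(const Neurode* nextLayer)
{
    assert(numOutgoing == 0 || nextLayer != nullptr);

    long double sum = 0.0L;  // corresponds to fldz
    for (std::size_t i = 0; i < numOutgoing; ++i) {
        // fld err, fmul weight, faddp: multiply and accumulate
        sum += static_cast<long double>(nextLayer[i].err)
             * nextLayer[i].ingoing[posInLayer].weight;
    }
    err = static_cast<float>(sum);  // corresponds to fstp into this->err
}
```

The assembler version avoids recomputing the array offset by multiplying posInLayer by the connection size once, before the loop, and then stepping through nextLayer with a fixed stride. A modern compiler would most likely do the same for the C++ loop.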

My NeuroControl Workbench can output screenshots in PCX format. I was surprised that neither Photoshop Elements nor GIMP could read these files correctly. In the end I found that IrfanView opens and converts them without problems.