A few days ago I pulled my NeuroControl Workbench (software I wrote for my M.Sc. project) off some old dusty backup CDs. I found this screenshot of neural net training.
I was curious how fast the same training would be on my current computer, which is already a few years old and has an Athlon 64 San Diego 4000+. Here are the results. (BTW, I was surprised that I was able to run an old MS-DOS program with graphic output and overlay memory management without too much difficulty under Windows XP.)
So the training speed went from 47,640 connections/second to 9,172,000 connections/second, which is over 190 times faster.
The Pentium 100MHz, which as far as I remember is the CPU used for the old run, was released in March 1994. The Athlon 64 4000+ was released in October 2004. So we have a time difference of about 10 years and a speed difference of about 200. Since 200 is roughly 2^7.6, that is about 7.6 doublings in 120 months, so the speed of FPUs over that period doubled on average roughly every 16 months.
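The arithmetic behind that estimate can be written down as a small sketch. The function names `speedup` and `doublingTimeMonths` are made up for this illustration and are not part of NCWB; the only assumption is steady exponential growth between the two release dates.

```cpp
#include <cassert>
#include <cmath>

// Ratio of two training throughputs, both given in connections per second.
double speedup(double oldConnPerSec, double newConnPerSec)
{
    assert(oldConnPerSec > 0.0 && newConnPerSec > 0.0);
    return newConnPerSec / oldConnPerSec;
}

// Months needed for one doubling, assuming steady exponential growth:
//   factor = 2^(elapsedMonths / doubling)  =>  doubling = elapsedMonths / log2(factor)
double doublingTimeMonths(double factor, double elapsedMonths)
{
    assert(factor > 1.0 && elapsedMonths > 0.0);
    return elapsedMonths / std::log2(factor);
}
```

With the rounded figures (a factor of 200 over 120 months) `doublingTimeMonths` gives about 15.7 months. With the measured ratio from `speedup(47640.0, 9172000.0)`, which is about 192.5, and the 127 months between March 1994 and October 2004, it comes out at about 16.7 months.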
(The speed increase is certainly also due to larger on-CPU caches. Pentium 100MHz has only 16KB L1 cache, while Athlon 64 4000+ has 128KB L1 and 1MB L2 cache. The whole NCWB application can fit into the L2 cache on the Athlon CPU.)
BTW, I initially wrote the software on a computer with a Cyrix 387 FPU coprocessor. The ANN training speed was about 8000 conn/sec. This was already after I optimized most of the ANN computation by doing all multiplications in inline assembler. FPUs are, or at least the 387 was, very easy to program in assembler because of their stack architecture. Here is a sample of the inline assembler code that I used:
```cpp
void Neurode::sumUpError(Neurode far *nextLayer)
{
  asm push ds
  // nextLayer
  asm fldz  // err := 0
  // is summed on the bottom of the stack
  asm les bx,this
  asm mov cx,WORD PTR es:[bx].(Neurode)numOutgoing // iteration counter
  asm jcxz finish // finish if count==0

  // calculate offset into nextLayer.ingoing array
  // corresponding to the connection of this node with next layer
  asm mov ax,WORD PTR es:[bx].(Neurode)posInLayer
  asm mov si,sizeConnection // multiply AX (posInLayer) with sizeConnection
  asm mul si
  asm mov si,ax // SI is index into connection array

  asm lds di, DWORD PTR nextLayer // get pointer to next layer

dosum:// loop over outgoing
  asm fld DWORD PTR ds:[di].(Neurode)err // load nextLayer[i].err
  asm les bx, DWORD PTR ds:[di].(Neurode)ingoing // get pointer to nextLayer[i].ingoing
  asm fmul DWORD PTR es:[bx+si].(Connection)weight // multiply by connecting weight
  asm faddp ST(1),ST(0) // add err*weight to error sum, pop it
  asm add di,sizeNeurode // increment index into nextLayer array
  asm loop dosum // repeat loop if not finished

finish:
  asm les bx,this
  asm fstp DWORD PTR es:[bx].(Neurode)err // summed-up error is returned
  asm pop ds
}
```
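For readers who do not speak 16-bit x86, here is a rough portable C++ sketch of what the routine above computes: the error of a node is the sum, over all nodes in the next layer, of that node's error times the weight of the connection leading to it. The struct layouts are simplified assumptions for illustration and not the original NCWB classes. Only the field names (`err`, `ingoing`, `weight`, `posInLayer`) are taken from the assembler, and the loop count comes from the size of the vector instead of `numOutgoing`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified layouts; only the field names follow the assembler.
struct Connection {
    float weight;
};

struct Neurode {
    float err = 0.0f;                 // back-propagated error of this node
    std::size_t posInLayer = 0;       // index of this node within its own layer
    std::vector<Connection> ingoing;  // one connection per node of the previous layer

    // err := sum over i of nextLayer[i].err * nextLayer[i].ingoing[posInLayer].weight
    void sumUpError(const std::vector<Neurode>& nextLayer)
    {
        float sum = 0.0f;                        // fldz
        for (const Neurode& next : nextLayer) {  // the dosum loop
            assert(posInLayer < next.ingoing.size());
            // fld err, fmul weight, faddp into the running sum
            sum += next.err * next.ingoing[posInLayer].weight;
        }
        err = sum;                               // fstp into this->err
    }
};
```

The assembler version keeps the running sum on the FPU stack for the whole loop and only stores it once at the end, which is the same thing the local `sum` variable does here.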
My NeuroControl Workbench can output screenshots in PCX format. I was surprised that neither Photoshop Elements nor GIMP was able to read these files correctly. Finally, I found that IrfanView was able to open and convert these PCX files correctly.