The program "mandel.c" creates a visualization of the Mandelbrot fractal. Its algorithm uses three nested loops, and the innermost loop contains several 32x32=64 multiplies (two 32-bit operands producing a 64-bit result). Our 32-bit x86 compiler cannot generate that kind of multiplication inline (x86 supports it, but only if the destination registers are EDX:EAX), so it has to call a library function. On x64, the native word size is 64 bits, so a 32x32=64 multiply is easy. The compiler can therefore do the multiplication inline, where it participates in register allocation, producing better code overall.

With size optimizations (-OS -Omax), the code for "mandelbrot" takes up 374 bytes (135 instructions) on 32-bit x86, and only 227 bytes (69 instructions) on x64. The difference is even more marked with speed optimizations (-Ospeed -Omax), because the two biggest optimizations that trade size for speed are function inlining and loop unrolling. Since the 32-bit "mandelbrot" contains a function call inside a nested loop, both of these optimizations hit hard and inflate the code. The x64 version is 792 bytes to x86's 2412 bytes, a factor of three difference in code size! We have not run speed benchmarks on x64 code, but this size difference seems significant.

Sample command lines for producing pretty pictures:

    ./a.out -0.7005 0.295 0.015 ; animate mandel.pgm

The idea of using a fixed-point Mandelbrot calculation to demonstrate 64-bit code optimization comes from this article by Mike Wall on AMD.com:

    http://developer.amd.com/articlex.jsp?id=58
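
For readers unfamiliar with the 32x32=64 multiply discussed above, here is a minimal sketch of how it typically shows up in a fixed-point Mandelbrot inner loop. It is not taken from mandel.c: the names fx_t, fx_mul, mandel_iters, FX_FRAC_BITS and FX_ONE, and the choice of 8 integer bits plus 24 fraction bits, are assumptions made only for illustration.

```c
#include <stdint.h>

/* Assumed format for this sketch: signed fixed point with 24 fraction bits
   (8 integer bits), giving a range of roughly -128 to +128. */
#define FX_FRAC_BITS 24
#define FX_ONE ((int32_t)1 << FX_FRAC_BITS)

typedef int32_t fx_t;

/* 32x32=64 multiply: widen one operand so the full 64-bit product is kept,
   then scale back down to the fixed-point format. Division is used instead
   of a right shift so that negative products are handled portably. */
fx_t fx_mul(fx_t a, fx_t b)
{
    int64_t wide = (int64_t)a * b;
    return (fx_t)(wide / FX_ONE);
}

/* Iterate z = z*z + c starting from z = 0 and return the number of
   iterations completed before |z|^2 exceeds 4, or max_iter if it never
   does. Assumes cr and ci each lie in [-2, 2] so that intermediate values
   stay inside the fixed-point range. */
int mandel_iters(fx_t cr, fx_t ci, int max_iter)
{
    fx_t zr = 0, zi = 0;
    int i;

    for (i = 0; i < max_iter; i++) {
        fx_t zr2 = fx_mul(zr, zr);      /* three 32x32=64 multiplies per pass */
        fx_t zi2 = fx_mul(zi, zi);
        fx_t zri = fx_mul(zr, zi);

        if (zr2 + zi2 > 4 * FX_ONE)     /* escaped: |z|^2 > 4 */
            break;

        zr = zr2 - zi2 + cr;            /* real part of z*z + c */
        zi = 2 * zri + ci;              /* imaginary part of z*z + c */
    }
    return i;
}
```

For example, mandel_iters(0, 0, 100) returns 100 because the origin never escapes, while mandel_iters(2 * FX_ONE, 2 * FX_ONE, 100) returns 1. On a 64-bit target a compiler can typically turn the body of fx_mul into a single inline multiply, which is the case the text above describes. On a 32-bit target the same source may become a call to a helper routine.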