FidoNet Echomail Archive


From: Janis Kracht
To: All
Date: 2004-01-17 21:35:44
Subject: Nine Language Performance Round-up: Benchmarking Math & File I/O (2)

[continued from previous message]

Benchmark design

Designing good, helpful benchmarks is fiendishly difficult. This fact led
me to keep the scope of this benchmark quite limited. I tested only math
operations (32-bit integer arithmetic, 64-bit integer arithmetic, 64-bit
floating point arithmetic, and 64-bit trigonometry), and file I/O with
sequential access. The tests were not comprehensive by any stretch of the
imagination; I didn't test string manipulation, graphics, object creation
and management (for object oriented languages), complex data structures,
network access, database access, or any of the countless other things that
go on in any non-trivial program. But I did test some basic building blocks
that form the foundation of many programs, and these tests should give a
rough idea of how efficiently various languages can perform some of their
most fundamental operations.

Here's what happens in each part of the benchmark:

32-bit integer math: using a 32-bit integer loop counter and 32-bit integer
operands, alternate among the four arithmetic functions while working
through a loop from one to one billion. That is, calculate the following
(while discarding any remainders):

    1 - 1 + 2 * 3 / 4 - 5 + 6 * 7 / 8 - ... - 999,999,997 + 999,999,998 *
    999,999,999 / 1,000,000,000

64-bit integer math: same algorithm as above, but use a 64-bit integer loop
counter and operands. Start at ten billion and end at eleven billion so the
compiler doesn't knock the data types down to 32-bit.

64-bit floating point math: same as for 64-bit integer math, but use a
64-bit floating point loop counter and operands. Don't discard remainders.

64-bit floating point trigonometry: using a 64-bit floating point loop
counter, calculate sine, cosine, tangent, logarithm (base 10) and square
root of all values from one to ten million. I chose 64-bit values for all
languages because some languages required them, but if a compiler was able
to convert the values to 32 bits, I let it go ahead and perform that
optimization.

I/O: write one million 80-character lines to a text file, then read the
lines back into memory.
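
The benchmark components described above can be sketched roughly as follows. This is a minimal Python illustration, not the benchmark code itself. It makes several assumptions: that the four arithmetic operations are applied in turn to a single running result, that 32-bit overflow wraps around, that division truncates toward zero, and that the trigonometry results are summed so there is a value to report. All function names (wrap32, trunc_div, int_math, trig_math, io_roundtrip) are hypothetical, and the ranges in the demo are scaled far down from one billion.

```python
import math
import os
import tempfile

def wrap32(value):
    """Wrap a Python int into the signed 32-bit range (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def trunc_div(a, b):
    """Divide and discard the remainder, truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def int_math(first, last):
    """Apply -, +, *, / in turn to a running result for first..last inclusive.

    Assumes (last - first + 1) is a multiple of four.
    """
    result = 1
    for i in range(first, last + 1, 4):
        result = wrap32(result - i)
        result = wrap32(result + (i + 1))
        result = wrap32(result * (i + 2))
        result = trunc_div(result, i + 3)
    return result

def trig_math(first, last):
    """Sum sine, cosine, tangent, log10 and sqrt for first..last inclusive.

    Summing is an assumption of this sketch, made so a value can be printed.
    """
    total = 0.0
    x = float(first)
    while x <= last:
        total += (math.sin(x) + math.cos(x) + math.tan(x)
                  + math.log10(x) + math.sqrt(x))
        x += 1.0
    return total

def io_roundtrip(path, line_count):
    """Write line_count 80-character lines, then read them all back."""
    line = "x" * 80
    with open(path, "w") as handle:
        for _ in range(line_count):
            handle.write(line + "\n")
    with open(path) as handle:
        return handle.readlines()

if __name__ == "__main__":
    # Scaled-down ranges; the article uses one billion and ten million.
    print(int_math(1, 1000))
    print(trig_math(1, 1000))
    with tempfile.TemporaryDirectory() as folder:
        lines = io_roundtrip(os.path.join(folder, "bench.txt"), 1000)
        print(len(lines))
```

Python integers never overflow, so the wrap32 helper exists only to imitate fixed-width arithmetic; a compiled language with a native 32-bit type would not need it.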

At the end of each benchmark component I printed a value that was generated
by the code. This was to ensure that compilers didn't completely optimize
away portions of the benchmarks after seeing that the code was not actually
used for anything (a phenomenon I discovered when early versions of the
benchmark returned bafflingly optimistic results in Java 1.4.2 and Visual
C++). But I wanted to let the compilers optimize as much as possible while
still ensuring that every line of code ran. The optimization settings I
settled on were as follows:

Java 1.3.1: compiled with javac -g:none -O to exclude debugging information
and turn on optimization, ran with java -hotspot to activate the
just-in-time compiler within the JVM.
Java 1.4.2: compiled with javac -g:none to exclude debugging information,
ran with java -server to use the slower-starting but faster-running server
configuration of the JVM.
C: compiled with gcc -march=pentium4 -msse2 -mfpmath=sse -O3 -s -mno-cygwin
to optimize for my CPU, enable SSE2 extensions for as many math operations
as possible, and link to Windows libraries instead of Cygwin libraries.
Python with and without Psyco: no optimization used. The python -O
interpreter flag optimizes Python for fast loading rather than fast
performance, so it was not used.
Visual Basic: used "release" configuration, turned on "optimized," turned
off "integer overflow checks" within Visual Studio.
Visual C#: used "release" configuration, turned on "optimize code" within
Visual Studio.
Visual C++: used "release" configuration, turned on "whole program
optimization," set "optimization" to "maximize speed," turned on "global
optimizations," turned on "enable intrinsic functions," set "favor size or
speed" to "favor fast code," set "omit frame pointers" to "yes," set
"optimize for processor" to "Pentium 4 and above," set "buffer security
check" to "no," set "enable enhanced instruction set" to "SIMD2," and set
"optimize for Windows98" to "no" within Visual Studio.
Visual J#: used "release" configuration, turned on "optimize code," turned
off "generate debugging information" within Visual Studio.

All benchmark code can be found at my website. The Java benchmarks were
created with the Eclipse IDE, but were compiled and run from the command
line. I used identical source code for the Java 1.3.1, Java 1.4.2, and
Visual J# benchmarks. The Visual C++ and gcc C benchmarks used nearly
identical source code. The C program was written with TextPad, compiled
using gcc within the Cygwin bash shell emulation layer for Windows, and run
from the Windows command line after quitting Cygwin. I programmed the
Python benchmark with TextPad and ran it from the command line. Adding
Psyco's just-in-time compilation to Python was simple: I downloaded Psyco
from Sourceforge and added import psyco and psyco.full() to the top of the
Python source code. The four Microsoft benchmarks were programmed and
compiled within Microsoft Visual Studio .NET 2003, though I ran each
program's .exe file from the command line.
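
The two Psyco lines mentioned above would sit at the very top of the Python source. The sketch below wraps them in a try/except guard, which is an addition of this sketch and not something the article describes, so the script still runs where Psyco is not installed. The PSYCO_ACTIVE flag is a hypothetical name used only for illustration.

```python
# Placed at the top of the benchmark source, before any benchmark code.
try:
    import psyco      # just-in-time compiler for older Python 2 releases
    psyco.full()      # ask Psyco to compile as much of the program as it can
    PSYCO_ACTIVE = True
except ImportError:
    # Guard added for this sketch: fall back to the plain interpreter.
    PSYCO_ACTIVE = False

if __name__ == "__main__":
    print("Psyco enabled:", PSYCO_ACTIVE)
```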

It should be noted that the Java log() function computes natural logarithms
(using e as a base), whereas the other languages compute logarithms using
base 10. I only discovered this after running the benchmarks, and I assume
it had little or no effect on the results, but it does seem strange that
Java has no built-in base 10 log function.
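
For reference, a base 10 logarithm can be derived from a natural logarithm with the change-of-base identity log10(x) = ln(x) / ln(10). The sketch below shows this in Python for consistency with the other examples here; the function name log10_from_natural is hypothetical. The same expression in Java would be Math.log(x) / Math.log(10).

```python
import math

def log10_from_natural(x):
    """Base 10 logarithm computed from the natural logarithm."""
    return math.log(x) / math.log(10.0)

if __name__ == "__main__":
    # Compare against the library's own base 10 routine.
    print(log10_from_natural(1000.0), math.log10(1000.0))
```

The extra division adds a small cost per call, which is consistent with the assumption that the mismatch had little effect on the results.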

Before running each set of benchmarks I defragged the hard disk, rebooted,
and shut down unnecessary background services. I ran each benchmark at
least three times and used the best score from each component, assuming
that slower scores were the result of unrelated background processes
getting in the way of the CPU and/or hard disk. Start-up time for each
benchmark was not included in the performance results. The benchmarks were
run on the following hardware:

Type: Dell Latitude C640 Notebook
CPU: Pentium 4-M 2GHz
RAM: 768MB
Hard Disk: IBM Travelstar 20GB/4500RPM
Video: Radeon Mobility 7500/32MB
OS: Windows XP Pro SP 1
File System: NTFS
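
The measurement procedure described above (run each component at least three times, keep the best score, exclude start-up time, and print a computed value so the work cannot be optimized away) might look something like this in Python. It is a minimal sketch under stated assumptions, not the harness that was actually used: best_of and sample_workload are hypothetical names, and the workload is a trivial stand-in for a real benchmark component.

```python
import time

def best_of(benchmark, runs=3):
    """Call benchmark() `runs` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(runs):
        started = time.perf_counter()   # timer starts after program start-up
        result = benchmark()
        elapsed = time.perf_counter() - started
        best = min(best, elapsed)       # slower runs are treated as noise
    return best, result

def sample_workload():
    """Hypothetical stand-in for one benchmark component."""
    return sum(range(100_000))

if __name__ == "__main__":
    seconds, value = best_of(sample_workload)
    # Printing the computed value keeps the work observable, mirroring the
    # guard against compilers discarding unused code.
    print(f"best of 3: {seconds:.6f} s, result {value}")
```

Taking the minimum and not the mean reflects the stated assumption that slower runs are caused by unrelated background activity.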


--- BBBS/LiI v4.01 Flag-5
 * Origin: Prism bbs (1:261/38)
SEEN-BY: 633/267 270
@PATH: 261/38 123/500 106/2000 633/267
