[kaffe] Optimization?

Jukka Santala jsantala@tml.hut.fi
Sun, 26 May 2002 15:56:43 +0300 (EEST)


The hash-table lookup functions tend to get called a LOT in your typical
application, and should therefore be as fast-path as possible. In fact,
people generally assume the hash-lookups perform very fast. Here's a
simplistic optimization I did to this end, by folding the bucket & find
functions into one for the most common path. Altough Kaffe performs no
bytecode optimizations, this should at least save a couple of method calls
for the most common call at the cost of just little less execution cache
locality. The get() method could also have been rewritten the same way,
but consideration for cache locality made me stop here. Preferably, I
would write these in native code for optimizations, but more on that after
the example...

diff -u -r1.16 HashMap.java
--- HashMap.java        24 May 2002 14:46:43 -0000      1.16
+++ HashMap.java        26 May 2002 12:33:35 -0000
@@ -233,7 +233,14 @@
        }

        private Entry find(Object key) {
-               return find(key, bucket(key, table.length));
+               int hash = (key == null) ? 0 : key.hashCode();
+               hash &= 0x7fffffff;
+               for (Entry e = table[hash % table.length]; e != null; e =
e.next ) {
+                       if (key == null ? e.key == null : 
key.equals(e.key)) {
+                               return e;
+                       }
+               }
+               return null;
        }

        private Entry find(Object key, int bucket) {

Now, I can profile this on a number of platforms, but the problem is that
considering the number of platforms supported by Kaffe, there's no way for
me to experimentally try out every platform. Seeing as the automated
benchmarking suite discussed earlier hasn't come to anything yet, I'm
therefore seeking general opinions on the relative speeds of different
approaches on the platforms that matter for Kaffe...

So, how bad is a Java method call in Kaffe, really? Does it make sense to
fold critical function paths into one, even outside loops, or does the
increased code size pushing other code out of execution cache eat up any
possible benefits? How about code reuse, what kind of style guide-lines
exist (or should exist) for this?
How does this compare to a native method call? I've been under the
impression that JNI calls carry significant overhead, so would not
neccessarily be efficient for simple functions. On the other hand we have
to take into account that JNI code could be better optimized for the
target platform, but there's probably still something that could be said
about this performance.

Other comments on this kind of refactoring?

 -Jukka Santala