[kaffe] Slow byte to char conversion

Godmar Back kaffe@rufus.w3.org
Fri, 18 Aug 2000 21:39:40 -0600 (MDT)



From what I understand, and someone correct me if I'm wrong,
there shouldn't be any reason not to include the change you suggest -
if someone implements it, of course.

If I understand your proposal right, you'd use an array for
the first 256 values and a hashtable or something like that
for the rest.  I don't think there would be a problem with changing
it so that it serializes both an array and a hashtable.
One or two extra objects in *.ser shouldn't make a difference.
You could even stick a flag at the beginning for encodings where
the array doesn't pay off.
One would have to see what the actual sizes of the .ser files would be;
keeping those small is certainly desirable.  From what I understand,
they're more compact than any Java code representation.
Edouard would know more since he wrote that code, I think.
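
Just to make the idea concrete, a lookup table along those lines
might look roughly like the sketch below.  The names are made up
and this is not kaffe's actual converter code; it only illustrates
serializing an array plus a hashtable plus a flag.

    import java.io.Serializable;
    import java.util.Hashtable;

    // Hypothetical sketch -- not kaffe's real converter classes.
    // The .ser file would carry the char[256] fast path, the
    // hashtable fallback, and a flag for encodings where the
    // array doesn't pay off.
    class HybridByteToCharTable implements Serializable {
        boolean useArray;                      // false if the array isn't worth it
        char[] direct = new char[256];         // byte value -> char; '\u0000' = not mapped here
        Hashtable overflow = new Hashtable();  // Integer -> Character fallback

        char convert(int b) {
            int key = b & 0xff;
            if (useArray) {
                char c = direct[key];
                if (c != '\u0000')
                    return c;
            }
            Character mapped = (Character) overflow.get(new Integer(key));
            return mapped != null ? mapped.charValue() : '\uFFFD';  // unmappable byte
        }
    }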


On a related note, this whole conversion thing stinks.
Why can't people stick to 7-bit ASCII?
For instance, the JVM98 jack benchmark calls PrintStream.print
a whopping 296218 times in a single run.  Every call results in a new
converter object being newinstanced, just to convert a bunch of bytes.
(Creating a new converter per call was one of the changes made to keep
the charset conversion thread-safe.)  This is one of the reasons
why we're some 7 or 8 times slower than IBM on this test.
And that's not even using any of the serialized converters, just
the default one (which is written in JNI).
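
One cheap way out might be to cache a converter per thread instead
of newinstancing one on every call.  A rough sketch, with made-up
Converter/Factory interfaces standing in for whatever classes kaffe
really uses:

    // Sketch only: pay the expensive newInstance() once per thread
    // and encoding instead of once per print() call.  It stays
    // thread-safe because threads never share a converter instance.
    class ConverterCache {
        interface Converter {
            String getEncoding();
            char[] convert(byte[] buf, int off, int len);
        }
        interface Factory {
            Converter newConverter(String encoding);  // the expensive path
        }

        private final Factory factory;
        private final ThreadLocal perThread = new ThreadLocal();

        ConverterCache(Factory factory) {
            this.factory = factory;
        }

        Converter get(String encoding) {
            Converter c = (Converter) perThread.get();
            if (c == null || !c.getEncoding().equals(encoding)) {
                c = factory.newConverter(encoding);
                perThread.set(c);
            }
            return c;
        }
    }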

	- Godmar

> 
> 
> Hi,
> 
> I wrote a simple program to show a Java charmap
> (something like Encode.java in the developers directory).
> It essentially creates a byte array of size 1 and, for
> every value a byte can take, creates a string with the
> corresponding Unicode char using the encoding in question.
> 
> When displaying the charmap for a serialized converter like
> 8859_2, the performance is very bad.  Comparing current kaffe
> from CVS running on SuSE Linux 6.4 with jit3 against IBM's
> JRE 1.3 running in interpreted mode, kaffe is about 10
> times slower.
> 
> While I consider the idea of using serialized encoders
> based on hashtables a great one, it is very
> inefficient for ISO-8859-X and similar byte-to-char
> encodings.  These encodings use most of the 256
> possible values a byte can take to encode characters,
> so I tried using an array instead and achieved
> running times comparable to JRE 1.3.
> 
> Why was the hashtable-based conversion chosen over
> alternatives (switch-based lookup, array-based lookup)?
> 
> Dali
> 
> =====
> "Success means never having to wear a suit"
> 
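
The kind of charmap dump described above boils down to something like
the following.  This is a rough reconstruction, not the actual
Encode.java from the developers directory; every new String(buf,
encoding) call goes through the byte-to-char converter under
discussion.

    import java.io.UnsupportedEncodingException;

    public class CharMap {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String encoding = args.length > 0 ? args[0] : "8859_2";
            byte[] buf = new byte[1];
            for (int i = 0; i < 256; i++) {
                buf[0] = (byte) i;
                // each call exercises the byte-to-char converter
                String s = new String(buf, encoding);
                System.out.println(i + " -> U+"
                    + Integer.toHexString(s.charAt(0)).toUpperCase());
            }
        }
    }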