[kaffe] Slow byte to char conversion

Dalibor Topic kaffe@rufus.w3.org
Mon, 28 Aug 2000 22:38:09 +0200


On Mon, 28 Aug 2000, Artur Biesiadowski wrote:

> And why exactly default converter could not be cached and same instance
> used for all conversions ? I think it is stateless class, so it should
> be safe to enter same object method from various threads with all state
> on stack.

It depends on the encoding. Say you have a multibyte encoding, where
several bytes encode a single character, like UTF-8 [1]. You can't
guarantee that every byte array you want to convert into a char array
ends on a character boundary. So the converter needs to be able to
save its state and pick up where it left off the next time it is
called.

Imagine that you're reading in a UTF-8 encoded file and get an
IOException partway through. You convert as much as you've read, but
you can't decide on the last character, since the stream was
interrupted in the middle of it. The UTF-8 converter saves its state
and waits for the remaining bytes of that character.
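
To make the split-character case concrete, here is a minimal sketch
using the java.nio.charset API from later Java releases. It is not
kaffe's own converter code, and the class and method names are made up
for illustration. Note that with this API the undecoded trailing bytes
stay in the caller's input buffer; a converter that takes plain byte
arrays would have to remember them itself.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class SplitUtf8Demo {
    // Decodes the chunks in order with ONE decoder, carrying any
    // incomplete trailing sequence over to the next chunk.
    // Sketch only: the fixed 64-element buffers assume small inputs.
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(64);
        CharBuffer out = CharBuffer.allocate(64);
        for (int i = 0; i < chunks.length; i++) {
            in.put(chunks[i]);
            in.flip();
            boolean endOfInput = (i == chunks.length - 1);
            decoder.decode(in, out, endOfInput);
            // Keep bytes that could not be decoded yet at the front
            // of the buffer, ready for the next chunk to be appended.
            in.compact();
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is 0xC3 0xA9 in UTF-8; the chunk boundary splits it.
        byte[][] chunks = { { 'a', (byte) 0xC3 }, { (byte) 0xA9, 'b' } };
        String s = decodeChunks(chunks);
        System.out.println(s.equals("a\u00e9b"));
    }
}
```

After the first chunk only 'a' can be emitted; the 0xC3 byte has to
wait until 0xA9 arrives in the second chunk.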

Now, imagine another thread tries to do some UTF-8 input conversion,
too. If it used the same converter instance, it would get a corrupted
result, since that converter is still waiting for the rest of the
first thread's character. So you have to use a fresh UTF-8 converter
for that.
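
The corruption can be reproduced without real threads by interleaving
the calls by hand. The following is only a sketch: Utf8Converter is a
hypothetical stateful converter written for this example (it is not
kaffe's class), and it leans on java.lang.String for the actual
decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SharedConverterDemo {
    /** Hypothetical converter that keeps an incomplete trailing sequence. */
    static class Utf8Converter {
        private byte[] pending = new byte[0];

        String convert(byte[] input) {
            // Prepend whatever was left over from the previous call.
            byte[] all = Arrays.copyOf(pending, pending.length + input.length);
            System.arraycopy(input, 0, all, pending.length, input.length);
            int complete = completePrefixLength(all);
            pending = Arrays.copyOfRange(all, complete, all.length);
            return new String(all, 0, complete, StandardCharsets.UTF_8);
        }

        // Length of the longest prefix that does not end in the middle
        // of a multibyte sequence.
        private static int completePrefixLength(byte[] b) {
            int i = b.length - 1;
            int back = 0;
            // Step back over up to three continuation bytes (10xxxxxx).
            while (i >= 0 && back < 3 && (b[i] & 0xC0) == 0x80) {
                i--;
                back++;
            }
            if (i < 0) {
                return b.length; // no lead byte at all; decode as-is
            }
            int lead = b[i] & 0xFF;
            int need = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
            return (b.length - i < need) ? i : b.length;
        }
    }

    /** "Thread" A and "thread" B call in turn; returns {seen by A, seen by B}. */
    static String[] interleave(Utf8Converter forA, Utf8Converter forB) {
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        a.append(forA.convert(new byte[] { (byte) 0xC3 })); // A: first half of U+00E9
        b.append(forB.convert(new byte[] { 'x' }));         // B: a complete ASCII char
        a.append(forA.convert(new byte[] { (byte) 0xA9 })); // A: second half of U+00E9
        return new String[] { a.toString(), b.toString() };
    }

    public static void main(String[] args) {
        Utf8Converter shared = new Utf8Converter();
        String[] bad = interleave(shared, shared);
        String[] good = interleave(new Utf8Converter(), new Utf8Converter());
        System.out.println("shared:   A ok? " + bad[0].equals("\u00e9")
                + ", B ok? " + bad[1].equals("x"));
        System.out.println("separate: A ok? " + good[0].equals("\u00e9")
                + ", B ok? " + good[1].equals("x"));
    }
}
```

With one shared instance, B's 'x' gets glued onto A's pending 0xC3
byte, so neither caller sees what it should. With one instance per
caller both results come out right.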

You could say: "So? Kaffe uses ISO-Latin-1 as default encoding. That's
stateless." But unfortunately the default encoding comes from the
file.encoding system property, which can be changed by the user [2].
Don't rely on the default encoding being ISO-Latin-1.

Kaffe already does some caching, but it still instantiates a new
converter every time one is needed. That is unnecessary for stateless
converters, as you've pointed out.
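
One way this could look, as a rough sketch: share a single instance
for an encoding known to be stateless and hand out a fresh converter
for everything else. All names here (ConverterCache,
ByteToCharConverter, forEncoding) are hypothetical, and the stateful
case borrows java.nio.charset from later Java releases rather than
kaffe's own code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ConverterCache {
    /** Hypothetical converter interface, assumed for this sketch. */
    interface ByteToCharConverter {
        char[] convert(byte[] input);
    }

    /** ISO-8859-1: each byte is the code point of the same value, so no state. */
    static final class Latin1Converter implements ByteToCharConverter {
        public char[] convert(byte[] input) {
            char[] out = new char[input.length];
            for (int i = 0; i < input.length; i++) {
                out[i] = (char) (input[i] & 0xFF);
            }
            return out;
        }
    }

    /** Keeps undecoded trailing bytes between calls; must not be shared. */
    static final class StatefulConverter implements ByteToCharConverter {
        private final CharsetDecoder decoder;
        private ByteBuffer pending = ByteBuffer.allocate(0);

        StatefulConverter(Charset cs) {
            decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
        }

        public char[] convert(byte[] input) {
            ByteBuffer in = ByteBuffer.allocate(pending.remaining() + input.length);
            in.put(pending).put(input);
            in.flip();
            int capacity = (int) Math.ceil(in.remaining() * decoder.maxCharsPerByte()) + 1;
            CharBuffer out = CharBuffer.allocate(capacity);
            decoder.decode(in, out, false); // false: more input may follow
            pending = in.slice();           // whatever could not be decoded yet
            out.flip();
            char[] result = new char[out.remaining()];
            out.get(result);
            return result;
        }
    }

    // Safe to share: Latin1Converter has no fields at all.
    private static final ByteToCharConverter SHARED_LATIN1 = new Latin1Converter();

    static ByteToCharConverter forEncoding(String name) {
        Charset cs = Charset.forName(name);
        if (cs.equals(StandardCharsets.ISO_8859_1)) {
            return SHARED_LATIN1;
        }
        // Assumption: anything else might carry state, so never share it.
        return new StatefulConverter(cs);
    }
}
```

Since the lookup goes by encoding name, it keeps working if the
default encoding is something other than ISO-Latin-1.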

[1] If you have a Linux installation around, take a look at
/usr/share/i18n/charmaps/UTF8. It might have a slightly different name
on your installation, though, since character encodings usually have
several aliases. 

[2] Well, sort of. While Java 2 allows system properties to be set,
kaffe has not caught up with that yet, as far as I know. So the only
way I know of to change the default encoding is to modify it in
libraries/clib/native/System.c and recompile kaffe.

