[FIXED] Seeking debug tips for Kaffe on NSPR

Tue May 26 12:36:25 PDT 1998

Godmar Back writes:
 > I took the liberty to fix these two things in the current CVS repos.
 > I added fstat, and included jsyscall.h in findInJar.c

Cool, though I'm not sure this completely closes the matter (there's
similar code in jar.c which calls stdio!)  Do you know when a snapshot
with the changes will show up for public ftp?

Also, fstat is probably not the only thing missing from the syscall
interface; process spawning needs attention (right now, I don't even
try to support it).  The problem is fork() --- NSPR doesn't export any
such functionality for the excellent reason that it's very difficult
to support well on Windows; instead, they have a one-step process
spawning function.  I *believe* that handling process creation
properly through the syscall interface would also allow fixfd to be
eliminated, which is all to the good; fixfd violates abstraction by
design.
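
A minimal sketch of what a one-step spawn entry in the syscall
interface might look like on a POSIX host.  The names demo_spawn and
demo_wait are hypothetical, and posix_spawn is used only as a stand-in
for "create the child in a single call"; it is not what NSPR or Kaffe
actually uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical syscall-interface entry: start a child in one step,
 * with no separate fork() and exec() visible to the caller.
 * Returns 0 on success, or an errno value on failure. */
int demo_spawn(const char *path, char *const argv[], pid_t *pid_out)
{
    assert(path != NULL && argv != NULL && pid_out != NULL);
    return posix_spawn(pid_out, path, NULL, NULL, argv, environ);
}

/* Hypothetical companion: wait for the child and return its exit
 * status, or -1 if it did not exit normally. */
int demo_wait(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the caller never runs code between fork and exec, there is no
window in which descriptor fix-ups would have to happen on the VM side;
any such setup would be expressed as arguments to the spawn call.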

 > Clearly, calling walkConservative and markObject from the threading
 > system violates encapsulation.   On the other hand, these two functions
 > are specific to the current mark-and-sweep collector; not that I say
 > that we will have a proliferation of different garbage collectors for
 > Kaffe in the near future, but I think a better solution might be to
 > require the threading system to provide an interface that allows
 > a caller to query the hot area of a suspended thread's stack so that
 > it can implement a conservative gc.

The thread interface structure already has a GcWalkThreads entry which
is supposed to walk the stacks (and registers) of the running threads;
that's the code I have which calls walkConservative.  The question
is how it finds walkConservative to call it.  Right now, the name is
just wired in, which can't be good; it would be better to either have
it in the GC dispatch table, or to pass a function pointer in as an
argument to GcWalkThreads (as in the NSPR GC hooks).
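
A rough sketch of the function-pointer variant, assuming a simplified
thread list.  Everything named demo_* here is hypothetical and does not
reflect Kaffe's real structures; the point is only that the threading
layer receives the walker as an argument instead of naming
walkConservative directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type supplied by the collector. */
typedef void (*demo_walk_fn)(void *ctx, const void *base, size_t len);

/* Hypothetical per-thread record: just the live stack region. */
struct demo_thread {
    const void *stack_lo;
    size_t stack_len;
    struct demo_thread *next;
};

/* Stand-in for a GcWalkThreads-style entry: visits each thread's
 * stack region and hands it to whatever walker the caller passed. */
void demo_walk_threads(const struct demo_thread *threads,
                       demo_walk_fn walk, void *ctx)
{
    const struct demo_thread *t;

    assert(walk != NULL);
    for (t = threads; t != NULL; t = t->next)
        walk(ctx, t->stack_lo, t->stack_len);
}

/* Example collector-side walker: only counts what it is shown. */
struct demo_stats {
    size_t regions;
    size_t bytes;
};

void demo_count_region(void *ctx, const void *base, size_t len)
{
    struct demo_stats *s = ctx;

    (void)base;
    s->regions++;
    s->bytes += len;
}
```

A different collector would pass a different walker, and the threading
code would not need to change.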

 > In the jthreads version, you'll notice that only internal.c depends
 > on VM internals.  I went to great lengths to be able to compile (and test)
 > jthread.c independently.  jthread.c does not know anything about 
 > Hjava_lang_Thread, for instance, which it simply treats as thread-specific 
 > data.  Ditto for markObject, gc_malloc, etc.
 > I have always felt that the division between internal.c and jthread.c, 
 > defined by the interface in jthread.h, might be closer to what the actual 
 > interface between VM and threading system should be.  

That sounds about right; it's a bit bothersome to see the
platform-dependent code doing a notify on an Hjava_lang_Thread (to
implement join), which *could* be done once for all platforms one level
up.  But that whole strategy is worrisome anyway --- I'm not aware of
anything in the Java language spec which keeps user-level code from
doing its own waits and notifies on Thread (or subclassed Thread)
objects, and that might interfere with Kaffe's use of those monitors
to implement Thread.join().

 > While doing that, I also
 > looked at some pthread packages, such as Provenzano's that comes
 > with libc_r in FreeBSD, and various Linux packages--the truth is
 > that it's a mess, and I haven't found one package that supports
 > everything I need.  Hardly anybody supports pthread_cancel
 > properly, which is needed for Thread.stop().  And I'm not even
 > talking about the 3-4 versions of the actual "standard" that fly
 > around.

Hmmmm... I would have thought pthread_kill much more appropriate for
implementing Thread.stop() than pthread_cancel.  (Thread.stop() is
just supposed to cause a ThreadDeath exception to be thrown, but the
target thread is allowed to catch the exception, spend an arbitrary
amount of time dealing with it, and even discard it.  The whole effect
is very much like sending SIGQUIT to a Unix process.  Does
pthread_cancel give the target thread a chance to process the event?)
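
A minimal sketch of the pthread_kill approach, assuming SIGUSR1 is free
for this purpose and that the target polls a flag.  All demo_* names
are hypothetical, and the flag is process-wide here for brevity (a real
VM would want per-thread state).  The target thread gets to run its own
handling code before it exits, which is the property wanted for
ThreadDeath.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set from the signal handler, polled by the target thread. */
static volatile sig_atomic_t demo_stop_requested = 0;

static void demo_on_stop(int signo)
{
    (void)signo;
    demo_stop_requested = 1;
}

static void *demo_target(void *arg)
{
    int *handled = arg;
    struct timespec nap = { 0, 1000000 };   /* 1 ms */

    while (!demo_stop_requested)
        nanosleep(&nap, NULL);   /* may return early when signalled */

    *handled = 1;   /* plays the role of the "catch" block */
    return NULL;
}

/* Starts a thread, asks it to stop via pthread_kill, and reports
 * whether the thread ran its own handling code (1) or not (0).
 * Returns -1 if any setup call fails. */
int demo_stop_via_signal(void)
{
    struct sigaction sa;
    pthread_t tid;
    int handled = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_on_stop;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    demo_stop_requested = 0;
    if (pthread_create(&tid, NULL, demo_target, &handled) != 0)
        return -1;
    if (pthread_kill(tid, SIGUSR1) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return handled;
}
```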

 > Finally, not wanting to get into a flame wars here or something:
 > 
 > > There is a chicken-and-egg problem here --- what rational person is
 > > going to *show up* at the bazaar if the most interesting merchandise
 > > is not on view?  (The most salient distinction between "cathedral" and
 > > "bazaar" models in that paper isn't the form of control of development
 > > --- Eric had final control of the contents of all fetchmail releases,
 > > and makes no bones about it --- but rather, the release of potentially
 > > interesting, buggy and incomplete code early).  And, on that cheery
 > > note...
 > 
 >  My comparison probably wasn't the best comparison--Eric didn't work on
 > fetchmail for a living (or did he?).  In Transvirtual's case, they have 
 > to set their priorities from a business point of view.  But I concur
 > that there is somewhat of a chicken-and-egg problem.

Certainly, Transvirtual has to keep their business priorities in mind,
but it's not clear to me what business priorities might be served by
keeping their current draft code to themselves.  Transvirtual might
get some quality free help if they put the stuff out there, even in an
incomplete state, for people who have an interest.  If they do get
help, it's obviously to their benefit.  And if they commit the stuff
into their snapshots and nobody starts to do something useful with it,
I'm not sure what they've lost.

I don't *think* Eric was getting paid for fetchmail, but I'm not sure
how that's relevant.

rst