You're probably right, Michael.

We zero blockInts before going into select, which effectively reenables
interrupts and context switches. However, the fd_set that says whether
sigPipe is readable is thread-local.

I'll think about it later. One possibility may be to keep track of the
bytes written to the pipe and only read from the pipe if that number is
greater than zero. But making the pipe non-blocking as you suggest may
work too.

	- Godmar

>
> Godmar (or anybody familiar with the jthread internals),
>
> I think I found a bug in the jthread scheduling mechanism which causes
> kaffe to block. Below I attach an execution trace that blocks. Note: I
> modified kaffe to work in a slave configuration: instead of the main
> thread I have 3 threads in the trace below: a dispatcher thread
> waiting for job requests on a socket and 2 workers that actually do the
> jobs (D - dispatcher or main, W1, W2 - workers).
>
> Kaffe blocks because "handleIO" tries to read from sigPipe twice even
> though only one byte is written. Please go through the trace to understand
> why it happens. I ignored everything not relevant to this problem in the
> trace.
>
> *************************************************************************
>
> W2: send something on socket
> W2: wait on condvar for dispatcher signal
>     reschedule other thread
> ..
> W1: send something on socket
> W1: wait on condvar for dispatcher signal
> SIGIO: IO reply for W2 arrives
>     ints are blocked so write to sigPipe[1]
> W1: enter reschedule, handleIO(true) (no runnable thread)
>     add sigPipe[0] to pending read fds
>     enter select with infinite wait time
>     select returns 2 (sigPipe[0] and W2's reply)
>     interrupted by SIGVTALRM
> SIGVTALRM: reschedule the dispatcher D
> D: read from socket
>     signal worker W2
> W2: do Java stuff
>     send something on socket
>     wait on condvar for IO reply
>     enter reschedule, handleIO(true)
>     (no runnable thread, W1 still waiting in condvar)
>     add sigPipe[0] to pending read fds
>     enter select with infinite wait time
>     select returns 1 (sigPipe still has a byte in it)
>     EMPTY SIGPIPE!
>     reenter handleIO(true)
>     reenter select with infinite wait time
>     select returns 1 (W1's reply)
> SIGIO: reschedule the dispatcher D
> D: read from socket
>     signal worker W1
> W1: (W1 is right after select in handleIO(true))
>     select returned with FD_ISSET(sigPipe[0])
>     try to read from sigPipe[0]
>     BLOCK IN READ, sigPipe previously emptied by W2!
>
> *************************************************************************
>
> To solve the problem I just made sigPipe[0] non-blocking using
> "jthreadedFileDescriptor".
>
> Sorry I can't provide a real test program. My code is not stable enough to
> be released. Nevertheless, one could write a test program just for this
> problem; I just don't have the time for it.
>
> Regards,
> Mihai Surdeanu                     Southern Methodist University, CSE
>                                    (214) 768 - 3054
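
For illustration only, here is a minimal standalone sketch of the non-blocking approach discussed above. It uses plain POSIX calls (pipe, fcntl, read) rather than kaffe's jthreadedFileDescriptor, and it runs the two readers one after another instead of as real threads. The helper names make_nonblocking, drain_pipe and second_reader_sees are hypothetical and do not come from the kaffe sources. One byte is written to the pipe and two readers try to consume it, as in the trace; with O_NONBLOCK set on the read end, the second reader gets EAGAIN and moves on instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 or -1. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: read whatever is currently in the pipe.
 * Returns the number of bytes drained, or -1 on a real error.
 * An empty pipe yields 0 instead of blocking the caller. */
static long drain_pipe(int fd)
{
    char buf[64];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;               /* write end was closed */
        if (errno == EINTR)
            continue;                   /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* pipe is empty, nothing to wait for */
        return -1;
    }
}

/* Replays the double read from the trace: one byte is written, then two
 * readers in turn try to empty the pipe. Returns the number of bytes the
 * second reader saw (0 if it correctly found the pipe empty), or -1 on a
 * setup failure. */
long second_reader_sees(void)
{
    int sig_pipe[2];
    char token = 'x';
    long first, second;

    if (pipe(sig_pipe) == -1)
        return -1;
    if (make_nonblocking(sig_pipe[0]) == -1 ||
        write(sig_pipe[1], &token, 1) != 1) {
        close(sig_pipe[0]);
        close(sig_pipe[1]);
        return -1;
    }

    first = drain_pipe(sig_pipe[0]);    /* plays W2: empties the pipe */
    assert(first == 1);
    second = drain_pipe(sig_pipe[0]);   /* plays W1: would block without O_NONBLOCK */

    close(sig_pipe[0]);
    close(sig_pipe[1]);
    return second;
}
```

Calling second_reader_sees() from a small main returns 0: the second reader finds the pipe empty and carries on. The byte-counting alternative mentioned in the reply would avoid the spurious read altogether, at the cost of a counter shared between the signal handler and the scheduler.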