zmqdev
2016-11-25 09:37:24 UTC
* Background
I have a service that starts workers on demand with fork+exec.
The requests arrive over zeromq sockets.
After the fork, before the exec, I close all file descriptors > 2,
keeping only stdin/out/err. I then exec the requested program.
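
For concreteness, the spawn step looks roughly like the sketch below. It is a simplified illustration under stated assumptions, not the actual service code: the names spawn_worker, close_fds_above_stderr and wait_for_worker are hypothetical, and so are the 4096 cap on the descriptor scan and the exit code 127 on exec failure. No zeromq calls are made in the child.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor from 3 up to max_fd - 1,
 * leaving only stdin, stdout and stderr open. */
void close_fds_above_stderr(long max_fd)
{
    for (int fd = 3; fd < max_fd; fd++)
        close(fd);  /* errors (EBADF for unused slots) are ignored */
}

/* Hypothetical helper: fork, close descriptors > 2 in the child, then
 * exec argv[0]. Returns the child's pid in the parent, -1 if fork fails. */
pid_t spawn_worker(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    /* Determine the scan limit before forking to keep the child's work
     * between fork and exec minimal. The 4096 cap is an assumption of
     * this sketch: descriptors at or above it would stay open. */
    long max_fd = sysconf(_SC_OPEN_MAX);
    if (max_fd < 0 || max_fd > 4096)
        max_fd = 4096;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the child's descriptor table is modified here. */
        close_fds_above_stderr(max_fd);
        execvp(argv[0], argv);
        _exit(127);  /* reached only if exec failed */
    }
    return pid;
}

/* Hypothetical helper: reap the worker and return its exit code,
 * or -1 if it did not exit normally. */
int wait_for_worker(pid_t pid)
{
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the service, spawn_worker would be called with the argv of the requested program once a request has been read from the zeromq socket.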
* Problem
It works, except that I occasionally get core dumps (of the service) with
the following assertion failure:
Bad file descriptor (src/epoll.cpp:90)
and the backtrace:
#0 0xf77f5430 in __kernel_vsyscall ()
#1 0xf743f1f7 in raise () from /lib/libc.so.6
#2 0xf7440a33 in abort () from /lib/libc.so.6
#3 0xf7067134 in zmq::zmq_abort(char const*) () from $LIBS/libzmq.so.5
#4 0xf7065e6c in zmq::epoll_t::rm_fd(void*) () from $LIBS/libzmq.so.5
#5 0xf7068823 in zmq::io_object_t::rm_fd(void*) () from $LIBS/libzmq.so.5
#6 0xf70958af in zmq::stream_engine_t::unplug() () from $LIBS/libzmq.so.5
#7 0xf7098711 in zmq::stream_engine_t::error(zmq::stream_engine_t::error_reason_t) () from $LIBS/libzmq.so.5
#8 0xf7098867 in zmq::stream_engine_t::timer_event(int) () from $LIBS/libzmq.so.5
#9 0xf707f972 in zmq::poller_base_t::execute_timers() () from $LIBS/libzmq.so.5
#10 0xf7066209 in zmq::epoll_t::loop() () from $LIBS/libzmq.so.5
#11 0xf7066467 in zmq::epoll_t::worker_routine(void*) () from $LIBS/libzmq.so.5
#12 0xf709d67e in thread_routine () from $LIBS/libzmq.so.5
#13 0xf7619b2c in start_thread () from /lib/libpthread.so.0
#14 0xf750808e in clone () from /lib/libc.so.6
This is with zeromq-4.1.4 on RHEL 7.3 x86_64.
So I wonder: is there some interaction between parent and child?
* Documentation
The Guide and the FAQ do not explicitly address the fork+exec case.
The question has been asked several times on the mailing list in various
forms, without a definitive answer (for dummies like me at least).
* Questions
Do I need to zmq_close the sockets in the child?
Or is zmq_term in the child enough?
Does closing the file descriptors in the child cause problems in the parent?
What is the correct way to handle this?