[zeromq-dev] Race condition in cppzmq zmq::socket

Discussion:

[zeromq-dev] Race condition in cppzmq zmq::socket_t::close() / cppzmq tag?

Eric Pederson

2016-12-14 07:24:46 UTC

I ran into a race condition in zmq::socket_t::close() in cppzmq:

*inline void *close() ZMQ_NOTHROW
{
*if*(ptr == NULL)

*// already closed **return *;

*// Someone can sneak in here and close the socket*
*int *rc = zmq_close (ptr);
ZMQ_ASSERT (rc == 0);
ptr = 0 ;
}

I worked around it using compare-and-swap to atomically check if it's
already closed. Does that kind of fix sound reasonable for
zmq::socket_t::close()?

Also I wanted to follow up on my previous question about tagging cppzmq. I
think that conversation got lost in the list shutdown that happened
recently. Does it make sense to regularly tag the cppzmq repo with the
corresponding ZeroMQ version?

Thanks,

-- Eric

Luca Boccassi

2016-12-14 08:36:41 UTC

Permalink

That could only happen if the same socket was used or managed from multiple
threads. As the documentation clearly states, sockets are intentionally not
thread safe and must not be used from multiple threads.

Regarding the version I'm not sure as it's never been tagged, and the
development usually lags a bit behind so it would risk being misleading.

It would make sense to be versioned only if it was in the same repo I think.

On Dec 14, 2016 07:25, "Eric Pederson" <***@gmail.com> wrote:

I ran into a race condition in zmq::socket_t::close() in cppzmq:

*inline void *close() ZMQ_NOTHROW
{
*if*(ptr == NULL)

*// already closed **return *;

*// Someone can sneak in here and close the socket*
*int *rc = zmq_close (ptr);
ZMQ_ASSERT (rc == 0);
ptr = 0 ;
}

I worked around it using compare-and-swap to atomically check if it's
already closed. Does that kind of fix sound reasonable for
zmq::socket_t::close()?

Also I wanted to follow up on my previous question about tagging cppzmq. I
think that conversation got lost in the list shutdown that happened
recently. Does it make sense to regularly tag the cppzmq repo with the
corresponding ZeroMQ version?

Thanks,

-- Eric

Eric Pederson

2016-12-15 07:39:54 UTC

Permalink

Right - normally I would not access a socket from two threads.

The scenario is unit testing. We have a class that encapsulates a set of
sockets and also a thread with a loop that polls the sockets.

When the test is over, I do the following to clean it up: close the
context so the poll blocked in the thread throws an exception with ETERM.
I close the sockets after catching the exception and returning from the
thread. The destructor joins the thread so the object doesn't get
destroyed before the thread is finished. Then the destructor continues and
the socket members get destroyed. All this works fine normally because the
sockets are usually closed before the socket destructor runs. But
sometimes the closing in the thread would race with the destruction.

I found that it is because some tests were not joining the thread but
instead detaching it. So it looks like we have a nice deterministic
shutdown process finally.

Thanks,

That could only happen if the same socket was used or managed from multiple

Post by Luca Boccassi
threads. As the documentation clearly states, sockets are intentionally not
thread safe and must not be used from multiple threads.

-- Eric

Post by Luca Boccassi
*inline void *close() ZMQ_NOTHROW
{
*if*(ptr == NULL)
*// already closed **return *;
*// Someone can sneak in here and close the socket*
*int *rc = zmq_close (ptr);
ZMQ_ASSERT (rc == 0);
ptr = 0 ;
}
I worked around it using compare-and-swap to atomically check if it's
already closed. Does that kind of fix sound reasonable for
zmq::socket_t::close()?
Also I wanted to follow up on my previous question about tagging cppzmq.
I think that conversation got lost in the list shutdown that happened
recently. Does it make sense to regularly tag the cppzmq repo with the
corresponding ZeroMQ version?
Thanks,
-- Eric

Luca Boccassi

2016-12-15 09:32:11 UTC

Permalink

Note that as the documentation also says you must close the sockets before
destroying the context, not after

Post by Eric Pederson
Right - normally I would not access a socket from two threads.
The scenario is unit testing. We have a class that encapsulates a set of
sockets and also a thread with a loop that polls the sockets.
When the test is over, I do the following to clean it up: close the
context so the poll blocked in the thread throws an exception with ETERM.
I close the sockets after catching the exception and returning from the
thread. The destructor joins the thread so the object doesn't get
destroyed before the thread is finished. Then the destructor continues and
the socket members get destroyed. All this works fine normally because the
sockets are usually closed before the socket destructor runs. But
sometimes the closing in the thread would race with the destruction.
I found that it is because some tests were not joining the thread but
instead detaching it. So it looks like we have a nice deterministic
shutdown process finally.
Thanks,
That could only happen if the same socket was used or managed from multiple

Post by Luca Boccassi
threads. As the documentation clearly states, sockets are intentionally not
thread safe and must not be used from multiple threads.

-- Eric

_______________________________________________
zeromq-dev mailing list
https://lists.zeromq.org/mailman/listinfo/zeromq-dev

Eric Pederson

2016-12-15 16:32:45 UTC

Permalink

Post by Luca Boccassi
Note that as the documentation also says you must close the sockets before
destroying the context, not after

If I closed the sockets that were being used in another thread then I would
definitely be in danger of race conditions.

The documentation says to close the sockets in your thread, then close the
context. Then catch the error in the other thread and close the sockets in
that thread.

The section:

First, do not try to use the same socket from multiple threads. Please

Post by Luca Boccassi
don't explain why you think this would be excellent fun, just please don't
do it. Next, you need to shut down each socket that has ongoing requests.
The proper way is to set a low LINGER value (1 second), and then close the
socket. If your language binding doesn't do this for you automatically when
you destroy a context, I'd suggest sending a patch.
Finally, destroy the context. This will cause any blocking receives or
polls or sends in attached threads (i.e., which share the same context) to
return with an error. Catch that error, and then set linger on, and *close
sockets in that thread*, and exit.

So I'm doing the dance exactly as the documentation suggests. There is no
getting around that the destructor will run in a separate thread from the
one that some sockets were running in. But as long as you wait for that
thread to end, and those sockets will be closed anyway, you should be safe
to run the destructor.

All that said it's probably a good idea to have atomic closing to make life
safer. I read that there are thread-safe sockets on the horizon
(ZMQ_CLIENT and ZMQ_SERVER)? It will be useful for those.

-- Eric

Post by Luca Boccassi
Right - normally I would not access a socket from two threads.
The scenario is unit testing. We have a class that encapsulates a set of
sockets and also a thread with a loop that polls the sockets.
When the test is over, I do the following to clean it up: close the
context so the poll blocked in the thread throws an exception with ETERM.
I close the sockets after catching the exception and returning from the
thread. The destructor joins the thread so the object doesn't get
destroyed before the thread is finished. Then the destructor continues and
the socket members get destroyed. All this works fine normally because the
sockets are usually closed before the socket destructor runs. But
sometimes the closing in the thread would race with the destruction.
I found that it is because some tests were not joining the thread but
instead detaching it. So it looks like we have a nice deterministic
shutdown process finally.
Thanks,
That could only happen if the same socket was used or managed from multiple

Post by Luca Boccassi
threads. As the documentation clearly states, sockets are intentionally not
thread safe and must not be used from multiple threads.

-- Eric