Discussion:
[zeromq-dev] XPUB(or PUB) aborted by assertion failure (erased == 1 on mtrie.cpp)
Heungsub Lee
2016-12-12 19:03:04 UTC
Did my email go through? I sent it yesterday, but I can't find it in the
archive.

On Mon, Dec 12, 2016 at 3:55 AM, Heungsub Lee <***@subl.ee> wrote:

Hi folks, I'm Heungsub Lee.

I've been building a game server on ZeroMQ's Pub/Sub pattern, and I've hit a
critical problem with PUB/SUB sockets. Sometimes my processes abort with an
assertion failure from ZeroMQ:

Assertion failed: erased == 1 (src/mtrie.cpp:297)

I'm running pyzmq-16.0.2 on libzmq-4.2.0.

In my case, a SUB socket binds to an address and a PUB socket then connects
to that address. All PUB and SUB sockets in the cluster are connected to
each other, forming a fully connected network among 500+ server processes.

Each SUB socket frequently subscribes to and unsubscribes from topics. The
number of topics in the cluster keeps growing after the cluster starts. When
I last checked, one SUB socket was subscribed to 3000+ topics.

I have seen three abort scenarios:

1. When a SUB socket closes, some PUB sockets abort. This may be a
   concurrency bug in the pyzmq I'm using. I reproduced it with a test case
   <https://github.com/what-studio/pyzmq/commit/5159ee563a571daccf1285aa74917bb875c774a7>
   and I think I fixed it
   <https://github.com/what-studio/pyzmq/commit/94ab0a88dbef7d0f33b34cdf18e55487735dde01>.
2. When a PUB socket joins a mature cluster, it aborts almost immediately.
   By "mature" I mean a cluster that already has a large number of
   subscribed topics and subscribe/unsubscribe synchronization messages.
3. A PUB socket on a weak host machine (e.g. AWS EC2 t2.medium) sometimes
   aborts. I'm not sure what triggers it.

Unfortunately, I couldn't reproduce the last two scenarios with a small code
sample, but my servers keep aborting.

The assertion fails when a PUB socket tries to remove a pipe to a SUB socket
but finds no matching pipe. I'm wondering whether ZeroMQ guarantees
consistent subscribe/unsubscribe synchronization between busy PUB and SUB
sockets.
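
To make the failing condition concrete, here is a toy model of it in plain
Python. It is only a sketch based on the description above, not libzmq's
mtrie code: the class name ToySubscriptionTable, the topic b"room.1" and the
pipe label "pipe-A" are made up for illustration. It keeps a set of pipes per
topic and asserts that every removal erases exactly one entry.

```python
from collections import defaultdict


class ToySubscriptionTable:
    """Toy stand-in for a publisher's topic -> pipes table (not libzmq code)."""

    def __init__(self):
        self._pipes_by_topic = defaultdict(set)

    def add(self, topic, pipe):
        """Record that `pipe` subscribed to `topic`; return True if it is new."""
        pipes = self._pipes_by_topic[topic]
        is_new = pipe not in pipes
        pipes.add(pipe)
        return is_new

    def remove(self, topic, pipe):
        """Drop one subscription; the pipe must currently be registered."""
        pipes = self._pipes_by_topic.get(topic, set())
        erased = 1 if pipe in pipes else 0
        # Same shape as the reported check: exactly one entry must be erased.
        assert erased == 1, f"erased == {erased} for topic {topic!r}"
        pipes.discard(pipe)
        if not pipes:
            del self._pipes_by_topic[topic]


table = ToySubscriptionTable()
table.add(b"room.1", "pipe-A")
table.remove(b"room.1", "pipe-A")      # consistent: subscribe, then unsubscribe

try:
    table.remove(b"room.1", "pipe-A")  # a second, unmatched unsubscribe
except AssertionError as exc:
    print("toy assertion tripped:", exc)
```

In this model the abort can only happen if a removal arrives for a pipe that
was never added or was already removed, which is why the ordering of
subscribe/unsubscribe handling matters.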

Regards,
Heungsub
Luca Boccassi
2016-12-12 21:36:49 UTC
The mailing list had a problem with spam; it's fixed now. Sorry for the
inconvenience.

Are you using a socket from multiple threads? That is usually the prime
cause of crashes.
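
One common way to rule this out is to let a single thread own the socket and
have every other thread hand it messages through a queue. The sketch below
shows that pattern under some assumptions: SocketOwner and FakeSocket are
hypothetical names, and FakeSocket only stands in for a real socket so the
example runs without pyzmq. It relies on nothing more than the socket object
having send_multipart() and close().

```python
import queue
import threading


class SocketOwner:
    """Runs one socket-like object in a single thread; others enqueue frames."""

    _STOP = object()

    def __init__(self, make_socket):
        self._make_socket = make_socket
        self._outbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        sock = self._make_socket()  # created and used only in this thread
        try:
            while True:
                frames = self._outbox.get()
                if frames is self._STOP:
                    break
                sock.send_multipart(frames)
        finally:
            sock.close()

    def publish(self, topic, payload):
        """Safe to call from any thread: only the queue is shared."""
        self._outbox.put([topic, payload])

    def close(self):
        """Flush everything queued so far, then stop the owner thread."""
        self._outbox.put(self._STOP)
        self._thread.join()


class FakeSocket:
    """Stand-in so the sketch runs without pyzmq installed."""

    def __init__(self):
        self.sent = []
        self.closed = False

    def send_multipart(self, frames):
        self.sent.append(frames)

    def close(self):
        self.closed = True


fake = FakeSocket()
owner = SocketOwner(lambda: fake)
workers = [
    threading.Thread(target=owner.publish, args=(b"room.1", str(i).encode()))
    for i in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
owner.close()
print(len(fake.sent), "messages sent from one thread")
```

With pyzmq, the factory passed to SocketOwner would create and connect the
PUB socket inside the owner thread, for example with
zmq.Context.instance().socket(zmq.PUB) followed by connect() to your own
endpoint.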

Also, usually the PUB binds and the SUB connects, although it should
work the other way around as well.

In these cases a minimal code snippet that reproduces the problem is the
best way to get to a resolution.