Just an update. I have upgraded to 2.0.9, and the problem still exists. I'm trying
to get my head around the zmq architecture so I can figure out what I'm
doing wrong.
I'm assuming the io_threads are responsible for pulling the data off the TCP
socket and that they are working properly, as netstat shows no data in the receive
queue. When a process enters this state (I'm not sure what triggers it, but it
happens once or twice through the trading day), memory continues to grow, so
my assumption is that the handoff from the io_threads to my thread calling
socket->recv() is breaking down. I'm going to keep digging; hopefully
someone can give me some pointers before I become a zeromq internals expert.
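For context, here is a minimal sketch of the kind of blocking consumer loop in
question, using the 2.x C++ binding (the endpoint and handler are placeholders,
not the actual MktDataSub code):

// Minimal sketch of a blocking SUB consumer (0MQ 2.x C++ binding).
// Endpoint and handler are illustrative placeholders.
#include <zmq.hpp>

void consumer()
{
    zmq::context_t ctx(1);                    // 1 io_thread
    zmq::socket_t sub(ctx, ZMQ_SUB);
    sub.setsockopt(ZMQ_SUBSCRIBE, "", 0);     // default "" subscription
    sub.connect("tcp://publisher-host:5555");

    while (true) {
        zmq::message_t msg;
        sub.recv(&msg);                       // blocks here -- the recv() frame in the trace below
        // handle_update(msg.data(), msg.size());   // application-specific processing
    }
}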
Threads & stack trace are shown below.
Thanks
Marc
(gdb) info threads
12 Thread 0x7f0f69118710 (LWP 27483) 0x0000003290e0e53c in recv () from
/lib64/libpthread.so.0
11 Thread 0x7f0f63fff710 (LWP 27484) 0x00000032902ded73 in epoll_wait ()
from /lib64/libc.so.6
10 Thread 0x7f0f635fe710 (LWP 27485) 0x00000032902ded73 in epoll_wait ()
from /lib64/libc.so.6
9 Thread 0x7f0f62bfd710 (LWP 27486) 0x00000032902ded73 in epoll_wait ()
from /lib64/libc.so.6
8 Thread 0x7f0f621fc710 (LWP 27487) 0x00000032902ded73 in epoll_wait ()
from /lib64/libc.so.6
7 Thread 0x7f0f617fb710 (LWP 27488) 0x0000003290e0b04c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7f0f60dfa710 (LWP 27489) 0x0000003290e0b04c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f0f5bfff710 (LWP 27490) 0x0000003290e0b04c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f0f5b5fe710 (LWP 27491) 0x0000003290e0b04c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f0f5abfd710 (LWP 27495) 0x00000032902d7553 in select () from
/lib64/libc.so.6
2 Thread 0x7f0f597fb710 (LWP 27497) 0x0000003290e0e43d in accept () from
/lib64/libpthread.so.0
* 1 Thread 0x7f0f6993a820 (LWP 27482) 0x00000032902a4d5d in nanosleep ()
from /lib64/libc.so.6
(gdb) thread 12
[Switching to thread 12 (Thread 0x7f0f69118710 (LWP 27483))]#0
0x0000003290e0e53c in recv () from /lib64/libpthread.so.0
(gdb) where
#0 0x0000003290e0e53c in recv () from /lib64/libpthread.so.0
#1 0x00007f0f6a3f8af6 in zmq::signaler_t::recv (this=0x7f0f640023b0,
cmd_=0x7f0f69117a50, block_=true) at signaler.cpp:274
#2 0x00007f0f6a3ea724 in zmq::app_thread_t::process_commands
(this=0x7f0f64002380, block_=<value optimized out>, throttle_=<value
optimized out>) at app_thread.cpp:88
#3 0x00007f0f6a3f903c in zmq::socket_base_t::recv (this=0x7f0f640023f0,
msg_=0x7f0f69117bc0, flags_=0) at socket_base.cpp:443
#4 0x000000000042ff0f in zmq::socket_t::recv (this=0x7f0f69117c70,
msg_=0x7f0f69117bc0, flags_=0) at /usr/local/include/zmq.hpp:256
#5 0x000000000042dbcc in Altair::MktDataSub::Consumer () at
MktDataSub_gpb.cpp:110
#6 0x00000000004353b5 in boost::detail::thread_data<void (*)()>::run
(this=0xaeb810) at /usr/include/boost/thread/detail/thread.hpp:56
#7 0x00007f0f6b5a4670 in thread_proxy () from
/usr/lib64/libboost_thread-mt.so.5
#8 0x0000003290e06a3a in start_thread () from /lib64/libpthread.so.0
#9 0x00000032902de77d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
Post by Pieter Hintjens
Hi Marc,
I'd advise you to upgrade to 2.0.8 (stable); there have been a number of
bug fixes since 2.0.7.
I don't immediately see this issue in the changelog but it's worth
using the latest stable release in any case.
-Pieter
Post by Marc Rossi
Hi Pieter,
0MQ version 2.0.7
Linux (Fedora Core 12 -- kernel 2.6.32.14-127.fc12.x86_64)
2 Quad core Xeon processors.
C++
TCP transport
Quick description of the process: the publisher receives data from a 3rd-party
stock market feed and makes it available on address 'tcp://*:5555'. I'll
see what I can do about a simplified test case without any of the
dependencies in my environment.
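Until I have that, this is roughly the shape of the publishing side (a sketch
only; the feed handler below is a placeholder for the real third-party feed code):

// Rough shape of the publisher side (sketch only).
#include <zmq.hpp>
#include <cstring>
#include <string>

// Placeholder for the real market data feed handler.
static std::string next_feed_update()
{
    return "SYMBOL|price|size";
}

int main()
{
    zmq::context_t ctx(1);
    zmq::socket_t pub(ctx, ZMQ_PUB);
    pub.bind("tcp://*:5555");

    for (;;) {
        std::string update = next_feed_update();
        zmq::message_t msg(update.size());
        memcpy(msg.data(), update.data(), update.size());
        pub.send(msg);                        // fans out to every connected subscriber
    }
}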
Thanks,
Marc
Post by Pieter Hintjens
Hi Marc,
* the version of 0MQ you're using
* your operating system and hardware
* the programming language
* the 0MQ transports you're using
* and provide a minimal test case that reproduces the problem
It'll be easier to see what's going on.
-Pieter
Post by Marc Rossi
Hi all. I have a pub/sub setup with a single publisher and several
subscribers using the default subscription (""). A fairly high volume of
messages is published, and several times throughout the day I have to
restart a client because it is no longer receiving updates and its memory
footprint is growing rapidly.
I have been attaching to the hung clients with gdb and have found the same
basic thing every time: 4 threads in the zmq::epoll_t::loop() function
(epoll.cpp line 161) and one thread in zmq::signaler_t::recv()
(signaler.cpp line 263) that never returns from the ::recv() call.
Usually when one of the clients enters this state the others keep on going
just fine; I'm assuming the memory footprint growth is due to messages from
the publisher being queued up.
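In the meantime I'm considering making the consumer loop non-blocking so the
thread can at least notice and log when updates stop arriving. A rough sketch
(assuming the 2.x C++ binding, where recv() with ZMQ_NOBLOCK returns false on
EAGAIN; the threshold and backoff values are arbitrary):

// Sketch: non-blocking consume loop that can notice a stall instead of
// parking forever inside recv().  Threshold and backoff are placeholders.
#include <zmq.hpp>
#include <time.h>
#include <stdio.h>
#include <unistd.h>

void consumer_with_watchdog(zmq::socket_t &sub)
{
    time_t last_update = time(NULL);
    while (true) {
        zmq::message_t msg;
        if (sub.recv(&msg, ZMQ_NOBLOCK)) {       // false on EAGAIN in the 2.x binding
            last_update = time(NULL);
            // handle_update(msg.data(), msg.size());
        } else {
            if (time(NULL) - last_update > 30)   // arbitrary 30s stall threshold
                fprintf(stderr, "no updates for >30s -- possible stall\n");
            usleep(1000);                        // brief backoff before retrying
        }
    }
}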
I read in the docs about different sockets and their "exception" states that
would cause them to block until the issue is resolved, but I didn't see
anything in there about SUB sockets entering an exception state.
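For what it's worth, my understanding is that this exceptional state is tied to
ZMQ_HWM. A sketch of capping the subscriber-side queue so a stalled client can't
grow without bound (in 2.x the HWM is a uint64_t message count, and my reading of
the man page is that a SUB socket which reaches it drops further incoming
messages rather than blocking):

// Sketch: bound the subscriber-side queue with ZMQ_HWM (0MQ 2.x).
#include <zmq.hpp>
#include <stdint.h>

void bounded_consumer()
{
    zmq::context_t ctx(1);
    zmq::socket_t sub(ctx, ZMQ_SUB);

    uint64_t hwm = 100000;                       // illustrative limit, tune as needed
    sub.setsockopt(ZMQ_HWM, &hwm, sizeof(hwm));  // set before connect so it applies to the connection
    sub.setsockopt(ZMQ_SUBSCRIBE, "", 0);
    sub.connect("tcp://publisher-host:5555");

    while (true) {
        zmq::message_t msg;
        sub.recv(&msg);
        // handle_update(msg.data(), msg.size());
    }
}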
Any help / ideas to look into would be greatly appreciated.
Marc
_______________________________________________
zeromq-dev mailing list
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
--
-
Pieter Hintjens
iMatix - www.imatix.com