Discussion:
[zeromq-dev] Windows multicast
Robert Falcone
2011-04-23 19:17:03 UTC
I am developing with MSVC++ 2010. I have read that it is tricky to get multicast (with OpenPGM) working with ZeroMQ on Windows. I will be sending out market data at high rates and would love to test it out.

Does anyone know if it is possible to use ZeroMQ to multicast?
If so can you please provide information on how to get it to work?

Thanks,
Rob Falcone
Steven McCoy
2011-04-24 02:31:29 UTC
Post by Robert Falcone
I am developing with MSVC++ 2010. I have read that it is tricky to get
multicast (with openPGM) working with ZeroMQ on windows. I will be sending
out market data at high rates and would love to test it out.
Does anyone know if it is possible to use ZeroMQ to multicast?
If so can you please provide information on how to get it to work?
Grab an OpenPGM package from here,
http://芋.銙枯/openpgm/<http://xn--nw2a.xn--j6w193g/openpgm/>


Then build ØMQ with ZMQ_HAVE_OPENPGM and include & lib paths set
appropriately.
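
Once ØMQ is built with PGM support, the transport is selected by the endpoint scheme: epgm:// for PGM encapsulated in UDP, or pgm:// for raw PGM. A minimal publisher sketch, assuming the standard 2.x C API; the interface name, multicast group, and rate value are placeholders you must adapt:

```c
#include <zmq.h>
#include <assert.h>
#include <stdint.h>

int main (void)
{
    void *ctx = zmq_init (1);
    void *pub = zmq_socket (ctx, ZMQ_PUB);

    /* PGM rate limit in kbit/s -- tune for your market-data feed. */
    int64_t rate = 100000;
    zmq_setsockopt (pub, ZMQ_RATE, &rate, sizeof rate);

    /* Format is interface;multicast-group:port -- all placeholders. */
    int rc = zmq_connect (pub, "epgm://eth0;239.192.1.1:5555");
    assert (rc == 0);

    /* ... send messages, then clean up ... */
    zmq_close (pub);
    zmq_term (ctx);
    return 0;
}
```

Note that ZMQ_RATE and ZMQ_RECOVERY_IVL must be set before connect/bind for PGM sockets to take effect.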

I'm sure I wrote the OS configuration on the wiki somewhere, but I'm just
copying from previous mail. Be prepared for poor performance if you don't
have TX & RX coalescing enabled and actually working; check with PerfMon
that the TX counter increments. I've noted a problem with Broadcom Server
NICs, meanwhile Linux and FreeBSD don't suffer these problems, go figure.

- Disable the multimedia network throttling scheduler. By default Windows
limits streams to 10,000 pps when a multimedia application is running.

Under
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile,
set NetworkThrottlingIndex, type REG_DWORD32, value set to 0xffffffff.
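
The same setting can be applied from an elevated command prompt (a sketch; verify the key path on your Windows version):

```shell
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile" ^
    /v NetworkThrottlingIndex /t REG_DWORD /d 0xffffffff /f
```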

- Two settings define an IP stack fast path for datagrams. By default,
datagrams above 1024 bytes go through a slow, locked double buffer;
increase this threshold to the network MTU size, i.e. 1500 bytes.

Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:


- FastSendDatagramThreshold, type REG_DWORD32, value set to 1500.
- FastCopyReceiveThreshold, type REG_DWORD32, value set to 1500.
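
Again from an elevated command prompt (a sketch of the two values above):

```shell
reg add "HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters" /v FastSendDatagramThreshold /t REG_DWORD /d 1500 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters" /v FastCopyReceiveThreshold /t REG_DWORD /d 1500 /f
```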


- Unless you have hardware acceleration for timestamps, incoming packets
will be tagged with the receipt time at the expense of processing time;
disable the time stamps on both sides. This is performed by means of the
following command:

netsh int tcp set global timestamps=disabled


- A firewall will intercept all packets, causing increased latency and
processing time. Disable the filtering and firewall services to ensure
direct network access; that means disabling the following services:
- Base Filtering Engine (BFE)
- Windows Firewall (MpsSvc)
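
A sketch of disabling both services from an elevated command prompt. MpsSvc depends on BFE, so stop the firewall first (the space after start= is required by sc):

```shell
sc config MpsSvc start= disabled
sc config BFE start= disabled
net stop MpsSvc
net stop BFE
```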


You will need to reboot for the registry settings to be applied to the
system.

You can get better latency using Microsoft's ConCRT, but it forces you into
a C++ main call and presumably limits the MSVC support. That's more of a DIY
customization, as OpenPGM is a C project.
--
Steve-o
Steven McCoy
2011-04-26 11:07:10 UTC
Post by Robert Falcone
I am developing with MSVC++ 2010. I have read that it is tricky to get
multicast (with openPGM) working with ZeroMQ on windows. I will be sending
out market data at high rates and would love to test it out.
Does anyone know if it is possible to use ZeroMQ to multicast?
If so can you please provide information on how to get it to work?
Grab an OpenPGM package from here, http://芋.銙枯/openpgm/<http://xn--nw2a.xn--j6w193g/openpgm/>
Then build ØMQ with ZMQ_HAVE_OPENPGM and include & lib paths set
appropriately.
I'm sure I wrote the OS configuration on the wiki somewhere, but I'm just
copying from previous mail. Be prepared for poor performance if you don't
have TX & RX coalescing enabled and actually working, check with PerfMon
that the TX counter increments. I've noted a problem with Broadcomm Server
NICs, meanwhile Linux and FreeBSD don't suffer these problems, go figure.
I ran these tests in November and published them on the list; today I
finally moved them over to the blog. They are still reasonably accurate: a
latency comparison of Linux and Windows.

http://openpgmdev.blogspot.com/2011/04/wherefore-art-thou-ip-packet-make-haste.html

The effects are compounded as Windows only has millisecond timers; the
performance tool rounds times up to the nearest millisecond on Windows,
trading latency against busy-wait time. Busy-waiting on timers is quite
adverse to performance on systems without a significant number of unbound
cores. A lot of engineering work has gone into ensuring busy-waiting does
not occur on single-core systems, or in processes with only one available
core due to process affinity or deactivated cores.

On that note, all the multi-threading and synchronisation complexity that
Martin & Pieter say ØMQ avoids for performance reasons, you can probably
find in OpenPGM.

:-)

For OpenPGM inside ØMQ you could theoretically disable a lot of the
threading support; the idea is to make a libpgm_se (non-re-entrant) edition.
But the savings, when measured in PGM, are untraceable, and the overhead of
atomic operations is dwarfed by the significant memory allocator and IP
stack demands. Considering that most Ethernet adapters need buffer
coalescing to reach even gigabit rates, there are still many technology
limitations (and buffer bloat, as per the LKML discussion) in the way, aside
from moving to different fabrics.

--
Steve-o