Scaling to 500 with JackTrip

As our choral practice groups have been growing in size, I’ve been keeping a closer eye on scalability. Today, I was able to complete a (synthetic) test that simulated a performance involving 500 live participants, all connected via a single JackTrip server! This article is for the larger groups out there that want to know how to scale.

Any peer-to-peer based solution (JamKazam, SoundJack, etc.) has inherent architectural limitations that will prevent it from scaling beyond a handful of participants. I’ll save a more detailed explanation for another article, on another day, and focus instead on the two most promising client-server based solutions: Jamulus and JackTrip.

Jamulus uses a single “worker” thread to do most of its work. I’m not familiar with the reasons for this design, but generally it is much more difficult to build software that runs efficiently across multiple threads. The significant downside is that Jamulus fails to utilize the many cores that are available in any modern CPU.

Jamulus currently has a hard-coded maximum of 50 “channels,” where each channel is equivalent to one participant. This is for good reason, because after about 30 or 40 channels, you will max out the capacity of its worker thread. At best, this will lead to a very bad audio experience, and at worst a crash. It doesn’t matter how beefy your server is. In its current form, Jamulus is simply incapable of surpassing this limit.

JackTrip is multi-threaded, and very efficient at utilizing CPU resources. It scaled linearly for my test, using about 8 cores for every 100 clients. Even at 400 clients, memory usage was extremely low and there were no errors in the jacktrip or jackd process logs. However, the journey of scaling was not without pitfalls.
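
This observed rate (roughly 8 physical cores per 100 clients) can be turned into a rough back-of-the-envelope capacity estimator. The sketch below is illustrative only, using constants from my tests; the function names are mine, and as the rest of this article shows, other limits in jackd kick in well before the raw core math runs out.

```python
import math

# Rough capacity math based on the scaling rate observed in these tests:
# roughly 8 physical cores per 100 JackTrip clients. Illustrative constants,
# not guarantees; real capacity depends on CPU model, sample rate, and
# frames-per-period settings.
CORES_PER_100_CLIENTS = 8

def cores_needed(clients):
    """Estimated physical cores for a given client count."""
    return math.ceil(clients * CORES_PER_100_CLIENTS / 100)

def max_clients(vcpus, threads_per_core=2):
    """Estimated client capacity for a server with `vcpus` vCPUs."""
    cores = vcpus // threads_per_core
    return cores * 100 // CORES_PER_100_CLIENTS

print(cores_needed(400))  # 32 cores
print(max_clients(96))    # 600, before jackd's own limits cap it lower
```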

The first bottleneck you will likely encounter is hard-coded into JackTrip: it does not allow more client connections than there are vCPUs available. There is a patch currently available in the “chris” branch that removes this.

The next bottleneck I hit was a compile-time limit in the jackd server used by JackTrip. This sets the maximum number of clients, and the default is 62. You can read more about it here. Thankfully, it’s not hard to build a custom jackd from source to increase this limit to 512 (see below).

The next limit I hit was at 100 clients. From the jackd logs:

client 54.177.200.14-99 has 99 extra instances already
Cannot read socket fd = 308 err = No such file or directory
Unknown request 0

My first attempts used a c5.24xlarge (96 vCPU) EC2 instance to run the JackTrip server, and a separate c5.24xlarge instance to run this jacktrip_load.py script. I learned that when Jack assigns names to clients, it uses a two-digit format, making “99” the max. This was a problem specific to my synthetic test. My subsequent attempts worked around it by using multiple c5.9xlarge (36 vCPU) instances to run jacktrip_load.py, each running up to 100 clients.
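
To illustrate the workaround, here is a hypothetical sketch of how a load generator might partition clients across load hosts to stay under that per-host naming limit. This is not the actual jacktrip_load.py code (which is open source); the `-C` hub-client flag is real JackTrip usage, but the helper names are mine.

```python
import subprocess

# Hypothetical sketch of a load generator (NOT the actual jacktrip_load.py):
# jackd names clients from the source address with a two-digit suffix, so
# each load host is capped at roughly 100 simulated clients.
MAX_CLIENTS_PER_HOST = 100

def batches(total_clients, per_host=MAX_CLIENTS_PER_HOST):
    """Split a target client count into per-host batch sizes."""
    full, rest = divmod(total_clients, per_host)
    return [per_host] * full + ([rest] if rest else [])

def spawn_clients(server, count):
    """Spawn `count` JackTrip hub clients against `server` (-C = hub client mode)."""
    return [subprocess.Popen(["jacktrip", "-C", server]) for _ in range(count)]

if __name__ == "__main__":
    # 500 clients -> five load hosts of 100 each
    print(batches(500))
```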

After this, I hit another limit at around 144 clients: jackd ran out of file handles and crashed. I’m using Linux for these tests, so it was easy enough to fix with a few extra limits.conf lines (also bumping the number of processes to be safe):

@audio soft nproc 200000
@audio hard nproc 200000
@audio soft nofile 200000
@audio hard nofile 200000
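
To confirm the new limits are actually in effect (limits.conf changes only apply to fresh login sessions), a quick check using Python’s stdlib `resource` module, run as the same user that runs jackd, will report what a new process really gets:

```python
import resource

# Report the open-file limits seen by a freshly started process; after the
# limits.conf change above takes effect, soft should read 200000.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard}")
if soft < 200000:
    print("warning: nofile limit still low; jackd may run out of file handles")
```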

At 254 clients, I hit another jackd limit that was causing JackTrip to crash with the following output:

Waiting for Peer…
JackTrip HUB SERVER: Total Running Threads: 254
===============================================================
Received Connection from Peer!
spawning jacktripWorker so change patch
Cannot open lsp client
Cannot read socket fd = 1788 err = Success
CheckRes error
JackSocketClientChannel read fail
JackShmReadWritePtr1::~JackShmReadWritePtr1 - Init not done for -1, skipping unlock
jack_client_open() failed, status = 0x%2.0x
33

jackd was logging numerous errors, but they started out with this:

shm registry full
Cannot create shared memory segment of size = 426
JackShmMem::new bad alloc
Cannot open client
Cannot create new client
CheckSize error size = 0 Size() = 12
CheckRead error
CheckSize error size = 3 Size() = 12
CheckRead error
Unknown request 0
Unknown request 4294967295

Some digging in the jackd source code led me to the MAX_SHM_ID constant in shm.h, which is set by default to 256. You can increase this and the number of clients using the following patch:

diff --git a/common/JackConstants.h b/common/JackConstants.h
index cae54566..28c7d5d9 100644
--- a/common/JackConstants.h
+++ b/common/JackConstants.h
@@ -62,7 +62,7 @@
 #define CONNECTION_NUM_FOR_PORT PORT_NUM_FOR_CLIENT

 #ifndef CLIENT_NUM
-#define CLIENT_NUM 64
+#define CLIENT_NUM 1024
 #endif

 #define AUDIO_DRIVER_REFNUM   0                 // Audio driver is initialized first, it will get the refnum 0
diff --git a/common/shm.h b/common/shm.h
index 5326fddb..5bd82278 100644
--- a/common/shm.h
+++ b/common/shm.h
@@ -49,7 +49,7 @@ extern "C"
 #endif

 #define MAX_SERVERS 8               /* maximum concurrent servers */
-#define MAX_SHM_ID 256              /* generally about 16 per server */
+#define MAX_SHM_ID 1024              /* generally about 16 per server */
 #define JACK_SHM_MAGIC 0x4a41434b      /* shm magic number: "JACK" */
 #define JACK_SHM_NULL_INDEX -1         /* NULL SHM index */
 #define JACK_SHM_REGISTRY_INDEX -2     /* pseudo SHM index for registry */

Here are the steps to build a custom jackd from source that includes these changes:

git clone https://github.com/jackaudio/jack2.git
cd jack2
patch -p1 < PATH_TO_FILE/jack_limits.patch
./waf configure
./waf
./waf install

All of these changes enabled me to scale my JackTrip server up to handle 500 clients:

------------------------------------------------------------
UDP Socket Receiving in Port: 61499
------------------------------------------------------------
Waiting for Peer…
Received Connection from Peer!
JackTrip HUB SERVER: Total Running Threads: 500
============================================================
spawning jacktripWorker so change patch
JackTrip HUB SERVER: Waiting for client connections…
JackTrip HUB SERVER: Hub auto audio patch setting = 0
============================================================

At this point, I was reaching the ceiling for another worker thread inside jackd:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11694 ubuntu -11 0 175268 102424 92532 R 95.1 0.1 18:11.67 jackd
11942 ubuntu 20 0 60.3g 118964 96040 R 7.1 0.1 1:30.34 UdpDataProtocol
12117 ubuntu 20 0 60.3g 118964 96040 S 7.1 0.1 1:27.72 UdpDataProtocol
12207 ubuntu 20 0 60.3g 118964 96040 S 7.1 0.1 1:26.54 UdpDataProtocol

I received no errors from JackTrip, but jackd started recording frequent Xruns and other errors. Also, the overall CPU utilization on my server was pretty heavily taxed:

load average: 64.61, 69.02, 55.29

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11696 ubuntu 20 0 60.3g 118308 96040 S 4297 0.1 721:01.47 jacktrip
11692 ubuntu 20 0 175268 102424 92532 S 97.4 0.1 18:47.87 jackd

On some of my test runs, the 500th client even failed to connect. I would consider this to be the upper bound; trying to push it any further would be unrealistically time consuming.

For comparison, here is how the resources looked with 400 clients:

load average: 36.83, 34.12, 33.19

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11696 ubuntu 20 0 57.9g 108280 95636 S 2939 0.1 171:40.36 jacktrip
11692 ubuntu 20 0 174476 101600 91740 S 72.5 0.1 5:17.36 jackd

With 400 clients, the jackd worker thread was very busy, but still had some room to go:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11694 ubuntu -11 0 174476 101632 91740 R 75.9 0.1 19:15.28 jackd
13981 ubuntu 20 0 57.9g 125236 95644 S 7.2 0.1 1:13.20 UdpDataProtocol
14059 ubuntu 20 0 57.9g 125236 95644 S 7.2 0.1 1:12.51 UdpDataProtocol

I expect that with appropriate tuning, a single 96 vCPU JackTrip server can successfully handle up to around 400 live participants. Beyond that, you are pushing the capabilities of both jackd and the system itself. This is an order of magnitude more than what Jamulus can handle, and more than enough even for large musical ensembles.

12 thoughts on “Scaling to 500 with JackTrip”

  1. Mike, this is super interesting and very exciting news. So great that you’ve been able to scale so far!

    David

  2. So you’ve spun up a few hundred nodes but that is just talking about internal resource efficiencies on the platform. What are you doing about inserting a random lag value to packets to reflect real time use? Not everybody is going to be patched directly into this server correct?

    1. You are correct that this would be just a start of more comprehensive scale testing. It’s good enough for me to validate that JackTrip is the only viable path forward for large groups right now. I will likely be doing more real life scale testing before I have any more time to invest into building more realistic synthetic testing. But the script is open source, and I welcome anyone to further enhance and refine it from here.

  3. Thanks for all your great work on this, Mike! I’m curious to see how the large scale ensemble options develop.

  4. This is really great information Mike, thank you for sharing this! I have been testing peer to peer connections and routing that audio into Ableton Live to be played with using all the FX I have in there. I’m interested in understanding how to best control a hub server setup and I’m having a few hangups; I have been doing research but I’m having trouble figuring out how to patch the audio. I feel like I’m missing something basic, do you have any recommended literature that you think would be helpful?

    I’m working on a live remote orchestra piece for a university and I’ve figured out a great way to work with the latency people are experiencing that includes using that somewhat calculated delay musically and I’m interested in getting to a more comfortable place with Jacktrip to use with those who find it to be manageable. Thanks so much for your research and testing, I find this is really amazing to see!

    1. Hi Trevor, the ability to patch JackTrip into DAWs etc. on the server is one of its biggest strengths. Doing this in a scalable and automated way (versus manually) is an area of ongoing work. Chris just this week I believe added a -p6 option to hubpatch mode to add some more capabilities along these lines (plus auto-panning). But I still believe there is much work yet to be done.

      In particular, one thing I learned since posting this article is that the current hubpatch implementation will not scale past around 50 clients. The original tests I ran did not use hubpatch. On Virtual Studio servers, we’ve migrated to using a branch of the jack_autoconnect project to manage patching into SuperCollider (which now handles all mixing and other audio processing). You can find some more details about that in this article. I’ve verified that this approach scales well all the way up to about 480 clients using a 96 vCPU server.

  5. Great article and great work Mike! I’d like to try out the patch but it seems the link is broken. Please where can it be downloaded?

  6. Great post. I have just setup a Jacktrip hub server and I suspect I will go through the same pains, it is good to have a look at what’s coming ahead. Reading through the comments left me with a few doubts:

    – You referred that JackTrip had an inbuilt limitation of one client connection per vCPU. I tested a few days ago a two client scenario with just one vCPU and did not trip on that limitation. Is this because the limitation has since been removed from the current release? Or perhaps it does not apply to Hub server mode?

    – You mentioned initially that in your tests, Jacktrip scaled out linearly at a rate of about 100 clients per 8 cores, which gives a rate of about 12 participants per core. However, at the end and in a comment, you mention an estimate of 96 vCPUs for up to 480 clients, which gives a lower rate of 5 participants per core. What is the source of the discrepancy? Is it due to differences in audio processing in the two scenarios?

    – How many channels are you sending per participant? Stereo or mono?

    A final question for your reflection: by the exposition above and my limited knowledge, it seems a possible scaling approach would be to just run multiple jacktrip servers that then are hooked together (to each jacktrip server, the other servers are just “special” clients that happen to send a very rich mix). Do you agree? (Latency considerations notwithstanding).

    1. I used two channels (stereo) per client for my tests.

      1 vCPU = 1/2 core, so the best guideline to follow is 1 vCPU per 5 participants, to leave a little bit of headroom. I’ve found this holds pretty true for most circumstances, but there are other variables like some CPUs are better than others (e.g. I’ve observed much worse performance using AMD Ryzen chips), and different settings can have a big impact (e.g. fewer samples or frames per period requires more CPU).

      The limitation of 1 client per vCPU has been resolved, I think prior to the 1.2 release (latest is 1.3).

      I believe you are correct that chaining together multiple JackTrip servers should, in theory, allow it to scale as big as you have hardware to support. I imagine you’d have to be careful managing the levels on those inter-server channels! 😉

      1. Thank you for your response! It helps me to estimate possible future costs/instance sizes for our case. We will be using mono, so I expect we might sustain more clients per core. I have set this up on AWS Graviton 2 ARM instances since the start (due to lower costs), where I believe 1 vCPU is actually 1 physical core, so if we succeed and go beyond our tests with 2-3 participants, I will report what we find about scaling on these types of instances. As of now, on our limited tests (on a C6g instance), 1-core utilization seems optimistically low!
