Experiment to achieve 5M persistent connections with Project Loom (Java)

Project Loom C5M is an experiment to achieve 5 million persistent connections in each of a client and a server
Java application using OpenJDK Project Loom virtual threads.

The C5M name is a nod to the C10k problem (http://www.kegel.com/c10k.html) posed by Dan Kegel in 1999.

The project consists of two simple components, EchoServer and EchoClient.

EchoServer opens many passive TCP server sockets and accepts new connections on each as they arrive.
For each connection socket it accepts, EchoServer reads bytes in and echoes them back out.
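
To make that concrete, here is a minimal sketch of the per-connection model. It is not the project's actual
source: the class name, hard-coded arguments, and error handling are illustrative assumptions, and it presumes
a Loom-enabled JDK 19 build run with --enable-preview.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServerSketch {
    public static void main(String[] args) throws Exception {
        int basePort = 9000, portCount = 100, backlog = 16192, bufferSize = 16;
        for (int i = 0; i < portCount; i++) {
            ServerSocket serverSocket = new ServerSocket(basePort + i, backlog);
            // one virtual thread per passive socket, accepting connections in a loop
            Thread.startVirtualThread(() -> accept(serverSocket, bufferSize));
        }
        Thread.currentThread().join(); // park main forever; virtual threads do the work
    }

    static void accept(ServerSocket serverSocket, int bufferSize) {
        try (serverSocket) {
            while (true) {
                Socket socket = serverSocket.accept();
                // one virtual thread per connection; blocking I/O merely parks the virtual thread
                Thread.startVirtualThread(() -> echo(socket, bufferSize));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void echo(Socket socket, int bufferSize) {
        try (socket) {
            InputStream in = socket.getInputStream();
            OutputStream out = socket.getOutputStream();
            byte[] buffer = new byte[bufferSize];
            int n;
            while ((n = in.read(buffer)) != -1) { // read until the peer closes
                out.write(buffer, 0, n);          // echo the bytes straight back
            }
        } catch (Exception e) {
            // connection reset or other I/O failure; drop this connection
        }
    }
}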

EchoClient initiates many outgoing TCP connections to a range of ports on a single destination server.
For each socket created, EchoClient sends a message to the server, awaits the response, and goes to sleep
for a time before sending again.

EchoClient terminates immediately if any of the following occurs (a client sketch follows the list):

Connection timeout
Socket read timeout
Integrity error with message received
TCP connection closure
TCP connection reset
Any other unexpected I/O condition
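
Here is a similarly hedged sketch of the client's per-connection loop. The message payload, names, and exit
behavior are assumptions for illustration, not the project's actual code.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EchoClientSketch {
    public static void main(String[] args) throws Exception {
        String host = args[0];
        int basePort = 9000, portCount = 100, connectionsPerPort = 50_000;
        int socketTimeout = 60_000, sleep = 60_000;
        for (int port = basePort; port < basePort + portCount; port++) {
            for (int i = 0; i < connectionsPerPort; i++) {
                int p = port;
                // one virtual thread per outgoing connection
                Thread.startVirtualThread(() -> run(host, p, socketTimeout, sleep));
            }
        }
        Thread.currentThread().join(); // park main forever
    }

    static void run(String host, int port, int socketTimeout, int sleep) {
        byte[] message = "hello".getBytes(StandardCharsets.US_ASCII); // assumed payload
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), socketTimeout); // connect timeout
            socket.setSoTimeout(socketTimeout); // socket read timeout
            InputStream in = socket.getInputStream();
            OutputStream out = socket.getOutputStream();
            byte[] reply = new byte[message.length];
            while (true) {
                out.write(message);
                int n = in.readNBytes(reply, 0, reply.length); // await the full echo
                if (n < reply.length || !Arrays.equals(message, reply)) {
                    throw new IOException("connection closed or echo integrity error");
                }
                Thread.sleep(sleep); // idle between rounds; cheap for a virtual thread
            }
        } catch (Exception e) {
            System.err.println("fatal: " + e);
            System.exit(1); // terminate immediately, per the conditions above
        }
    }
}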

After plenty of trial and error, I arrived at the following set of Linux kernel parameter changes
to support the target socket scale. They go in /etc/sysctl.conf and are loaded with sysctl -p.
Note that net.ipv4.tcp_mem is measured in pages (4 KB on these hosts), so the 1638400 values below
allow about 6.7 GB of memory for TCP buffers.

The following article about achieving 1 million persistent connections with Erlang was very helpful in this regard:
https://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1

/etc/sysctl.conf:

fs.file-max=10485760
fs.nr_open=10485760

net.core.somaxconn=16192
net.core.netdev_max_backlog=16192
net.ipv4.tcp_max_syn_backlog=16192

net.ipv4.ip_local_port_range=1024 65535

net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 87380 16777216
net.ipv4.tcp_mem=1638400 1638400 1638400

net.netfilter.nf_conntrack_buckets=1966050
net.netfilter.nf_conntrack_max=7864200

#EC2 Amazon Linux
#net.core.netdev_max_backlog=65536
#net.core.optmem_max=25165824
#net.ipv4.tcp_max_tw_buckets=1440000

/etc/security/limits.conf:

* soft nofile 8000000
* hard nofile 9000000

Bare Metal
Two server-grade hosts were used for bare metal tests:

HPE ProLiant BL460c Gen10
2x Intel Xeon Gold 6230 CPUs (20 cores each)
128GB RAM
CentOS 7.9
Project Loom Early Access Build 19-loom+5-429 (2022/4/4) from https://jdk.java.net/loom/

The server launched with a passive server port range of [9000, 9099].

The client launched with the same server target port range and a connections-per-port count of 50,000,
for a total of 5,000,000 target connections.

I stopped the experiment after about 40 minutes of continuous operation. None of the errors, closures, or
timeouts outlined above occurred. In that time, roughly 200,000,000 messages were echoed.
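
In the logs that follow, the bracketed number is elapsed milliseconds since startup, followed by running
totals of open connections and echoed messages. A hypothetical reporter producing that format (names assumed,
not the project's actual code) could be as simple as:

import java.util.concurrent.atomic.AtomicLong;

class StatsReporter implements Runnable {
    final AtomicLong connections = new AtomicLong(); // incremented on each accept/connect
    final AtomicLong messages = new AtomicLong();    // incremented per echoed message
    final long start = System.currentTimeMillis();

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(1_000); // one log line per second
            } catch (InterruptedException e) {
                return;
            }
            System.out.printf("[%d] connections=%d, messages=%d%n",
                    System.currentTimeMillis() - start, connections.get(), messages.get());
        }
    }
}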

Server:

[ebarlas@dct-kvm1.bmi ~]$ ./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoServer 0.0.0.0 9000 100 16192 16
Args[host=0.0.0.0, port=9000, portCount=100, backlog=16192, bufferSize=16]
[0] connections=0, messages=0
[1040] connections=0, messages=0

[10047] connections=765, messages=765
[11047] connections=7292, messages=7282

[2422192] connections=5000000, messages=199113472
[2423193] connections=5000000, messages=199216494
[2424195] connections=5000000, messages=199304727

Client:

[ebarlas@dct-kvm2.bmi ~]$ ./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoClient dct-kvm1.bmi.expertcity.com 9000 100 50000 60000 60000 60000
Args[host=dct-kvm1.bmi.expertcity.com, port=9000, portCount=100, numConnections=50000, socketTimeout=60000, warmUp=60000, sleep=60000]
[0] connections=8788, messages=8767
[1013] connections=25438, messages=25438
[2014] connections=45285, messages=45279
[3014] connections=64865, messages=64863

[2410230] connections=5000000, messages=199020253
[2411230] connections=5000000, messages=199092940
[2412230] connections=5000000, messages=199188157

The ss command output shows that both client and server had 5,000,000+ sockets open.

Server:

[ebarlas@dct-kvm1.bmi ~]$ ss -s
Total: 5001570 (kernel 0)
TCP: 5000123 (estab 5000010, closed 4, orphaned 0, synrecv 0, timewait 2/0), ports 0

Transport Total IP IPv6
* 0 - -
RAW 0 0 0
UDP 15 11 4
TCP 5000119 17 5000102
INET 5000134 28 5000106
FRAG 0 0 0

Client:

[ebarlas@dct-kvm2.bmi ~]$ ss -s
Total: 5000735 (kernel 0)
TCP: 5000019 (estab 5000006, closed 5, orphaned 0, synrecv 0, timewait 4/0), ports 0

Transport Total IP IPv6
* 0 - -
RAW 0 0 0
UDP 12 8 4
TCP 5000014 13 5000001
INET 5000026 21 5000005
FRAG 0 0 0

The server Java process used 32 GB of committed resident memory and 49 GB of virtual memory, roughly 6.4 KB
of resident memory per connection. After running for 37m37s, it used 3h39m02s of CPU time.

The client Java process used 26 GB of committed resident memory and 49 GB of virtual memory, roughly 5.2 KB
of resident memory per connection. After running for 37m12s, it used 6h16m30s of CPU time.

Server:

COMMAND %CPU TIME ELAPSED PID %MEM RSS VSZ
./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoServer 582 03:39:02 37:37 35102 24.1 31806188 48537940

Client:

COMMAND %CPU TIME ELAPSED PID %MEM RSS VSZ
./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoClient 1012 06:16:30 37:12 19339 20.0 26370892 48553472

A dump of the Java platform threads confirmed expectations: 220 total threads, 80 of which were ForkJoinPool carrier threads.

As outlined in JEP 425 (https://openjdk.org/jeps/425), the number of carrier threads defaults to the number of available processors.

The number of processors reported by the host is 80: 2 CPUs with 20 cores each and hyper-threading (2 threads per core).

$ grep ^processor /proc/cpuinfo | wc -l
80
$ ./jdk-19/bin/jstack 28226 > threads.txt
$ grep 'tid=' threads.txt | wc -l
220
$ grep 'tid=' threads.txt | grep ForkJoin | wc -l
80
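
For completeness, here is a small check of the scheduler sizing. jdk.virtualThreadScheduler.parallelism is the
system property JEP 425 documents for overriding the default carrier-thread count; the override value in the
comment below is only an example.

public class CarrierThreadCheck {
    public static void main(String[] args) {
        // The virtual-thread scheduler defaults its parallelism to this value (JEP 425)
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        // Prints null unless launched with e.g. -Djdk.virtualThreadScheduler.parallelism=40
        System.out.println("parallelism override: " + System.getProperty("jdk.virtualThreadScheduler.parallelism"));
    }
}
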
EC2
Two EC2 instances were used for cloud tests:

c5.2xlarge
16GB RAM
8 vCPU
Amazon Linux 2 with Linux Kernel 5.10, AMI ami-00f7e5c52c0f43726
Project Loom Early Access Build 19-loom+5-429 (2022/4/4) from https://jdk.java.net/loom/

The server launched with a passive server port range of [9000, 9049].

The client launched with the same server target port range and a connections-per-port count of 10,000,
for a total of 500,000 target connections.

I stopped the experiment after about 35 minutes of continuous operation. None of the errors, closures, or
timeouts outlined above occurred. In that time, roughly 17,500,000 messages were echoed.

[ec2-user@ip-10-39-197-143 ~]$ ./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoServer 0.0.0.0 9000 50 16192 16
Args[host=0.0.0.0, port=9000, portCount=50, backlog=16192, bufferSize=16]
[0] connections=0, messages=0
[1015] connections=0, messages=0

[17020] connections=2412, messages=2412
[18020] connections=10317, messages=10316

[2106644] connections=500000, messages=17414982
[2107645] connections=500000, messages=17423544

[ec2-user@ip-10-39-196-215 ~]$ ./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoClient 10.39.197.143 9000 50 10000 30000 60000 60000
Args[host=10.39.197.143, port=9000, portCount=50, numConnections=10000, socketTimeout=30000, warmUp=60000, sleep=60000]
[0] connections=131, messages=114
[1014] connections=4949, messages=4949
[2014] connections=13248, messages=13246

[2091751] connections=500000, messages=17428105
[2092751] connections=500000, messages=17432046

The ss command output shows that both client and server had 500,000+ sockets open.

Server:

[ec2-user@ip-10-39-197-143 ~]$ ss -s
Total: 500233 (kernel 0)
TCP: 500057 (estab 500002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0

Transport Total IP IPv6
* 0 - -
RAW 0 0 0
UDP 8 4 4
TCP 500057 5 500052
INET 500065 9 500056
FRAG 0 0 0

Client:

[ec2-user@ip-10-39-196-215 ~]$ ss -s
Total: 500183 (kernel 0)
TCP: 500007 (estab 500002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0

Transport Total IP IPv6
* 0 - -
RAW 0 0 0
UDP 8 4 4
TCP 500007 5 500002
INET 500015 9 500006
FRAG 0 0 0

The server Java process used 2.3 GB of committed resident memory and 8.4 GB of virtual memory, roughly 4.6 KB of resident memory per connection.
After running for 35.12 minutes, it used 12m42s of CPU time.

The client Java process used 2.8 GB of committed resident memory and 8.9 GB of virtual memory, roughly 5.6 KB of resident memory per connection.
After running for 34.88 minutes, it used 25m19s of CPU time.

Server:

COMMAND %CPU TIME PID %MEM RSS VSZ
./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoServer 36.7 00:12:42 18432 14.5 2317180 8434320

Client:

COMMAND %CPU TIME PID %MEM RSS VSZ
./jdk-19/bin/java --enable-preview -ea -cp project-loom-scale-1.0.0-SNAPSHOT.jar loomtest.EchoClient 73.9 00:25:19 18120 17.9 2848356 8901320
