GKE Ingress with Cloud CDN is slower than you think
Found out while benchmarking API performance
TL;DR: use container-native load balancing for GKE Ingress, without enabling Cloud CDN.
The original post title may have sounded misleading, if you saw it somewhere (SEO). Originally I thought the performance issue was caused by GKE Ingress. After some discussion and investigation, I found that it only happens when Cloud CDN is enabled, so I updated the post content and the title.
That said, the benchmark results show GKE Ingress is still slightly slower than the others. I might publish a simple benchmark of popular Ingress controllers in the future.
I was running benchmarks for my API improvement blog series and noticed that non-cacheable APIs served from GKE Ingress with Cloud CDN enabled had the worst performance throughout the tests, so I reran all the benchmarks. The workload runs on GKE Autopilot.
API server source code is available at cncf-demo/cache-server
Here is the implementation of the API: it waits for 200ms and then returns a response.
app.Get("/", func(c *fiber.Ctx) error {
	// simulate a database call that takes 200ms
	time.Sleep(200 * time.Millisecond)
	return c.JSON(fiber.Map{"result": "ok"})
})
Here are the benchmark results. Each test accesses the API through a public IP address, with no DNS, no HTTPS, responses that are not cacheable by the CDN, and no server-side caching.
Note that I ran the tests from my local terminal, and the effect of internet speed is ignored. The benchmarks were run with bombardier.
With GKE Ingress (container-native load balancing) and Cloud CDN
I originally labeled this result "GKE Ingress (standard)" and mistakenly thought the slowdown was caused by GKE Ingress itself, but it was actually caused by Cloud CDN. See the notes below.
$ bombardier -c 200 -n 100000 IP_ADDRESS
Bombarding IP_ADDRESS with 100000 request(s) using 200 connection(s)
100000 / 100000 [=====] 100.00% 197/s 8m25s
Done!
Statistics Avg Stdev Max
Reqs/sec 206.84 622.51 7707.32
Latency 1.01s 410.20ms 5.48s
HTTP codes:
1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 52.75KB/s
With GKE Ingress (container-native load balancing) without Cloud CDN
$ bombardier -c 200 -n 100000 IP_ADDRESS
Bombarding IP_ADDRESS with 100000 request(s) using 200 connection(s)
100000 / 100000 [=====] 100.00% 885/s 1m52s
Done!
Statistics Avg Stdev Max
Reqs/sec 899.61 760.42 10838.69
Latency 225.16ms 17.68ms 533.07ms
HTTP codes:
1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 197.52KB/s
With Service type LoadBalancer
$ bombardier -c 200 -n 100000 IP_ADDRESS
Bombarding IP_ADDRESS/ with 100000 request(s) using 200 connection(s)
100000 / 100000 [=====] 100.00% 911/s 1m49s
Done!
Statistics Avg Stdev Max
Reqs/sec 922.00 827.36 5866.92
Latency 218.86ms 9.84ms 369.95ms
HTTP codes:
1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 191.66KB/s
With Traefik Ingress (single replica)
$ bombardier -c 200 -n 100000 IP_ADDRESS
Bombarding IP_ADDRESS/cache with 100000 request(s) using 200 connection(s)
100000 / 100000 [=====] 100.00% 901/s 1m50s
Done!
Statistics Avg Stdev Max
Reqs/sec 912.27 747.73 11331.52
Latency 221.33ms 17.28ms 492.15ms
HTTP codes:
1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 200.09KB/s
Running on localhost with go run .
$ bombardier -c 200 -n 100000 localhost:3000
Bombarding http://localhost:3000 with 100000 request(s) using 200 connection(s)
100000 / 100000 [=====] 100.00% 991/s 1m40s
Done!
Statistics Avg Stdev Max
Reqs/sec 1015.70 2803.63 13085.14
Latency 201.35ms 0.98ms 218.76ms
HTTP codes:
1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 203.70KB/s
Overview of the results
| Type | Average RPS | Average latency | Time taken |
| --- | --- | --- | --- |
| GKE Ingress (with CDN) | 206.84 | 1.01s | 8m25s |
| GKE Ingress (without CDN) | 899.61 | 225.16ms | 1m52s |
| Service type LoadBalancer | 922.00 | 218.86ms | 1m49s |
| Traefik | 912.27 | 221.33ms | 1m50s |
| localhost | 1015.70 | 201.35ms | 1m40s |
Based on the results above, Service type LoadBalancer and Traefik have similar performance, close to localhost, but GKE Ingress with Cloud CDN performs about 4-5x worse than the rest. GKE Ingress without CDN, however, performs much better than with CDN.
I tested with Cloud CDN enabled through the BackendConfig CRD and through the Cloud console.
I verified that the Service has the following annotation (added automatically), which confirms it was using container-native load balancing:
cloud.google.com/neg: '{"ingress":true}'
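If you want to run the same check in a script rather than by eye, the annotation value is a small JSON document. Here is a minimal sketch that assumes you already have the Service's annotations as a map (for example, decoded from `kubectl get service -o json`). `negIngressEnabled` is a hypothetical helper and only looks at the `ingress` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const negAnnotation = "cloud.google.com/neg"

// negIngressEnabled reports whether the Service annotations request
// container-native load balancing (NEGs) for Ingress.
func negIngressEnabled(annotations map[string]string) bool {
	raw, found := annotations[negAnnotation]
	if !found {
		return false
	}
	var cfg struct {
		Ingress bool `json:"ingress"`
	}
	// Treat a malformed annotation value as "not enabled".
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return false
	}
	return cfg.Ingress
}

func main() {
	withNEG := map[string]string{negAnnotation: `{"ingress":true}`}
	withoutNEG := map[string]string{}

	fmt.Println(negIngressEnabled(withNEG))    // true
	fmt.Println(negIngressEnabled(withoutNEG)) // false
}
```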
I am not sure what the cause is at the moment. I filed an issue on GitHub and hope it gets some attention.