GKE Ingress with Cloud CDN is slower than you think

Found while running API performance benchmarks

TL;DR: use container-native load balancing for GKE Ingress, without enabling Cloud CDN.


The original post title might sound misleading if you have seen it elsewhere (SEO). Originally I thought the performance issue was caused by GKE Ingress. After some discussion and investigation, I found that it only happens when Cloud CDN is enabled, so I updated both the post content and the title.

The benchmark results still show GKE Ingress as slightly slower than the others, so I might publish a simple benchmark against popular Ingress controllers in the future.


I was running benchmarks for my API improvement blog series and noticed that non-cacheable APIs served through GKE Ingress with Cloud CDN enabled had the worst performance throughout the tests, so I reran all the benchmarks. The workload runs on GKE Autopilot.

API server source code is available at cncf-demo/cache-server

Here is the implementation of the API: it waits for 200ms and then returns a response.

app.Get("/", func(c *fiber.Ctx) error {
    // simulate database call
    // for 200ms
    time.Sleep(200 * time.Millisecond)
    return c.JSON(fiber.Map{"result": "ok"})
})

Here are the benchmark results. The API is accessed through a public IP address, with no DNS, no HTTPS, no server-side caching, and responses that are not cacheable by the CDN.

Note that I ran the tests from my local terminal, so internet speed is not factored in. The benchmarks were run with bombardier.

With GKE Ingress (container-native load balancing) with Cloud CDN

I mistakenly thought the slowdown was caused by GKE Ingress (standard), but it was actually caused by Cloud CDN. See the section below.

$ bombardier -c 200 -n 100000 IP_ADDRESS

Bombarding IP_ADDRESS with 100000 request(s) using 200 connection(s)
 100000 / 100000 [=====] 100.00% 197/s 8m25s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       206.84     622.51    7707.32
  Latency         1.01s   410.20ms      5.48s
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    52.75KB/s

With GKE Ingress (container-native load balancing) without Cloud CDN

$ bombardier -c 200 -n 100000 IP_ADDRESS

Bombarding IP_ADDRESS with 100000 request(s) using 200 connection(s)
 100000 / 100000 [=====] 100.00% 885/s 1m52s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       899.61     760.42   10838.69
  Latency      225.16ms    17.68ms   533.07ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   197.52KB/s

With Service type LoadBalancer

$ bombardier -c 200 -n 100000 IP_ADDRESS

 Bombarding IP_ADDRESS/ with 100000 request(s) using 200 connection(s)
 100000 / 100000 [=====] 100.00% 911/s 1m49s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       922.00     827.36    5866.92
  Latency      218.86ms     9.84ms   369.95ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   191.66KB/s

With Traefik Ingress, single replica

$ bombardier -c 200 -n 100000 IP_ADDRESS

Bombarding IP_ADDRESS/cache with 100000 request(s) using 200 connection(s)
 100000 / 100000 [=====] 100.00% 901/s 1m50s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       912.27     747.73   11331.52
  Latency      221.33ms    17.28ms   492.15ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   200.09KB/s

Running on localhost with go run .

$ bombardier -c 200 -n 100000 localhost:3000

Bombarding http://localhost:3000 with 100000 request(s) using 200 connection(s)
 100000 / 100000 [=====] 100.00% 991/s 1m40s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      1015.70    2803.63   13085.14
  Latency      201.35ms     0.98ms   218.76ms
  HTTP codes:
    1xx - 0, 2xx - 100000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   203.70KB/s

Overview of the results

| Type | Average RPS | Average latency | Time taken |
| --- | --- | --- | --- |
| GKE Ingress (with CDN) | 206.84 | 1.01s | 8m25s |
| GKE Ingress (without CDN) | 899.61 | 225.16ms | 1m52s |
| Service type LoadBalancer | 922.00 | 218.86ms | 1m49s |
| Traefik | 912.27 | 221.33ms | 1m50s |
| localhost | 1015.70 | 201.35ms | 1m40s |

Based on the results above, Service type LoadBalancer and Traefik have similar performance, close to localhost, while GKE Ingress with CDN performs about 4 to 5 times worse than the rest. GKE Ingress without CDN performs much better than with CDN.

I tested with Cloud CDN enabled both from the BackendConfig CRD and from the Cloud console.

I verified that the Service has the following annotation (added automatically), which confirms it was using container-native load balancing:

cloud.google.com/neg: '{"ingress":true}'
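The annotation value is a small JSON document, so the same check can be done programmatically. The following is a minimal sketch that only looks at the `ingress` field shown above. The `negAnnotation` type and `negIngressEnabled` helper are hypothetical names and do not come from any Kubernetes client library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// negAnnotation models only the "ingress" field of the
// cloud.google.com/neg annotation value; other fields are ignored.
type negAnnotation struct {
	Ingress bool `json:"ingress"`
}

// negIngressEnabled (hypothetical helper) reports whether the annotation
// value has "ingress" set to true.
func negIngressEnabled(value string) (bool, error) {
	var a negAnnotation
	if err := json.Unmarshal([]byte(value), &a); err != nil {
		return false, fmt.Errorf("parse neg annotation: %w", err)
	}
	return a.Ingress, nil
}

func main() {
	enabled, err := negIngressEnabled(`{"ingress":true}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("container-native load balancing:", enabled)
}
```

In practice the value would come from the Service's metadata, for example from the output of `kubectl get service -o json`; the sketch hard-codes it to stay self-contained.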

I am not sure what the cause is at the moment. I filed an issue on GitHub and hope it gets some attention.
