Hello! Thank you for posting on Microsoft Learn Q&A.
First, verify that this is real RSS growth and not page cache or host noise. Run the following on each host/container:
# live view
docker stats cpaas_trans
# inside the container: process 1 is the service
docker exec cpaas_trans sh -c 'grep -E "VmRSS|VmSize" /proc/1/status; echo "---"; grep -E "MemAvailable|Cached" /proc/meminfo'
# aggregated mapping totals for process 1 (Rss, Pss, Anonymous; useful for C/C++ heaps)
# for a per-region breakdown, read /proc/1/smaps instead
docker exec cpaas_trans cat /proc/1/smaps_rollup
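If VmRSS tracks your Grafana line while Cached does not, it’s true heap growth. (/proc/meminfo normally reports host-wide figures even when read inside the container, so the Cached value reflects the host.)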
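To compare VmRSS against the Grafana line over time, it can help to log it at a fixed interval. The following is only a minimal sketch: the function names rss_kb and sample_rss, the interval, and the output file are hypothetical, and it assumes the service is PID 1 in the container as stated above.

```shell
#!/bin/sh
# Print VmRSS (in kB) from a /proc/<pid>/status-style file.
# Usage: rss_kb FILE
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Append "epoch_seconds rss_kb" samples for PID 1 of a container until interrupted.
# Usage: sample_rss CONTAINER INTERVAL_SECONDS OUTFILE   (hypothetical helper)
sample_rss() {
    while :; do
        rss=$(docker exec "$1" cat /proc/1/status | awk '/^VmRSS:/ { print $2 }')
        printf '%s %s\n' "$(date +%s)" "$rss" >> "$3"
        sleep "$2"
    done
}
```

For example, `sample_rss cpaas_trans 60 rss.log` would write one sample per minute; the resulting two-column file can be plotted next to the Grafana series.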
Next, rule out the easy causes:
- cgroup limits: you set --memory 46g on a 48 GB host. Leave at least 6–8 GB of headroom for the OS and page cache, for example: docker run ... --memory 38g --memory-swap 38g --memory-reservation 30g ... (Setting --memory-swap to the same value as --memory gives the container no swap, which prevents silent ballooning; Docker treats --memory-swap 0 as unset.)
- logging: you’re using --log-driver=syslog; confirm that syslog rotation is configured on the host so logs don’t put pressure on memory or disk I/O.
- concurrency: check whether bursts of concurrent requests correlate with the small step increases in RSS.
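The headroom check in the first bullet can be scripted. This is a rough sketch, not a definitive tool: the helper names headroom_gib and check_headroom are hypothetical, and it assumes the limit is read with docker inspect (HostConfig.Memory, in bytes) and the host total from MemTotal in /proc/meminfo (in kB).

```shell
#!/bin/sh
# Print the memory left for the host, in GiB, given a container limit in bytes
# and the host MemTotal in kB.
# Usage: headroom_gib LIMIT_BYTES MEMTOTAL_KB
headroom_gib() {
    awk -v limit="$1" -v total_kb="$2" \
        'BEGIN { printf "%.1f\n", (total_kb * 1024 - limit) / (1024 * 1024 * 1024) }'
}

# Read both values for a running container and print the headroom.
# HostConfig.Memory is 0 when no limit is set, in which case the result
# is simply the host total.
# Usage: check_headroom CONTAINER   (run on the Docker host)
check_headroom() {
    limit=$(docker inspect --format '{{.HostConfig.Memory}}' "$1")
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    headroom_gib "$limit" "$total_kb"
}
```

With a 46 GiB limit on a host reporting exactly 48 GiB, headroom_gib prints 2.0; with a 38 GiB limit it prints 10.0, which is inside the suggested margin.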
Then lower the limits on one node: --memory 38g --memory-swap 38g --cpus 16 (equal --memory and --memory-swap values mean no swap).
Spin up one canary with only -e Languages=en,pt and mirror 10% of traffic to it for 48 h.
Add an hourly dump of /proc/1/smaps_rollup to quantify where memory grows. On the clients, cap the HTTP keep-alive lifetime and idle connection pools. Finally, try a newer image tag on a separate canary.
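One possible way to collect and compare the hourly smaps_rollup dumps is sketched below. The function names dump_smaps and smaps_delta, the output directory, and the cron schedule in the comment are all hypothetical; only the /proc/1/smaps_rollup path and the container name come from the steps above.

```shell
#!/bin/sh
# Write a timestamped copy of PID 1's smaps_rollup from a container.
# Usage: dump_smaps CONTAINER OUTDIR
# Hypothetical crontab entry, assuming this file is saved as /usr/local/bin/smaps.sh:
#   0 * * * * . /usr/local/bin/smaps.sh; dump_smaps cpaas_trans /var/tmp/smaps
dump_smaps() {
    mkdir -p "$2" &&
    docker exec "$1" cat /proc/1/smaps_rollup > "$2/smaps_rollup.$(date +%Y%m%dT%H%M%S)"
}

# Print the fields whose value changed between two dumps, with the delta in kB.
# Usage: smaps_delta OLD_FILE NEW_FILE
smaps_delta() {
    awk '
        # First file: remember every "Field: value kB" line.
        NR == FNR { if ($3 == "kB") old[$1] = $2; next }
        # Second file: report fields that differ from the first file.
        $3 == "kB" && ($1 in old) && $2 != old[$1] {
            printf "%s %+d kB\n", $1, $2 - old[$1]
        }
    ' "$1" "$2"
}
```

If Anonymous grows at about the same rate as Rss between dumps, the growth is in heap or other anonymous memory rather than in file-backed mappings.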