Training site that crashed 8,668 times in 3 months: 5 distinct causes found, zero downtime since
The challenge
A training site with recurring crashes: nearly 9,000 server restarts in three months, five and a half hours of total downtime, and pages becoming unreachable without warning for students who were mid-lesson. The team had tried ad hoc fixes, but the problem kept coming back. Nobody could reproduce it consistently: it appeared and disappeared, hitting some users and not others.
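To put those figures in proportion, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this case study (8,668 restarts, five and a half hours of downtime); treating "three months" as 90 days is an assumption made for the arithmetic.

```python
# Figures quoted in the case study.
restarts = 8668
downtime_hours = 5.5

# Assumption: "three months" is taken as 90 days.
days = 90

restarts_per_day = restarts / days
avg_outage_seconds = downtime_hours * 3600 / restarts

print(f"{restarts_per_day:.0f} restarts/day, "
      f"{avg_outage_seconds:.1f} s per outage on average")
```

Under that assumption this works out to roughly 96 restarts a day, each lasting a little over two seconds on average. Outages that short are consistent with a fault that only some users noticed and that was hard to reproduce.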
The solution
Systematic layer-by-layer analysis, with a documented rollback plan for every change: no modification was made permanent before its impact had been verified. Five distinct problems, each amplifying the others, were found and resolved: one component exhausted the available memory on every page opened by a logged-in user; a security plugin that had been disabled kept running in the background and blocked other functions; the cache system was configured in a way that meant it rarely took effect; and two further problems were hidden in the server configuration and the database. After the fix: pages load in 0.41 seconds, with zero crashes.
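The "rollback plan for every change" discipline can be illustrated with a minimal sketch. This is not the tooling used on the project: the function `apply_with_rollback`, the in-memory `config` dictionary, and the 200 MB threshold are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Callable


def apply_with_rollback(apply: Callable[[], None],
                        verify: Callable[[], bool],
                        rollback: Callable[[], None]) -> bool:
    """Apply a change and keep it only if verification passes."""
    apply()
    try:
        ok = verify()
    except Exception:
        # A verification step that fails to run counts as a failed check.
        ok = False
    if not ok:
        rollback()
    return ok


# Hypothetical example: a setting held in a dictionary.
config = {"memory_limit_mb": 128}
previous = dict(config)  # snapshot taken before the change


def raise_limit() -> None:
    config["memory_limit_mb"] = 256


def within_budget() -> bool:
    # Stand-in check; a real one might measure memory use or response time.
    return config["memory_limit_mb"] <= 200


def restore() -> None:
    config.clear()
    config.update(previous)


kept = apply_with_rollback(raise_limit, within_budget, restore)
```

Here the check fails, so `kept` is `False` and `config` is restored to its snapshot. The point of the pattern is that the rollback step is written down before the change is applied, not improvised afterwards.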
Results
Memory per page: from exhausting available memory (crash) to 30 MB
8,668 crashes → 0 since the fix
Pages 24% faster as a side effect
5 distinct causes identified and resolved, not masked
Tech stack
- PHP-FPM 8.3 + OPcache JIT
- MariaDB (buffer pool, slow query log)
- Redis (LRU eviction policy)
- Nginx (proxy cache, gzip)
- Custom WordPress mu-plugin
- Plesk + Linux