System Reliability
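Series system: All components must work. Assuming components fail independently, R_system = R₁ × R₂ × ... × Rₙ. Because every Rᵢ ≤ 1, adding a component in series can never raise reliability, and it lowers reliability whenever the added component is less than perfectly reliable.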
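Parallel system (redundancy): System fails only if all components fail. Assuming independent failures, R_system = 1 − (1−R₁)(1−R₂)...(1−Rₙ). Adding redundancy improves reliability, provided the redundant components do not all share the same cause of failure.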
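k-of-n redundancy: System works if at least k of n components work. For n identical, independent components each with reliability R, R_system = Σ (i = k to n) C(n, i) × Rⁱ × (1−R)ⁿ⁻ⁱ. Setting k = n gives the series case and k = 1 gives the parallel case. Used in aircraft hydraulic systems and in majority-voting systems (for example, 2-of-3).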
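The three configurations above can be compared numerically. The following is a minimal Python sketch, assuming independent component failures and, for the k-of-n case, identical components. The function names and the example value of 0.9 are illustrative choices, not part of the original notes.

```python
from math import comb, prod


def series_reliability(reliabilities):
    """Series: every component must work, so multiply the reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel: the system fails only if every component fails."""
    return 1 - prod(1 - r for r in reliabilities)


def k_of_n_reliability(k, n, r):
    """At least k of n identical, independent components work (binomial sum)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    parts = [0.9, 0.9, 0.9]  # illustrative component reliabilities
    print(series_reliability(parts))      # about 0.729
    print(parallel_reliability(parts))    # about 0.999
    print(k_of_n_reliability(2, 3, 0.9))  # about 0.972
```

With three components of reliability 0.9, the series arrangement is the least reliable, the parallel arrangement the most, and 2-of-3 sits in between, which matches the k = n and k = 1 limiting cases.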
FMEA and Fault Trees
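FMEA (Failure Mode and Effects Analysis): Bottom-up analysis. For each component, ask "what modes can it fail in?" and "what effect does each failure mode have on the system?" Produces a risk priority number for each failure mode (RPN = Severity × Occurrence × Detectability), which is used to rank the modes for corrective action. Detectability is scored so that a higher rating means the failure is harder to detect, so a higher RPN always means higher priority.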
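As a rough illustration of how RPN is used to rank failure modes, here is a small Python sketch. It assumes each factor is rated on a 1 to 10 scale (a common convention, not stated in these notes). The `FailureMode` class, the `rank_by_rpn` helper, and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    severity: int       # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int     # assumed scale: 1 (rare) to 10 (frequent)
    detectability: int  # assumed scale: 1 (easily detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: Severity x Occurrence x Detectability."""
        return self.severity * self.occurrence * self.detectability


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Made-up entries for illustration only.
modes = [
    FailureMode("pump", "seal leak", severity=7, occurrence=4, detectability=3),
    FailureMode("sensor", "calibration drift", severity=5, occurrence=6, detectability=8),
    FailureMode("valve", "stuck closed", severity=9, occurrence=2, detectability=2),
]

if __name__ == "__main__":
    for m in rank_by_rpn(modes):
        print(f"{m.component:8s} {m.mode:20s} RPN={m.rpn}")
```

In this made-up example the sensor drift ranks first (RPN 240) even though the stuck valve is more severe, because the drift is both more frequent and harder to detect.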
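Fault Tree Analysis (FTA): Top-down analysis. Start with an undesired event (system failure) and work backward to find all combinations of component failures that could cause it. Uses AND gates (the output event occurs only if all input events occur) and OR gates (the output event occurs if any input event occurs).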
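The AND/OR gate logic can be sketched as a small recursive evaluation. This Python example assumes the basic events are independent and that no basic event appears more than once in the tree. The tree, the event names, and the probabilities are hypothetical.

```python
from math import prod


def event_probability(node, basic):
    """Probability of a fault-tree node.

    node is either a basic-event name (str) or a tuple (gate, children),
    where gate is "AND" or "OR". Assumes independent, non-repeated events.
    """
    if isinstance(node, str):
        return basic[node]
    gate, children = node
    probs = [event_probability(child, basic) for child in children]
    if gate == "AND":
        # Output occurs only if all inputs occur.
        return prod(probs)
    if gate == "OR":
        # Output occurs if at least one input occurs.
        return 1 - prod(1 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical top event: power is lost if the mains supply fails AND
# the backup path fails (generator fails OR transfer switch fails).
tree = ("AND", ["mains_fails", ("OR", ["generator_fails", "switch_fails"])])
basic = {"mains_fails": 0.01, "generator_fails": 0.05, "switch_fails": 0.02}

if __name__ == "__main__":
    print(event_probability(tree, basic))  # about 0.00069
```

Here the OR gate evaluates to 1 − 0.95 × 0.98 = 0.069, and the AND gate multiplies that by 0.01.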
Five Nines Availability
"Five nines" (99.999%) availability means a system is down no more than ~5.26 minutes per year. This is the target for telephone networks, emergency services, and critical infrastructure. Achieving it requires redundancy at every level: power, networking, hardware, software, and even the physical building. A single point of failure anywhere can break the five-nines target.