No, this is not another post about the magical five users idea that has been floating around UX circles for decades. This is about something more rigorous and practical: what statistics actually allows you to claim when your sample size is small. Two small-sample tools illustrate this well, and both are statistically rigorous yet underused in UX and Human Factors.
The Rule of 5 helps you define the middle of the user experience. Imagine testing 5 users on a new checkout flow with completion times of 12, 18, 22, 45, and 50 seconds. With just these observations, there is a 93.75% probability that the true median completion time of the entire user population lies between the smallest and largest observed values. The math is straightforward: any single random observation falls above the median with probability 1/2, so the median can escape the sample range only if all n observations land on the same side of it, which happens with probability 2 x (1/2)^n. The probability that the median lies between the sample minimum and maximum is therefore 1 - 2 x (1/2)^n, or equivalently 1 - 2^-(n-1). With n = 5 this gives 1 - 2^-4 = 0.9375. In practice, this means you have bounded the typical experience. You can state with high confidence that the median user is neither extremely fast nor extremely slow, even though you tested only 5 people. The limitation is equally important to understand. This result applies to the median, not the mean, and it does not explain why the spread is wide. It tells you where the middle lives, not what causes the variability.
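For readers who want to verify the arithmetic, here is a minimal Python sketch of the Rule of 5 calculation. The completion times are the hypothetical ones from the example above, not real data.

```python
# Rule of 5: probability that the population median lies between the
# sample minimum and maximum for a random sample of size n.
# The bound fails only if all n observations fall on the same side of
# the median, and each side has probability (1/2)**n.

def median_bound_confidence(n: int) -> float:
    """Return P(population median lies between sample min and max)."""
    return 1 - 2 * (0.5 ** n)

# Hypothetical checkout-flow completion times (seconds) from the example.
times = [12, 18, 22, 45, 50]

confidence = median_bound_confidence(len(times))
print(f"n = {len(times)}")
print(f"Population median lies in [{min(times)}, {max(times)}] s "
      f"with probability {confidence:.4f}")  # 0.9375 for n = 5
```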
The Rule of 3 is especially valuable when assessing risk in safety-critical systems. Suppose you test a medical device interface with 30 participants and not a single person accidentally triggers the emergency stop. Even though you observed zero errors, statistics still lets you make a meaningful claim. The Rule of Three comes from the binomial distribution: when zero events are observed in n trials, the upper bound of a one-sided 95% confidence interval for the event rate is approximately 3/n. With 30 trials, that gives an upper bound of about 10%. This does not prove the error will never happen, but it does establish a ceiling on risk. If a 10% failure rate is unacceptable, your data has already told you that the design or the amount of testing needs to change. Once even a single error is observed, the shortcut no longer applies and you move into full binomial modeling.
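A similarly small sketch shows the Rule of Three next to the exact binomial upper bound it approximates. The 30-participant, zero-error scenario is the hypothetical one from the example.

```python
# Rule of 3: with zero observed events in n trials, the upper limit of a
# one-sided 95% confidence interval on the event rate is roughly 3 / n.
# The exact bound solves (1 - p)**n = 0.05 for p.

def rule_of_three(n: int) -> float:
    """Approximate 95% upper bound on the event rate after 0 events in n trials."""
    return 3 / n

def exact_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound: the p at which seeing 0 events has probability alpha."""
    return 1 - alpha ** (1 / n)

n_trials = 30  # hypothetical medical-device test with zero emergency-stop errors
print(f"Rule of 3 upper bound : {rule_of_three(n_trials):.3f}")      # ~0.100
print(f"Exact binomial bound  : {exact_upper_bound(n_trials):.3f}")  # ~0.095
```

The approximation sits slightly above the exact bound, which is why it is safe to quote as a ceiling.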
The broader point is that small-sample research is often dismissed as qualitative or anecdotal, but that is a misunderstanding of statistics rather than a limitation of the data. Rigor is not about having a huge sample size. It is about knowing exactly how much uncertainty remains in the data you have. Whether you are bounding a median with 5 users or capping a safety risk with 30, you can make mathematically defensible, decision-ready claims.
Do not wait for big data to start being rigorous. Use the tools designed for the sample size you actually have!
Bahareh Jozranjbar, PhD, Perceptual User Experience Lab