The sample variance of a data stream $\left \{ X_1, X_2, \cdots, X_N, X_{N+1}, \cdots \right \}$ can be computed without storing the individual data points. It is enough to keep the sample mean and the sample variance (here with the $1/N$ normalization) for the current sample size $N$:

$$ \begin{align} s_N^2 &= \frac{1}{N} \sum_{i=1}^N \left( X_i - \bar{X}_N \right)^2 \\ &= \frac{1}{N} \sum_{i=1}^N \left( X_i^2 - \bar{X}_N^2 \right)~, \end{align} $$

where

$$ \begin{equation} \bar{X}_N = \frac{1}{N} \sum_{i=1}^N X_i~. \end{equation} $$
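
The second form of the variance is stated without its intermediate step. A short sketch of the expansion, using only the definition of $\bar{X}_N$:

$$ \begin{align} \frac{1}{N} \sum_{i=1}^N \left( X_i - \bar{X}_N \right)^2 &= \frac{1}{N} \sum_{i=1}^N X_i^2 - 2 \bar{X}_N \cdot \frac{1}{N} \sum_{i=1}^N X_i + \bar{X}_N^2 \\ &= \frac{1}{N} \sum_{i=1}^N X_i^2 - 2 \bar{X}_N^2 + \bar{X}_N^2 \\ &= \frac{1}{N} \sum_{i=1}^N X_i^2 - \bar{X}_N^2~. \end{align} $$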

Store $N$, $\bar{X}_N$, and $s_N^2$. When a new datum $X_{N+1}$ arrives, the stored values can be updated as follows:

$$ \begin{equation} \bar{X}_{N+1} = \frac{1}{N+1} \left( N \bar{X}_N + X_{N+1} \right)~, \end{equation} $$

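$$ \begin{align} s_{N+1}^2 &= \frac{1}{N+1} \sum_{i=1}^{N+1} \left( X_i^2 - \bar{X}_{N+1}^2 \right) \\ &= \frac{1}{N+1} \sum_{i=1}^{N+1} X_i^2 - \bar{X}_{N+1}^2 \\ &= \frac{1}{N+1} \left( \sum_{i=1}^{N} X_i^2 + X_{N+1}^2 \right) - \bar{X}_{N+1}^2 \\ &= \frac{1}{N+1} \left( N \left( s_N^2 + \bar{X}_N^2 \right) + X_{N+1}^2 \right) - \bar{X}_{N+1}^2~. \end{align} $$

The last step uses $\sum_{i=1}^N X_i^2 = N \left( s_N^2 + \bar{X}_N^2 \right)$, which follows from the definition of $s_N^2$.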
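
The update rules can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the class name `RunningStats` and its attributes are hypothetical, the variance uses the $1/N$ normalization from above, and the clamp to zero is an added guard against round-off that is not part of the derivation.

```python
class RunningStats:
    """Running mean and variance (1/N normalization) of a data stream."""

    def __init__(self):
        self.n = 0        # current sample size N
        self.mean = 0.0   # sample mean of the first N points
        self.var = 0.0    # sample variance of the first N points

    def update(self, x):
        n = self.n
        # Recover the sum of squares of the first N points from the stored
        # values: sum X_i^2 = N * (s_N^2 + mean_N^2).
        sum_sq = n * (self.var + self.mean ** 2)
        # Mean update: (N * mean_N + X_{N+1}) / (N + 1).
        new_mean = (n * self.mean + x) / (n + 1)
        # Variance update: (sum of squares incl. new point) / (N + 1) - mean_{N+1}^2.
        new_var = (sum_sq + x ** 2) / (n + 1) - new_mean ** 2
        self.n = n + 1
        self.mean = new_mean
        # Assumption: clamp tiny negative values caused by floating-point round-off.
        self.var = max(new_var, 0.0)
        return self.mean, self.var


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.n, stats.mean, stats.var)
```

For this example stream the exact mean is 5 and the exact variance is 4. Because the formula subtracts two quantities of similar size, it can lose precision when the mean is large compared with the spread of the data.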

Posted Tue Oct 6 19:39:59 2015