09th October 2014

READING

N. C. Oza. Online Bagging and Boosting. International Conference on Systems, Man and Cybernetics, 2005.

The lecture "Computer Vision II" by Prof. Leibe at RWTH Aachen University was mainly concerned with tracking; in particular, tracking by online classification was discussed. For example, Grabner et al. [1] use an online variant of boosting to perform tracking. In this context, I read the paper by Oza, who proves that online bagging and online boosting converge to their offline variants as $N \rightarrow \infty$, where $N$ is the size of the training set. Background on boosting in general and AdaBoost in particular can be found in [2].

function online_bagging(
        $(x_n, t_n)$ // Sample.
    )
    for $m = 1, \ldots, M$
        $k \sim \text{Poisson}(1)$
        for $i = 1, \ldots, k$
            train $h_m$ on $(x_n, t_n)$
Algorithm 1: Online bagging as proposed by Oza.

Offline bagging, given a training set $X$ of size $N$, creates $M$ different models, each trained on a bootstrap set obtained by sampling $N$ times with replacement from $X$. For a fixed model, Oza argues that the probability of a training sample $x_n \in X$ being drawn $k$ times follows a binomial distribution:

$P(K = k) = \binom{N}{k} \left(\frac{1}{N}\right)^k \left(1 - \frac{1}{N}\right)^{N - k}$.

It is easily verified that as $N \rightarrow \infty$, the random variable $K$ converges in distribution to a Poisson distribution with $\lambda = 1$:

$\lim_{N \rightarrow \infty} P(K = k) = \frac{1}{e k!}$
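This limit is easy to check numerically; the following sketch (plain Python, standard library only) compares the binomial probabilities for large $N$ against $e^{-1}/k!$:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n):
    # P(K = k) for K ~ Binomial(n, 1/n): a fixed sample is drawn
    # k times out of n draws with replacement.
    return comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)

def poisson1_pmf(k):
    # P(K = k) for K ~ Poisson(1), i.e. 1/(e k!).
    return exp(-1) / factorial(k)

# For large n the two distributions nearly coincide:
for k in range(4):
    print(k, binomial_pmf(k, 10 ** 5), poisson1_pmf(k))
```

For $N = 10^5$ the probabilities already agree to roughly five decimal places.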

Online bagging as proposed by Oza is given in Algorithm 1.
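Algorithm 1 translates almost directly into code. The sketch below assumes a hypothetical base-learner interface with `partial_fit(x, t)` and `predict(x)` (these names are my choice, not from the paper); the Poisson sampling uses Knuth's method:

```python
import math
import random

class OnlineBagging:
    # Sketch of Oza's online bagging. `base_learner_factory` creates a model
    # with the hypothetical interface partial_fit(x, t) / predict(x).
    def __init__(self, base_learner_factory, M=10, seed=0):
        self.models = [base_learner_factory() for _ in range(M)]
        self.rng = random.Random(seed)

    def _poisson(self, lam):
        # Knuth's method for sampling k ~ Poisson(lam).
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def update(self, x, t):
        # Presenting the sample k ~ Poisson(1) times to each model mimics
        # sampling with replacement in offline bagging (for large N).
        for h in self.models:
            for _ in range(self._poisson(1.0)):
                h.partial_fit(x, t)

    def predict(self, x):
        # Unweighted majority vote over the ensemble.
        votes = [h.predict(x) for h in self.models]
        return max(set(votes), key=votes.count)
```

Any incremental learner fits the factory slot, e.g. an online decision stump or a perceptron.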

function online_boosting(
        $(x_n, t_n)$ // Sample.
    )
    initialize $\lambda = 1$
    for $m = 1, \ldots, M$
        $k \sim \text{Poisson}(\lambda)$
        for $i = 1, \ldots, k$
            train $h_m$ on $(x_n, t_n)$
        if $h_m(x_n) = t_n$
            $\lambda_m^{corr} += \lambda$
            $\epsilon_m = \lambda_m^{wrong}/(\lambda_m^{corr} + \lambda_m^{wrong})$
            $\lambda = \lambda/(2(1 - \epsilon_m))$
        else
            $\lambda_m^{wrong} += \lambda$
            $\epsilon_m = \lambda_m^{wrong}/(\lambda_m^{corr} + \lambda_m^{wrong})$
            $\lambda = \lambda/(2\epsilon_m)$
Algorithm 2: Online boosting as proposed by Oza. The running sums $\lambda_m^{corr}$ and $\lambda_m^{wrong}$ are initialized with $0$.

Boosting is based on a weighted training set where the weights are updated according to the performance of the weak classifiers. In online learning, however, each sample is available only once; therefore, the weight or importance of a sample is not known in advance, that is, we do not know whether the sample is hard to classify. The algorithm given by Oza uses $\lambda$, initialized with $1$, to denote the importance of a sample. This importance is adapted from stage to stage and used as the parameter of the Poisson distribution known from online bagging, see Algorithm 2. The error $\epsilon_m$ of each weak classifier $h_m$, needed to define the weight of the weak classifier, can be computed from two running sums: $\lambda_m^{corr}$, the total importance of all correctly classified samples, and $\lambda_m^{wrong}$, the total importance of all misclassified samples.
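Algorithm 2 might be sketched as follows, again assuming a hypothetical base-learner interface `partial_fit(x, t)` / `predict(x)`. The clamping of $\epsilon_m$ in the final weighted vote is my implementation choice to guard the logarithm, not part of Oza's algorithm:

```python
import math
import random

class OnlineBoosting:
    # Sketch of Oza's online boosting; base learners expose the
    # hypothetical interface partial_fit(x, t) / predict(x).
    def __init__(self, base_learner_factory, M=10, seed=0):
        self.models = [base_learner_factory() for _ in range(M)]
        self.lam_corr = [0.0] * M   # importance mass of correctly classified samples
        self.lam_wrong = [0.0] * M  # importance mass of misclassified samples
        self.rng = random.Random(seed)

    def _poisson(self, lam):
        # Knuth's method for sampling k ~ Poisson(lam).
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def update(self, x, t):
        lam = 1.0  # importance of the sample, adapted stage by stage
        for m, h in enumerate(self.models):
            for _ in range(self._poisson(lam)):
                h.partial_fit(x, t)
            if h.predict(x) == t:
                self.lam_corr[m] += lam
                eps = self.lam_wrong[m] / (self.lam_corr[m] + self.lam_wrong[m])
                lam /= 2.0 * (1.0 - eps)  # correct: importance shrinks if eps < 1/2
            else:
                self.lam_wrong[m] += lam
                eps = self.lam_wrong[m] / (self.lam_corr[m] + self.lam_wrong[m])
                lam /= 2.0 * eps  # wrong: importance grows if eps < 1/2

    def predict(self, x):
        # Weighted vote with the usual AdaBoost weights log((1 - eps)/eps);
        # eps is clamped away from 0 and 1 to keep the logarithm finite.
        scores = {}
        for m, h in enumerate(self.models):
            total = self.lam_corr[m] + self.lam_wrong[m]
            if total == 0.0:
                continue  # stage was never evaluated
            eps = min(max(self.lam_wrong[m] / total, 1e-10), 1.0 - 1e-10)
            label = h.predict(x)
            scores[label] = scores.get(label, 0.0) + math.log((1.0 - eps) / eps)
        return max(scores, key=scores.get) if scores else None
```

Note how the per-sample importance $\lambda$ ties the stages together: a sample the earlier stages misclassify arrives at later stages with a larger Poisson parameter and is therefore trained on more often.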

  • [1] H. Grabner, C. Leistner, H. Bischof. Semi-supervised On-line Boosting for Robust Tracking. Proceedings of the European Conference on Computer Vision, 2008.
  • [2] C. Bishop. Pattern Recognition and Machine Learning. Springer Verlag, New York, 2006.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below or get in touch with me: