IAM

APRIL2015

READING

A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischof. On-line Random Forests. International Conference on Computer Vision Workshops, 2009.

Saffari et al. propose an online random forest algorithm (see [1] for the original paper introducing random forests) based on online bagging [2]. Note the difference between on-line and incremental algorithms: While incremental algorithms have memory, that is they may store each incoming sample, an online algorithm sees each sample exactly once. On common datasets, as for example the USPS Dataset, Saffari et al. show that online random forests converge to the offline algorithm.

The original C++ implementation of online random forests can be found on Saffari's webpage. Figure 1, in contrast, shows the performance of a custom implementation on the MNIST dataset [3].

online_rf_10 online_rf_100

Figure 1 (click to enlarge): Performance of online random forests as described by Saffari et al. compared against offline random forests [1] on the MNIST dataset [3]. Denoting the forest size by $T$, online random forests converge to the offline variant for both $T = 10$ (left) and $T = 100$ (right).

The implementation is currently under development, however, may be made publicly available within the next few weeks.

  • [1] L. Breiman. Random Forests. Machine Learning, 2001.
  • [2] N. C. Oza. Online Bagging and Boosting. International Conference on Systems, Man and Cybernetics, 2005.
  • [3] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, pages 2278-2324, 1998.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.