A. Criminisi, J. Shotton. Density Forests. In Decision Forests for Computer Vision and Medical Image Analysis, Springer London, 2013.

As part of "Decision Forests for Computer Vision and Medical Image Analysis" by Criminisi and Shotton, this chapter discusses density forests: Decision forests used for density estimation (that is, decision forests for unsupervised learning or clustering forests). At the leafs, each tree contains a simple predcition model: a Gaussian distribution. As result, a single density tree can be viewed as special case of a Gaussian Mixture Model (e.g. see [1, ch. 9]) using hard assignments instead of soft assigments.

Criminisi and Shotton detail the effects of the number of trees as well as the maximum depth of each individual tree on the estimated density. Figure 1 shows results taken from their chapter. Source code for density trees and forests is available as part of the Sherwood C++ and C# library for decision forests.

Figure 1 (click to enlarge): The influence of $T$, the number of trees, and $D$, the maximum depth of each individual tree on a toy dataset. Top row: $T=1$ with $D = 2,4$ and $6$. Bottom row: $T=400$ with $T=2,4$ and $6$.

In contrast, figure 2 shows results using a custom implementation. Obviously, density trees are capable of representing Gaussian mixtures nearly perfectly. However, density trees tend to overfit the data and, therefore, parameters such as maximum depth or the number of samples required for splits is critical. The implementation is still under development, however, will be made publicly available in the next few months.

density_tree_1000_2_samples_colored density_tree_1000_2_gaussians density_tree_1000_2_tree
density_tree_10000_5_samples density_tree_10000_5_gaussians density_tree_10000_5_tree
density_tree_10000_5_1000_samples density_tree_10000_5_1000_gaussians density_tree_10000_5_1000_tree

Figure 2 (click to enlarge): Single density tree applied on the samples shown in the left most column. The samples are colored according to the corresponding leaf and were generated using a mixture of Gaussians, shown in column two. The estimated density is shown in column three. The result is highly dependent on parameter settings such as minimum number of samples required for splits. From top to bottom: $1000$ samples, $2$ components and $250$ samples required for a split; $10000$ samples, $7$ components and $250$ samples required for a split; $10000$ samples, $7$ components and $1000$ samples required for a split.
  • [1] C. Bishop. Pattern Recognition and Machine Learning. Springer Verlag, New York, 2006.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: