I am looking for full-time (applied) research opportunities in industry, involving (trustworthy and robust) machine learning or (3D) computer vision, starting early 2022. Check out my CV and get in touch on LinkedIn!


G. Louppe, L. Wehenkel, A. Sutera, P. Geurts. Understanding Variable Importances in Forests of Randomized Trees. Advances in Neural Information Processing Systems, 2013.

Louppe et al. discuss the Mean Decrease Impurity (MDI) variable importance measure and derive several theoretical properties. Given a forest of $T$ randomized trees, each expecting $D$-dimensional input $x = (x_1, \ldots, x_D)$, the Mean Decrease Impurity variable importance for dimension $d$ is computed as:

$MDI(x_d) = \frac{1}{T} \sum_{t = 1}^T \sum_{v:\, s(v) = x_d} \frac{N_v}{N} \Delta i(v)$ (1)

where $N$ is the total number of examples, $N_v$ is the number of examples reaching inner node $v$, and $\Delta i(v)$ is the impurity decrease achieved by the split at node $v$. The second sum in Equation (1) runs over all inner nodes $v$ of tree $t$ where feature dimension $d$ is selected as the split feature $s(v)$.
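As a minimal sketch, Equation (1) can be evaluated directly on a fitted scikit-learn forest by reading the low-level arrays exposed by each estimator's `tree_` attribute (`children_left`, `impurity`, `weighted_n_node_samples`, `feature`). The dataset and hyperparameters below are arbitrary choices for illustration:

```python
# Sketch: Mean Decrease Impurity per Equation (1), computed from
# scikit-learn's fitted tree internals. Dataset and hyperparameters
# are illustrative assumptions, not from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

N, D = X.shape
mdi = np.zeros(D)
for est in forest.estimators_:        # outer sum over the T trees
    tree = est.tree_
    imp_t = np.zeros(D)
    for v in range(tree.node_count):  # inner sum over nodes of tree t
        left, right = tree.children_left[v], tree.children_right[v]
        if left == -1:                # leaf: no split, nothing to add
            continue
        n_v = tree.weighted_n_node_samples[v]
        # impurity decrease Delta i(v) achieved by the split at node v
        delta = (tree.impurity[v]
                 - tree.weighted_n_node_samples[left] / n_v * tree.impurity[left]
                 - tree.weighted_n_node_samples[right] / n_v * tree.impurity[right])
        imp_t[tree.feature[v]] += n_v / N * delta   # weight by N_v / N
    # sanity check: per tree, this matches scikit-learn's own
    # (normalized) MDI importances
    assert np.allclose(imp_t / imp_t.sum(), est.feature_importances_)
    mdi += imp_t
mdi /= len(forest.estimators_)        # the 1/T factor

print("MDI per feature:", np.round(mdi, 4))
```

Note that scikit-learn's `feature_importances_` reports the same quantity normalized to sum to one per tree, so it agrees with Equation (1) only up to a per-tree scaling factor.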

What is your opinion of the summarized work? Do you know of related work that might be of interest? Let me know your thoughts in the comments below!