
Intrinsic Images – Introduction and Reading List

The problem of decomposing an image into reflectance and shading – known as intrinsic images – has received much attention in the last few years. This article introduces the problem of retrieving intrinsic images as well as recent literature in this area.

Introduction

The idea of intrinsic images was introduced by Barrow and Tenenbaum [1] in 1978 and is based on the observation that humans are able to derive intrinsic characteristics from images. By intrinsic characteristics, Barrow and Tenenbaum refer to a set of features such as "color, orientation, distance, size, shape, [and] illumination" [1, p.3]. The problem of retrieving these characteristics, or intrinsic images, is formulated as follows:

Problem (Intrinsic Images). Given an image (or possibly several images of the same scene), retrieve a set of intrinsic images, all registered with the original image, each describing one intrinsic characteristic [1].

This problem statement is still very general; however, it captures the main goal of computing intrinsic images. Clearly, the set of possible intrinsic characteristics may be extended by several other aspects, such as motion or occlusion boundaries, depending on the application. In practice, when speaking of intrinsic images, most researchers are primarily interested in two characteristics: reflectance (or albedo) and shading (e.g. [3,4,5,6,7]).

In this context of decomposing an image into reflectance and shading, another work of the early 70s becomes increasingly interesting: Land and McCann's Retinex Theory [2]. In their work, Land and McCann propose a simple algorithm for determining the relative difference in reflectance of two surfaces in a "Mondrian world" (i.e. a world resembling Mondrian's paintings). The basic notion on which most Retinex-based algorithms for color constancy, illumination estimation and intrinsic images are based is the following: strong gradients and edges are most likely caused by reflectance changes, while small gradients are caused by shading; in other words, shading is assumed to be smooth.

The above considerations can easily be formalized when assuming surfaces to be approximately Lambertian (in other words: all surfaces are matte). Then, an image can be written as the product of reflectance and shading:

$I = R \cdot S$

where $I$ is the image, $R$ the reflectance image and $S$ the shading image, and the product is defined element-wise. Often, authors transform the above equation into log-space [3,4,5,6,7]:

$\log(I) = \log(R) + \log(S)$.
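As a quick numerical sanity check (a toy example with made-up values, not data from any of the referenced papers), the element-wise product and its log-space form can be verified directly:

```python
import numpy as np

# Toy 2x2 "image" built from hypothetical reflectance and shading values.
R = np.array([[0.9, 0.2],
              [0.9, 0.2]])   # piecewise-constant reflectance
S = np.array([[1.0, 1.0],
              [0.5, 0.5]])   # smoothly varying shading
I = R * S                    # element-wise product: I = R * S

# In log space the element-wise product turns into a sum.
assert np.allclose(np.log(I), np.log(R) + np.log(S))
```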

Obviously, this problem is highly underconstrained, making it comparably difficult to solve. Therefore, many authors impose appropriate priors on reflectance or shading: shading is often assumed to be smooth, while reflectance values are often considered to be sparse (see the references below). Another difficulty lies in creating appropriate datasets. To date (to the best of my knowledge), there are only three datasets for intrinsic image decomposition out there, see below.
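The Retinex notion discussed above can be sketched on a toy 1-D signal: threshold the log-gradients, assign large gradients to reflectance and small ones to shading, and re-integrate. The function name `retinex_1d` and the threshold value are illustrative choices of mine, not taken from any of the referenced papers.

```python
import numpy as np

def retinex_1d(intensity, threshold=0.2):
    """Toy 1-D Retinex-style decomposition in log space.

    Log-gradients with magnitude above `threshold` are attributed to
    reflectance (sharp edges); the remaining small gradients are
    attributed to shading. Each gradient field is re-integrated by a
    cumulative sum, so the product R * S reproduces the input exactly.
    """
    log_i = np.log(intensity)
    grad = np.diff(log_i)  # log I = log R + log S, so gradients add up
    grad_r = np.where(np.abs(grad) > threshold, grad, 0.0)  # reflectance edges
    grad_s = grad - grad_r                                  # smooth shading part
    log_r = np.concatenate([[0.0], np.cumsum(grad_r)])      # normalized: R[0] = 1
    log_s = np.concatenate([[log_i[0]], log_i[0] + np.cumsum(grad_s)])
    return np.exp(log_r), np.exp(log_s)

# A sharp step inside an otherwise slowly varying ramp: the jump at
# index 3 is assigned to reflectance, the slow drift to shading.
i = np.array([1.0, 1.05, 1.1, 2.2, 2.3, 2.4])
r, s = retinex_1d(i)
# r is approximately [1, 1, 1, 2, 2, 2]; r * s recovers i exactly.
```

In 2-D the re-integration step is no longer a cumulative sum but a Poisson problem, which is where most of the algorithmic work in the papers below happens.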

Reading List

The following paragraphs are intended to give a brief overview of the literature on intrinsic images. Some basic references from the 1970s were already introduced above:

  • [1] H. G. Barrow, J. M. Tenenbaum. Recovering Intrinsic Scene Characteristics from Images. Computer Vision Systems, pages 3-26, 1978.
  • [2] E. H. Land, J. J. McCann. Lightness and Retinex Theory. Journal of the Optical Society of America, volume 61, pages 1-11, 1971.

Often cited alongside Land and McCann is Horn's approach for recovering reflectance under the assumption made in the Retinex Theory:

  • [3] B. K. P. Horn. Robot Vision. MIT Press, Cambridge, MA, 1986.

Before discussing actual algorithms and approaches to decompose an image into reflectance and shading, the following papers introduce datasets and metrics used for evaluating these algorithms:

  • [4] R. Grosse, M. K. Johnson, E. H. Adelson, W. T. Freeman. Ground Truth Dataset and Baseline Evaluations for Intrinsic Image Algorithms. International Conference on Computer Vision, 2009.
  • [5] S. Bell, K. Bala, N. Snavely. Intrinsic Images in the Wild. ACM Transactions on Graphics (SIGGRAPH), 2014.
  • [6] S. Beigpour, M. Serra, J. van de Weijer, R. Benavente, M. Vanrell, O. Penacchio, D. Samaras. Intrinsic Image Evaluation on Synthetic Complex Scenes. International Conference on Image Processing, 2013.

A short (and not complete) list of possible approaches is given below. Note that Bell et al. [5] also propose a new intrinsic image algorithm.

  • [7] M. F. Tappen, W. T. Freeman, E. H. Adelson. Recovering Intrinsic Images from a Single Image. Transactions on Pattern Analysis and Machine Intelligence, 2005.
  • [8] P. V. Gehler, C. Rother, M. Kiefel, L. Zhang, B. Schoelkopf. Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance. Conference on Neural Information Processing Systems, 2011.
  • [9] E. Garces, A. Munoz, J. Lopez-Moreno, D. Gutierrez. Intrinsic Images by Clustering. Computer Graphics Forum, 2012.
  • [10] L. Shen, P. Tan, S. Lin. Intrinsic Image Decomposition with Non-Local Texture Cues. Conference on Computer Vision and Pattern Recognition, 2008.
  • [11] J. T. Barron, J. Malik. Shape, Illumination, and Reflectance from Shading. Technical Report, University of California Berkeley, 2013.

Note that details on the approach by Tappen et al. can also be found in the following PhD thesis:

  • [12] M. F. Tappen. Learning Continuous Models for Estimating Intrinsic Component Images. PhD Thesis, Massachusetts Institute of Technology, 2006.

Further, the idea of intrinsic images has also been generalized to videos:

  • [13] K. J. Lee, Q. Zhao, X. Tong, M. Gong, S. Izadi, S. U. Lee, P. Tan, S. Lin. Estimation of Intrinsic Image Sequences from Image+Depth Video. European Conference on Computer Vision, 2012.
  • [14] N. Kong, P. V. Gehler, M. J. Black. Intrinsic Video. European Conference on Computer Vision, 2014.

Due to the large number of publications on this topic, discussing the above approaches in detail is beyond the scope of this reading list; however, some pointers to the corresponding implementations, where available, are given below.

Implementations and Datasets

Several of the authors make their source code publicly available.

Figure 1 shows examples from the datasets introduced in [4] and [6]. The images of the "Intrinsic Images in the Wild" dataset can be browsed online at opensurfaces.cs.cornell.edu/.


Figure 1: Images from the dataset introduced in [4], rows 1 and 2, and from the dataset introduced in [6], rows 3 and 4. Each row shows the original image, the reflectance image and the shading image.

What is your opinion on this article? Did you find it interesting or useful? Let me know your thoughts in the comments below.

  • Hi David. Congratulations on the post, it is really interesting. I'm currently writing a bachelor thesis on this topic. I have a question for you: have you found any implementation of the method proposed by Tappen et al.? It is used in some comparisons, but I could not find any implementation of their method.

    Best Regards

  • Pingback: Retinex Theory and Algorithm - David Stutz

  • Lynn Chuang

    Hi, David. Thanks for the information.
    I am curious about one thing. I read "Intrinsic Images by Entropy Minimization" by Finlayson, Drew, and Lu (ECCV '04); in this paper, they obtain an intrinsic image without shadows. But other papers, like "Intrinsic Images in the Wild" mentioned above, produce intrinsic images with shadows (Figure 1, row 4). Why are the results so different? Do you have any idea? Thank you.

    • davidstutz

      I guess it depends a bit on the model/approach. Under the assumption that surfaces are approximately Lambertian (as discussed in the article), the shadow can be (approximately) retrieved from the computed albedo. This might be applicable to the paper you are referencing. However, authors may use different assumptions/models as basis for their approach. I will have a look at the paper by Finlayson et al. as soon as I have some time for it.