A face image contains geometric cues in the form of configurational information (semantically
meaningful landmark points and contours). In this thesis, we explore to what degree
such 2D geometric information allows us to estimate 3D face shape.
First, we focus on the problem of fitting a 3D morphable model to single face images
using only sparse geometric features. We propose a novel approach that explicitly computes
hard correspondences, which allow us to treat the model's edge vertices as having known 2D
positions, for which optimal pose or shape estimates can be computed linearly. Moreover, we show
how to formulate this shape-from-landmarks problem as a separable nonlinear least squares
optimisation.
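The linear step can be illustrated with a minimal sketch. It assumes a scaled orthographic camera whose rotation `R`, scale `s` and 2D translation `t` are known, landmarks already in correspondence with model vertices, and a small ridge penalty `lam` on the shape coefficients; all function names are hypothetical, and the sketch does not reproduce the edge-correspondence search described above. Because the projected landmarks are linear in the shape coefficients, those coefficients have a closed-form solution for any fixed pose, so the remaining cost depends on pose alone (the variable-projection idea behind separable nonlinear least squares). A crude grid search over yaw stands in for a proper nonlinear solver.

```python
import numpy as np


def fit_shape_given_pose(x2d, mean, basis, R, s, t, lam=1e-6):
    """Ridge-regularised linear least squares for shape coefficients, pose held fixed.

    x2d:   (L, 2) observed landmarks
    mean:  (L, 3) mean-shape vertices corresponding to the landmarks
    basis: (L, 3, K) shape components at those vertices
    Returns the coefficients (K,) and the data residual sum of squares.
    """
    P = s * R[:2, :]                                        # (2, 3) scaled orthographic projection
    K = basis.shape[2]
    A = np.einsum("ij,ljk->lik", P, basis).reshape(-1, K)   # (2L, K) projected basis
    b = (x2d - mean @ P.T - t).reshape(-1)                  # (2L,) landmarks minus projected mean
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
    r = A @ alpha - b
    return alpha, float(r @ r)


def rot_y(yaw):
    """Rotation about the vertical (y) axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Synthetic example: random hypothetical "model", known coefficients, true yaw of 0.25 rad.
rng = np.random.default_rng(1)
L, K = 20, 5
mean = rng.uniform(-0.1, 0.1, (L, 3))
basis = 0.05 * rng.standard_normal((L, 3, K))
alpha_true = rng.standard_normal(K)
scale, t = 1.0, np.zeros(2)
x_obs = (mean + basis @ alpha_true) @ (scale * rot_y(0.25)[:2, :]).T + t

# Shape from landmarks with the pose known: a single linear solve.
alpha_hat, _ = fit_shape_given_pose(x_obs, mean, basis, rot_y(0.25), scale, t)

# Separable formulation: shape is eliminated in closed form, so the data residual
# is a function of pose only; a grid search stands in for a real nonlinear solver.
yaws = np.linspace(-0.6, 0.6, 121)
costs = [fit_shape_given_pose(x_obs, mean, basis, rot_y(y), scale, t)[1] for y in yaws]
best_yaw = yaws[int(np.argmin(costs))]
```

With noise-free synthetic landmarks, `alpha_hat` recovers the generating coefficients and the minimum of `costs` falls at the true yaw; in practice the outer search would run over all pose parameters with a gradient-based method.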
Second, we show how a statistical model can be used to spatially transform input data as
a module within a convolutional neural network. This is an extension of the original spatial
transformer network in that we are able to interpret and normalise 3D pose changes and
self-occlusions. We show that the localiser can be trained using only simple geometric loss
functions on a relatively small dataset, yet is able to perform robust normalisation on highly
uncontrolled images. We consider another extension in which the model itself is also learnt.
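A forward-pass sketch of the sampling step may help. Where an affine spatial transformer maps a regular output grid through a 2D transform, here the sampling positions are taken to be the projections of 3D morphable model vertices, so output element `i` always refers to model vertex `i` regardless of pose. This assumes a scaled orthographic camera, a greyscale image and bilinear interpolation; the localiser network that would predict the pose and shape parameters, gradient computation and any visibility (self-occlusion) handling are omitted, and all names are hypothetical.

```python
import numpy as np


def bilinear_sample(img, xy):
    """Bilinearly sample a (H, W) image at continuous (x, y) pixel positions, clamped to the border."""
    H, W = img.shape
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x0 + 1]
    bottom = (1 - wx) * img[y0 + 1, x0] + wx * img[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bottom


def model_grid_sample(img, mean, basis, alpha, R, s, t):
    """Sample the image at the scaled orthographic projections of the model vertices.
    Element i of the output always belongs to model vertex i, whatever the pose."""
    shape3d = mean + basis @ alpha           # (V, 3) vertices for these shape coefficients
    xy = shape3d @ (s * R[:2, :]).T + t      # (V, 2) pixel positions
    return bilinear_sample(img, xy), xy


# Toy example: a linear intensity ramp, for which bilinear interpolation is exact.
img = np.add.outer(3.0 * np.arange(32), 2.0 * np.arange(32))  # img[y, x] = 3y + 2x
rng = np.random.default_rng(2)
V, K = 50, 4
mean = rng.uniform(-5.0, 5.0, (V, 3))
basis = 0.5 * rng.standard_normal((V, 3, K))
alpha = rng.standard_normal(K)
samples, xy = model_grid_sample(img, mean, basis, alpha, np.eye(3), 1.0, np.array([16.0, 16.0]))
```

Because the output is indexed by model vertex rather than by image pixel, a change of head pose moves the sampling positions but not the meaning of each output element, which is the sense in which the transformer normalises pose.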
The final contribution of this thesis lies in exploring the limits of 2D geometric features
and characterising the resulting ambiguities. 2D geometric information only provides a
partial constraint on 3D face shape. In other words, face landmarks or occluding contours
are an ambiguous shape cue. Two faces with different 3D shape can give rise to the same 2D
geometry, particularly as a result of perspective transformation when camera distance varies.
We derive methods to compute these ambiguity subspaces, demonstrate that they contain
significant shape variability and show that these ambiguities occur in real-world datasets.
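A simplified sketch of how such an ambiguity can be computed, assuming a linear shape model, a pinhole camera with focal length `f`, no rotation, and the face translated a distance `d` along the optical axis (all names are hypothetical and this is not necessarily the formulation used in the thesis). Multiplying the projection u = f*X/Z through by the depth gives f*X - u*Z = 0, which is linear in the shape coefficients, so at any chosen distance the coefficients that reproduce the observed landmarks form an affine subspace: one particular solution plus the null space of the constraint matrix. The example uses more shape components than landmark coordinates, so that null space is non-trivial; with fewer components one would instead examine the directions with small singular values. A shape observed at distance 1 is re-explained by a different shape at distance 2.

```python
import numpy as np


def project_perspective(shape3d, f, d):
    """Pinhole projection of (L, 3) points after translating them by d along the z axis."""
    z = shape3d[:, 2] + d
    return f * shape3d[:, :2] / z[:, None]


def landmark_constraints(x2d, mean, basis, f, d):
    """Return A (2L, K) and b (2L,) such that A @ alpha = b exactly when the shape
    mean + basis @ alpha, placed at distance d, projects onto x2d.
    Uses f*X - u*Z = 0 and f*Y - v*Z = 0, which are linear in alpha."""
    u, v = x2d[:, 0:1], x2d[:, 1:2]                               # (L, 1) each
    Bx, By, Bz = basis[:, 0, :], basis[:, 1, :], basis[:, 2, :]   # (L, K) each
    mx, my, mz = mean[:, 0:1], mean[:, 1:2], mean[:, 2:3] + d     # (L, 1) each
    A = np.vstack([f * Bx - u * Bz, f * By - v * Bz])
    b = np.vstack([u * mz - f * mx, v * mz - f * my]).ravel()
    return A, b


def ambiguity_subspace(A, rel_tol=1e-10):
    """Orthonormal basis (K, r) of the null space of A: coefficient directions
    that leave every landmark unchanged at this camera distance."""
    _, sv, Vt = np.linalg.svd(A)
    rank = int(np.sum(sv > rel_tol * sv[0]))
    return Vt[rank:].T


rng = np.random.default_rng(0)
L, K, f = 10, 30, 1.0
mean = np.column_stack([rng.uniform(-0.1, 0.1, (L, 2)), rng.uniform(-0.05, 0.05, L)])
basis = 0.01 * rng.standard_normal((L, 3, K))
alpha_true = rng.standard_normal(K)
x_obs = project_perspective(mean + basis @ alpha_true, f, d=1.0)

# Re-explain the same landmarks with the face twice as far from the camera.
A, b = landmark_constraints(x_obs, mean, basis, f, d=2.0)
alpha_far = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm particular solution
N = ambiguity_subspace(A)                           # (K, K - 2L) for this example
x_far = project_perspective(mean + basis @ alpha_far, f, d=2.0)
```

Here `x_far` matches `x_obs` even though `alpha_far` describes a different 3D shape, and adding any combination of the columns of `N` to `alpha_far` leaves the landmarks unchanged.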