Ph.D. Thesis Defense: Ijaz Akhter

Tuesday 20 Nov, 2012 at 5:50 pm in Smart Room 9-105 SSE.

Abstract

A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer vision and computer graphics as a collection of moving spatial landmarks. A number of tasks are performed on this type of data such as character animation, motion editing, and nonrigid structure from motion. In theory, many of these tasks are highly under-constrained and the estimation algorithms exploit the natural regularity that exists as a cloud of points moves over time. In this thesis, we present compact and generalizable models of nonrigid objects by exploiting spatial and temporal regularities of time-varying point data. We demonstrate that several theoretically ill-posed tasks can be made well-posed with the help of these models.

Our first contribution is to propose and demonstrate the effectiveness of the linear trajectory model for representing time-varying point clouds. Traditionally, a linear shape model has been used to represent time-varying point data; the 3D shape of a nonrigid object is modeled as a linear combination of a small number of basis shapes. In contrast, we represent point trajectories as a linear combination of basis trajectories. We show that the linear trajectory and the linear shape models are dual to each other and have equal representation power. In contrast to the shape basis, however, we demonstrate that the trajectory basis can be predefined by exploiting the inherent smoothness of trajectories. In fact, we show that the Discrete Cosine Transform (DCT) is a good choice for a predefined basis and empirically demonstrate its compactness by showing that it approaches Principal Component Analysis (PCA) for natural motions.
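As a rough illustration of the predefined trajectory basis described above, the following sketch builds an orthonormal DCT basis and represents a single landmark coordinate over time as a linear combination of the first few basis trajectories. The toy trajectory, the frame count, and the helper names (dct_basis, project, reconstruct, rms_error) are assumptions made for this example and are not taken from the thesis.

```python
import math

def dct_basis(num_frames, num_basis):
    """First num_basis orthonormal DCT-II vectors, each sampled at num_frames frames."""
    basis = []
    for k in range(num_basis):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis

def project(trajectory, basis):
    """Coefficient of the trajectory on each orthonormal basis vector."""
    return [sum(b * x for b, x in zip(vec, trajectory)) for vec in basis]

def reconstruct(coeffs, basis):
    """Linear combination of the basis trajectories."""
    return [sum(c * vec[t] for c, vec in zip(coeffs, basis))
            for t in range(len(basis[0]))]

def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical x-coordinate of one landmark over 60 frames (made-up smooth motion).
num_frames = 60
trajectory = [math.sin(2 * math.pi * t / num_frames) + 0.01 * t
              for t in range(num_frames)]

# Reconstruction error as more low-frequency basis trajectories are kept.
errors = {}
for k in (2, 5, 10, num_frames):
    basis = dct_basis(num_frames, k)
    errors[k] = rms_error(trajectory, reconstruct(project(trajectory, basis), basis))
    print(k, errors[k])
```

Because the basis is fixed in advance, only the coefficients have to be estimated for each trajectory. For smooth motion, the error typically falls quickly as low-frequency terms are added.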

This linear trajectory model is applied to the problem of nonrigid structure from motion. Analogous to the formulation under the shape model, the estimation of nonrigid structure from motion under the trajectory model results in an optimization problem based on orthonormality constraints. Prior work asserted that structure recovery through orthonormality constraints alone is inherently ambiguous and cannot result in a unique solution. This assertion was accepted as conventional wisdom and was the justification for several remedial heuristics in the literature. In contrast, we prove that orthonormality constraints are, in fact, sufficient to recover the 3D structure in both the linear trajectory and the linear shape models. Moreover, we show that the primary advantage of the trajectory model over the shape model in nonrigid structure from motion is the possibility of predefining the basis. This results in a significant reduction in the number of unknowns and a corresponding gain in the stability of estimation. We demonstrate significant improvement in reconstruction results over the state of the art.

After demonstrating the effectiveness of the linear trajectory model over the linear shape model in nonrigid structure from motion, we also show how the two models can be synergistically combined. We present the bilinear spatiotemporal basis as a model to simultaneously exploit spatial and temporal regularities, while maintaining the ability to generalize well to new sequences. The model can be interpreted as representing the data as a linear combination of spatiotemporal sequences consisting of shape modes oscillating over time at key frequencies. We apply the model to natural spatiotemporal phenomena, including face, body, and cloth motion data, and demonstrate its effectiveness in terms of compaction, generalization ability, predictive precision, and efficiency against existing models. We demonstrate the application of the model in motion capture clean-up. We present an expectation maximization algorithm for motion capture labeling, gap-filling, and denoising. The solution provides a drastic reduction in clean-up time in comparison to current industry standards.
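One way to picture a bilinear spatiotemporal model is as a factorization S ≈ Θ C Bᵀ, where S stacks the landmark coordinates of every frame, Θ holds trajectory (temporal) basis vectors, B holds shape (spatial) basis vectors, and C holds the coefficients that couple them. The sketch below is a toy instance of that idea under stated assumptions: the notation, the two-landmark shape basis, and the coefficient values are hypothetical, and a real shape basis would have to be estimated from data rather than written by hand.

```python
import math

def dct_basis(num_frames, num_basis):
    """First num_basis orthonormal DCT-II vectors, each sampled at num_frames frames."""
    basis = []
    for k in range(num_basis):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

num_frames, num_traj_basis = 20, 3

# Theta: F x Kt, columns are DCT trajectory basis vectors (predefined).
theta = transpose(dct_basis(num_frames, num_traj_basis))

# B: 3P x Ks with orthonormal columns, for P = 2 landmarks (hypothetical modes):
# mode 1 moves both landmarks together in x, mode 2 moves them apart in y.
s = 1.0 / math.sqrt(2.0)
shape_basis = [[s, 0.0], [0.0, s], [0.0, 0.0],
               [s, 0.0], [0.0, -s], [0.0, 0.0]]

# C: Kt x Ks coefficients; each entry weights one shape mode at one DCT frequency.
coeffs_true = [[1.0, 0.0], [0.5, -0.2], [0.0, 0.3]]

# Synthesize a sequence S = Theta C B^T (F x 3P).
sequence = matmul(matmul(theta, coeffs_true), transpose(shape_basis))

# With orthonormal bases the coefficients are recovered by C = Theta^T S B.
coeffs_est = matmul(matmul(transpose(theta), sequence), shape_basis)
sequence_hat = matmul(matmul(theta, coeffs_est), transpose(shape_basis))

max_coeff_err = max(abs(a - b) for ra, rb in zip(coeffs_true, coeffs_est)
                    for a, b in zip(ra, rb))
max_recon_err = max(abs(a - b) for ra, rb in zip(sequence, sequence_hat)
                    for a, b in zip(ra, rb))
print(max_coeff_err, max_recon_err)
```

In this toy case, 120 coordinate values (20 frames by 6 coordinates) are described by 6 coefficients. That illustrates the kind of compaction the paragraph above refers to, although real data would only be approximated rather than reproduced exactly.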

Bio:

Ijaz Akhter completed an MS in Computer Science at Lahore University of Management Sciences (www.lums.edu.pk) in 2006 and an M.Sc. in Physics at the University of the Punjab, Pakistan (www.pu.edu.pk) in 2001. His area of research is Computer Vision, particularly nonrigid structure from motion and modeling dynamic objects. He is co-supervised by Dr. Sohaib Khan (LUMS) and Dr. Yaser Sheikh (CMU). He also recently worked for nine months at Disney Research, Pittsburgh, under the supervision of Dr. Iain Matthews, where he worked on labeling and reconstruction of Motion Capture data.

Ijaz Akhter’s research work has been published at top venues in the field, including ACM Transactions on Graphics (2012), IEEE Transactions on PAMI (2011), ICCV (2011), CVPR (2009) and NIPS (2008). He is the first graduate student based in Pakistan to publish in most of these venues. His work was invited for an oral presentation at SIGGRAPH 2012 and at CVPR 2009, and was one of the nominees for the Best Student Paper Award at NIPS 2008. His papers have already received more than 75 citations on Google Scholar, including citations by very prominent researchers.

Project Pages:

Bilinear Spatiotemporal Basis Models

Nonrigid Structure from Motion in Trajectory Space
