Award Date


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Electrical and Computer Engineering

First Committee Member

Brendan Morris

Second Committee Member

Emma Regentova

Third Committee Member

Venkatesan Muthukumar

Fourth Committee Member

Mohamed Trabia

Number of Pages



In this dissertation, we tackle the task of quantifying the quality of actions, i.e., how well an action was performed, using computer vision. Existing methods have used human body pose-based features to express the quality contained in an action sample. However, human body pose estimation in sports actions such as diving and gymnastic vault is particularly challenging, since the athletes undergo convoluted transformations while performing their routines. Moreover, pose-based features do not take into account visual cues, such as the water splash in diving, that human judges do consider. In our first work, we show that a visual representation (spatiotemporal features computed using a 3D convolutional neural network) is more suitable, as it attends to the appearance and salient motion patterns of the athlete's performance. Along with developing three action quality assessment (AQA) frameworks, we also compile a diving and gymnastic vault dataset. In our second work, rather than learning an action-specific model, we show that learning to assess the quality of multiple actions jointly is more efficient, as it can exploit elements of quality shared among different actions. All-action modeling makes better use of the data, generalizes better, and adapts better to unseen action classes. Taking inspiration from the 'learning by teaching' method, we propose a multitask learning (MTL) approach to AQA, unlike existing approaches, which follow the single-task learning (STL) paradigm. In our MTL approach, we force the network to delineate the action sample, i.e., to recognize the action in detail and commentate on the good and bad points of the performance, in addition to the main task of AQA scoring. Through this better characterization of the action sample, we obtain state-of-the-art results on the task of AQA. To enable our MTL approach, we also release the largest multitask AQA dataset, MTL-AQA.
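The multitask idea in the abstract can be illustrated with a minimal sketch of a combined training objective: one loss for the main AQA score, one for detailed action recognition, and one for commentary generation, summed with weights. This is only an illustration under stated assumptions, not the dissertation's implementation. The function names, the specific loss forms (squared plus absolute error for the score, negative log-likelihood for the other two tasks), and the weights `w_score`, `w_cls`, and `w_cap` are all hypothetical choices made for this example.

```python
import math


def score_loss(pred, target):
    """Regression loss for the main AQA score.

    Assumed form: squared error plus absolute error.
    """
    diff = pred - target
    return diff * diff + abs(diff)


def recognition_loss(class_probs, label):
    """Negative log-likelihood of the true action class."""
    return -math.log(class_probs[label])


def caption_loss(token_probs):
    """Mean negative log-likelihood of the reference commentary tokens.

    token_probs[i] is the probability the model assigned to the
    i-th reference token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(score_pred, score_true, class_probs, class_label,
                   token_probs, w_score=1.0, w_cls=1.0, w_cap=1.0):
    """Weighted sum of the three task losses (weights are assumptions)."""
    return (w_score * score_loss(score_pred, score_true)
            + w_cls * recognition_loss(class_probs, class_label)
            + w_cap * caption_loss(token_probs))


# Example: one sample with a predicted score, class distribution,
# and per-token commentary probabilities (all values made up).
total = multitask_loss(
    score_pred=78.5, score_true=80.0,
    class_probs=[0.7, 0.2, 0.1], class_label=0,
    token_probs=[0.9, 0.6, 0.8],
)
```

In a single-task setup only the first term would be minimized; the auxiliary terms are what push a shared feature extractor to describe the action sample in more detail.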


action quality assessment; action recognition; caption generation; computer vision; image processing; machine learning


Electrical and Computer Engineering

File Format


File Size

2.7 MB

Degree Grantor

University of Nevada, Las Vegas




IN COPYRIGHT. For more information about this rights statement, please visit