Award Date

May 2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Laxmi Gewali

Third Committee Member

Wolfgang Bein

Fourth Committee Member

Mingon Kang

Fifth Committee Member

Emma Regentova

Number of Pages

112

Abstract

Model validation is a critical step in the development, deployment, and governance of machine learning models. During the validation process, the predictive power of a model is measured on unseen datasets with a variety of metrics, such as accuracy and F1-score for classification tasks. Although the most commonly used metrics are easy to implement and understand, they are aggregate measures over all segments of heterogeneous datasets and therefore do not identify how a model's performance varies among different data segments. The lack of insight into how a model performs over segments of unseen datasets has raised significant challenges in deploying machine learning models into production environments. The unstable performance is especially concerning for critical applications such as credit risk models, cancer detection, and self-driving cars, which have significant impacts on users. In this dissertation, we leverage the notion of information-theoretic explanations to measure the performance of binary classifiers over various segments of data. We provide the following contributions: 1) a distributed implementation of the explanation framework that outperforms a single-node baseline; 2) an application of the framework to summarize a model's performance over various segments of training and testing data in terms of overall accuracy as well as false-positive and false-negative patterns; and 3) a further application of the framework to annotate test instances with expected performance indicators at inference time. In addition to assisting machine learning engineers with model tuning and data augmentation decisions, the proposed tools can also identify potential model bias and unfairness with respect to protected attributes and data segments.
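
To illustrate the motivation in the abstract, the sketch below contrasts an aggregate metric with per-segment accuracy, false-positive, and false-negative rates for a binary classifier. It is a minimal, hypothetical example (not the dissertation's information-theoretic framework or its distributed implementation); the pandas DataFrame and the column names "segment", "y_true", and "y_pred" are illustrative assumptions.

```python
# Minimal sketch: an aggregate accuracy score can hide large performance
# differences between data segments. The data and column names are
# illustrative, not taken from the dissertation.
import pandas as pd

df = pd.DataFrame({
    "segment": ["A", "A", "A", "B", "B", "B"],
    "y_true":  [1, 0, 1, 0, 0, 1],
    "y_pred":  [1, 0, 0, 1, 0, 0],
})

# Aggregate accuracy over the whole dataset.
overall_accuracy = (df["y_true"] == df["y_pred"]).mean()
print(f"overall accuracy: {overall_accuracy:.2f}")

# Per-segment accuracy, false-positive rate, and false-negative rate.
for name, g in df.groupby("segment"):
    tp = ((g["y_true"] == 1) & (g["y_pred"] == 1)).sum()
    tn = ((g["y_true"] == 0) & (g["y_pred"] == 0)).sum()
    fp = ((g["y_true"] == 0) & (g["y_pred"] == 1)).sum()
    fn = ((g["y_true"] == 1) & (g["y_pred"] == 0)).sum()
    accuracy = (tp + tn) / len(g)
    fp_rate = fp / max(fp + tn, 1)   # false positives among actual negatives
    fn_rate = fn / max(fn + tp, 1)   # false negatives among actual positives
    print(f"segment {name}: accuracy={accuracy:.2f}, "
          f"fp_rate={fp_rate:.2f}, fn_rate={fn_rate:.2f}")
```

In this toy data the overall accuracy is 0.50, while segment A reaches about 0.67 and segment B only about 0.33, which is the kind of per-segment variation the dissertation's framework is designed to surface.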

Keywords

artificial intelligence; database; information theory; machine learning

Disciplines

Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

