Award Date

May 2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Laxmi Gewali

Third Committee Member

Wolfgang Bein

Fourth Committee Member

Mingon Kang

Fifth Committee Member

Emma Regentova

Number of Pages

112

Abstract

Model validation is a critical step in the development, deployment, and governance of machine learning models. During the validation process, the predictive power of a model is measured on unseen datasets with a variety of metrics, such as accuracy and F1-score for classification tasks. Although the most commonly used metrics are easy to implement and understand, they are aggregate measures over all segments of heterogeneous datasets and therefore do not identify how a model's performance varies among different data segments. The lack of insight into how a model performs over segments of unseen datasets has raised significant challenges in deploying machine learning models into production environments. The unstable performance is especially concerning for critical applications such as credit risk models, cancer detection, and self-driving cars, which have significant impacts on users. In this dissertation, we leverage the notion of information-theoretic explanations to measure the performance of binary classifiers over various segments of data. We provide the following contributions: 1) a distributed implementation of the explanation framework that outperforms a single-node baseline; 2) an application of the framework to summarize a model's performance over various segments of training and testing data in terms of overall accuracy as well as false-positive and false-negative patterns; and 3) a further application of the framework to annotate test instances with expected performance indicators at inference time. In addition to assisting machine learning engineers with model tuning and data augmentation decisions, the proposed tools can also identify potential model bias and unfairness with respect to protected attributes and data segments.
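
To illustrate the motivation in the abstract, the sketch below contrasts an aggregate metric with per-segment accuracy, false-positive, and false-negative rates for a binary classifier. It is a minimal, hypothetical example (not the dissertation's information-theoretic framework or its distributed implementation); the pandas DataFrame and the column names "segment", "y_true", and "y_pred" are illustrative assumptions.

```python
# Minimal sketch: an aggregate accuracy score can hide large performance
# differences between data segments. The data and column names are
# illustrative, not taken from the dissertation.
import pandas as pd

df = pd.DataFrame({
    "segment": ["A", "A", "A", "B", "B", "B"],
    "y_true":  [1, 0, 1, 0, 0, 1],
    "y_pred":  [1, 0, 0, 1, 0, 0],
})

# Aggregate accuracy over the whole dataset.
overall_accuracy = (df["y_true"] == df["y_pred"]).mean()
print(f"overall accuracy: {overall_accuracy:.2f}")

# Per-segment accuracy, false-positive rate, and false-negative rate.
for name, g in df.groupby("segment"):
    tp = ((g["y_true"] == 1) & (g["y_pred"] == 1)).sum()
    tn = ((g["y_true"] == 0) & (g["y_pred"] == 0)).sum()
    fp = ((g["y_true"] == 0) & (g["y_pred"] == 1)).sum()
    fn = ((g["y_true"] == 1) & (g["y_pred"] == 0)).sum()
    accuracy = (tp + tn) / len(g)
    fp_rate = fp / max(fp + tn, 1)   # false positives among actual negatives
    fn_rate = fn / max(fn + tp, 1)   # false negatives among actual positives
    print(f"segment {name}: accuracy={accuracy:.2f}, "
          f"fp_rate={fp_rate:.2f}, fn_rate={fn_rate:.2f}")
```

In this toy data the overall accuracy is 0.50, while segment A reaches about 0.67 and segment B only about 0.33, which is the kind of per-segment variation the dissertation's framework is designed to surface.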

Keywords

artificial intelligence; database; information theory; machine learning

Disciplines

Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

