Award Date
12-1-2021
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Committee Member
Fatma Nasoz
Second Committee Member
Mingon Kang
Third Committee Member
Beiyu Lin
Fourth Committee Member
Kazem Taghva
Fifth Committee Member
Brendan Morris
Number of Pages
97
Abstract
State-of-the-art image captioning models can successfully produce a diverse set of accurate captions. Previous research has focused on improving caption diversity while maintaining a high level of fidelity. We shift the focus from accuracy and diversity to controllability. We use a modified version of the traditional encoder-decoder network that allows the model to produce a meaningful and structured latent space. We then explore this latent space using several latent cartographic methods: linear interpolation (lerp), spherical linear interpolation (slerp), analogy completion, attribute vector rotation, and interpolation graphs. Additionally, we discuss different categories of latent space and provide modifications for each of the cartographic methods. Finally, we show that it is possible to generate a set of diverse and accurate captions with desired real-space semantics by sampling from different areas of the latent space.
Keywords
Computer Vision; Generative Networks; Image Captioning; Latent Space; Machine Learning; Natural Language Processing
Disciplines
Computer Sciences
File Size
16200 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Musser, Mikian J., "Exploring the Latent Space of Image Captioning Networks" (2021). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4306.
http://dx.doi.org/10.34917/28340356
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/