Award Date
May 2017
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Justin Zhan
Second Committee Member
Laxmi Gewali
Third Committee Member
Fatma Nasoz
Fourth Committee Member
Ge Lin Kan
Number of Pages
55
Abstract
High utility itemset mining is an important data mining problem which considers profit factors besides quantity from the transactional database. It helps find the most valuable products/items that are difficult to track using only the frequent data mining set. An item that has a high-profit value might be rare in the transactional database despite its tremendous importance. While there are many existing algorithms which generate comparatively large candidate sets while finding high utility itemsets, the major focus is to reduce the computational time significantly with the introduction of pruning strategies. Another aspect of high utility itemset mining is to compute the large dataset. There are very few algorithms that can handle a large dataset to find high utility itemset mining in a parallel (distributed) system.
In this thesis, there are two proposed methods: 1) High utility itemset mining using pruning strategies approach (HUI-PR) and 2) Parallel EFIM (EFIM-Par). In the method I, the proposed algorithm constructs the candidate sets in the form of a tree structure, which traverses the itemsets with High Transaction-Weighted Utility (HTWUIs). It uses a pruning strategies to reduce the computational time by refraining the visit to unnecessary nodes of an itemset to reduce the search space. It significantly minimizes the transaction database generated on each node. In the method II, the distributed approach is proposed dividing the search space among different worker nodes to compute high utility itemsets which are aggregated to find the result. The experimental results for both methods show that they significantly improve the execution time for computing the high utility itemsets.
Keywords
Data mining; itemset mining; parallel computing; spark
Disciplines
Computer Sciences
File Format
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Tamrakar, Ashish, "High Utility Itemsets Identification in Big Data" (2017). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3044.
http://dx.doi.org/10.34917/10986189
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/