Syllabus

Course Code: MCA-20-32    Course Name: Data Mining and Integration using R

MODULE NO / UNIT    COURSE SYLLABUS CONTENTS OF MODULE    NOTES
1 Data Warehouse: A Brief History, Characteristics, Architecture for a Data Warehouse. Data Mining: Introduction: Motivation, Importance, Knowledge Discovery Process, Data Mining Functionalities, Interesting Patterns, Classification of Data Mining Systems, Major issues, Data Preprocessing: Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization, Outliers.
2 Data Mining Techniques: Clustering - Requirements for Cluster Analysis, Clustering Methods - Partitioning Methods, Hierarchical Methods. Decision Tree - Decision Tree Induction, Attribute Selection Measures, Tree Pruning. Association Rule Mining - Market Basket Analysis, Frequent Itemset Mining using Apriori Algorithm, Improving the Efficiency of Apriori. Concept of Nearest Neighborhood and Neural Networks.
3 Data Integration: Architecture of Data Integration, Describing Data Sources: Overview and Desiderata, Schema Mapping Language, Access Pattern Limitations, String Matching: Similarity Measures, Scaling Up String Matching, Schema Matching and Mapping: Problem Definition, Challenges, Matching and Mapping Systems, Data Matching: Rule-Based Matching, Learning-Based Matching, Matching by Clustering.
4 R Programming: Advantages of R over other Programming Languages, Working with Directories and Data Types in R, Control Statements, Loops, Data Manipulation and Integration in R, Exploring Data in R: Data Frames, R Functions for Data in Data Frame, Loading Data Frames, Decision Tree Packages in R, Issues in Decision Tree Learning, Hierarchical and K-means Clustering Functions in R, Mining Algorithm Interfaces in R.
Copyright © 2020 Kurukshetra University, Kurukshetra. All Rights Reserved.