Syllabus

Course Code: *Program Elective-V MTCE-203    Course Name: Big Data Analytics

MODULE NO / UNIT    COURSE SYLLABUS CONTENTS OF MODULE    NOTES
1 What is big data, why big data, convergence of key trends, unstructured data, industry examples of big data, web analytics, big data and marketing, fraud and big data, risk and big data, credit risk management, big data and algorithmic trading, big data and healthcare, big data in medicine, advertising and big data, big data technologies, introduction to Hadoop, open source technologies, cloud and big data, mobile business intelligence, crowdsourcing analytics, inter- and trans-firewall analytics.
2 Introduction to NoSQL, aggregate data models, aggregates, key-value and document data models, relationships, graph databases, schemaless databases, materialized views, distribution models, sharding, master-slave replication, peer-to-peer replication, sharding and replication, consistency, relaxing consistency, version stamps, map-reduce, partitioning and combining, composing map-reduce calculations.
3 Data format, analyzing data with Hadoop, scaling out, Hadoop streaming, Hadoop pipes, design of Hadoop distributed file system (HDFS), HDFS concepts, Java interface, data flow, Hadoop I/O, data integrity, compression, serialization, Avro, file-based data structures.
MapReduce workflows, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, classic MapReduce, YARN, failures in classic MapReduce and YARN, job scheduling, shuffle and sort, task execution, MapReduce types, input formats, output formats.
4 HBase, data model and implementations, HBase clients, HBase examples, praxis. Cassandra, Cassandra data model, Cassandra examples, Cassandra clients, Hadoop integration.
Pig, Grunt, Pig data model, Pig Latin, developing and testing Pig Latin scripts.
Hive, data types and file formats, HiveQL data definition, HiveQL data manipulation, HiveQL queries.
Copyright © 2020 Kurukshetra University, Kurukshetra. All Rights Reserved.