All Things Data Part 1: Distributed Systems and an Introduction to Hadoop

by ACM at Sac State


Presentation by: Varun Ved

With the ever-growing amount of data online, it has become necessary to split the storage and processing of data across clusters of machines, both to increase the availability of the data and to speed up its processing. Apache Hadoop is an open-source framework for the distributed storage and processing of large data sets. It is used by companies such as Facebook and Yahoo, and by over half of the companies listed in the Fortune 50.

In this presentation, Varun Ved introduces distributed systems: what they are and what properties they have. He then gives an overview of the Hadoop framework and how it can be used to handle very large data sets, followed by a breakdown of the Hadoop ecosystem.
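To make "distributed processing" a little more concrete: Hadoop's processing layer is built around the MapReduce model. A map step runs independently on each chunk of the input, the framework groups the intermediate results by key (the "shuffle"), and a reduce step combines each group. The sketch below is a minimal illustration of that idea in plain Python, not Hadoop's Java API. It assumes threads on a single machine as a stand-in for cluster nodes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def map_phase(chunk: str) -> list[tuple[str, int]]:
    # Emit a (word, 1) pair for every word in this worker's chunk of input.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key, as the framework does between phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine every value collected for one key into a single result.
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # Hypothetical stand-in for a cluster: each chunk is mapped by its own thread,
    # much as Hadoop would run a map task on each block of a file on different nodes.
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        mapped = list(pool.map(map_phase, chunks))
    intermediate = [pair for chunk_pairs in mapped for pair in chunk_pairs]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["big data big clusters", "data on many machines"]
    print(word_count(chunks))
```

Because each map call only sees its own chunk, adding more chunks (and more machines) scales the map work out in parallel; only the shuffle needs to move data between workers.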

The presentation can be found at this link.

A video of this presentation can be found below: