From the course: Learning Hadoop

Understand Hadoop components

- [Instructor] Let's drill into Hadoop processing and storage. As a reminder, the Hadoop ecosystem consists of compute and storage: compute is MapReduce and storage is HDFS. Notice that the basic implementation has a master plane and a worker plane. In our situation these are separate virtual machines. They don't have to be; they could be containers or other types of compute, but that's what we have in our cloud implementation. And you'll notice that running on these machines are a number of trackers and nodes: the job tracker and task tracker for compute, and the name node and data node for storage.

Now let's get a bit more detailed and think about the physical cluster architecture we've implemented using GCP Dataproc. We have our master node, which hosts the resource manager and the name node, and then we have our workers. Do you remember how many we set up? It's actually two. It's configurable, so you can choose a standard size and/or add auto-scaling. On our worker nodes you'll see that we have data nodes, node managers, and, when we run jobs, mappers and reducers. We haven't run any jobs yet, but the idea is to bring the compute to the data: to partition the data and partition the compute.
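To make the mapper and reducer roles concrete, here is a minimal word-count sketch against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). It is not part of the course materials: the class name WordCountSketch and the word-count logic are illustrative assumptions, but the Mapper/Reducer/Job structure is the standard API shape. Mappers run on the worker nodes next to the HDFS blocks they read, and reducers aggregate the shuffled, partitioned output, which is the "bring the compute to the data" idea in practice.

// A minimal word-count sketch, assuming Hadoop's MapReduce libraries are on the
// classpath. The class name and word-count logic are illustrative, not from the course.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSketch {

  // Mapper tasks run on the worker nodes, close to the HDFS blocks they read.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // emit (word, 1) for each token
      }
    }
  }

  // Reducer tasks aggregate the partitioned, shuffled output of the mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count sketch");
    job.setJarByClass(WordCountSketch.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output paths are HDFS (or GCS) locations supplied on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would package a job like this as a JAR and submit it to the cluster (for example, as a Dataproc Hadoop job), and the resource manager on the master and the node managers on the workers would schedule the mapper and reducer tasks alongside the data nodes.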
