From the course: Learning Hadoop


Explore Hadoop file systems: HDFS


- [Instructor] When you're setting up your Hadoop cluster, the choice of file system will impact performance, cost, and manageability. By default, Hadoop expects HDFS, the Hadoop Distributed File System. However, over the many years that Hadoop has evolved, different versions of the DFS have emerged, including versions specific to vendors such as Cloudera or Databricks. And these can be distributed or pseudo-distributed. Distributed means, by default, replicated three times. Pseudo-distributed is for testing: it's a single node that looks like a distributed node, but it's much cheaper and quicker to work with. You can also use a regular file system, so not use HDFS at all. That's the simplest possible setup if you are just learning, though I don't find I use it very often. So no HDFS, basically. More generally, I will use cloud-based file systems if they are available. Now, as we'll see in a minute, there are some optimizations for…
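The choices described above map onto Hadoop's standard configuration files. As a sketch (the property names `fs.defaultFS` and `dfs.replication` are standard Hadoop settings; the host, port, and values shown are illustrative):

```xml
<!-- core-site.xml: choose the default file system -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- HDFS, distributed or pseudo-distributed (host/port are examples): -->
    <value>hdfs://localhost:9000</value>
    <!-- For the plain local file system ("no HDFS"), use: file:/// -->
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- 1 for a pseudo-distributed single node; 3 is the distributed default -->
    <value>1</value>
  </property>
</configuration>
```

A pseudo-distributed setup keeps the `hdfs://` default file system but drops replication to 1, since every block lives on the same single node anyway; a pure local-file-system setup skips HDFS daemons entirely.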
