From the course: Big Data Analytics with Hadoop and Apache Spark


Compression

- [Instructor] When storing big data, compression is important because it saves significant disk space and hence reduces operational costs. In this video, I will review the various file compression options available. The most popular compression codecs are Snappy, LZO, GZIP, and bzip2. You can also develop your own codec if required.

Snappy is a compression codec developed by Google. It provides moderate compression but excellent read-write performance. Snappy compresses the entire file as opposed to compressing it element by element. It is not splittable and hence not suitable for parallel operations.

LZO is similar to Snappy in that it provides moderate compression and excellent processing performance. It is also splittable, which is an advantage for parallel processing. But it requires a separate license that needs to be carefully evaluated for possible costs.

GZIP is a popular codec that provides very good compression. It has moderate read-write…
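The trade-off between compression ratio and processing speed described in the transcript can be explored with a small experiment. The sketch below is not from the course. It uses only Python's standard-library `gzip` and `bz2` modules, so Snappy and LZO are left out because they need third-party packages. The function name `compare_codecs` and the sample data are hypothetical, chosen for illustration, and the numbers will vary with the data and the machine.

```python
import bz2
import gzip
import time


def compare_codecs(data: bytes) -> dict:
    """Compress `data` with gzip and bzip2; report size, ratio, and time."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: decompressing must restore the original bytes.
        if decompress(packed) != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "bytes": len(packed),
            "ratio": len(data) / len(packed),
            "seconds": elapsed,
        }
    return results


# Hypothetical sample: repetitive, CSV-like rows, loosely resembling log data.
sample = b"".join(
    b"%d,user_%d,click,2024-01-01\n" % (i, i % 100) for i in range(20_000)
)

for codec, stats in compare_codecs(sample).items():
    print(
        f"{codec}: {stats['bytes']} bytes, "
        f"ratio {stats['ratio']:.1f}x, {stats['seconds']:.4f}s"
    )
```

Running this on a representative slice of your own data gives a rough sense of how much space each codec saves and how long compression takes. It does not measure splittability, which depends on the file format and the processing framework rather than on the codec alone.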
