From the course: Big Data Analytics with Hadoop and Apache Spark

Total score analytics

- [Instructor] In this video, we will compute the total score for each student by subject and then print the physics totals for all students.

To compute the total score, we apply a map-style transform on the data frame. The withColumn function computes the new column from existing columns, which gives us the TotalScore data frame. We then print the results. Let's run this code now.

Next, we print the total score for physics for all students. This is a simple filter on the subject column. Let's run this code now.

The execution plan shows that the filter was pushed down to the data source on HDFS, so only one partition was read for this operation. However, because nothing was cached, Spark read the data source again and computed the total score again. In the next video, we will cache this data for future analytics.
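
The behavior described in the transcript (a derived total column, a subject filter that reads one partition, and a second read because nothing is cached) can be illustrated with a toy model in plain Python. This is only a conceptual sketch, not Spark code and not the course's notebook: the sample rows, the score column names, and the helpers `scan` and `total_scores` are all hypothetical, and the dictionary stands in for a data set assumed to be partitioned by subject.

```python
# Hypothetical sample data: one "partition" per subject, mimicking a
# data set that is partitioned by subject on disk (an assumption).
PARTITIONS = {
    "Physics": [("Alice", 40, 45), ("Bob", 35, 30)],
    "Math": [("Alice", 50, 42), ("Bob", 28, 39)],
    "Chemistry": [("Alice", 33, 41), ("Bob", 44, 37)],
}

partitions_read = []  # log of every partition scan, in order


def scan(subject=None):
    """Yield (student, subject, class_score, test_score) rows.

    When a subject is given, the filter is applied at the source,
    so only the matching partition is opened.
    """
    names = [subject] if subject is not None else list(PARTITIONS)
    for name in names:
        partitions_read.append(name)
        for student, class_score, test_score in PARTITIONS.get(name, []):
            yield student, name, class_score, test_score


def total_scores(subject=None):
    """Derive a total-score column from the two existing score columns."""
    return [
        (student, subj, class_score + test_score)
        for student, subj, class_score, test_score in scan(subject)
    ]


all_totals = total_scores()               # scans all three partitions
physics_totals = total_scores("Physics")  # scans again, but only one partition

print(physics_totals)
print(partitions_read)
```

The scan log ends with a second visit to the Physics partition. This mirrors the two points in the transcript: the filter limits the read to one partition, but without caching the source is read and the totals are recomputed.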
