From the course: Big Data Analytics with Hadoop and Apache Spark
Total score analytics
- [Instructor] In this video, we will compute the total score for each student by subject, and then print the total physics scores for all students. To compute the total score, we apply a map-style transformation to the data frame: the withColumn function computes a new column from existing columns, which gives us the TotalScore data frame. We then print the results. Let's run this code now. Next, we print the total score for physics for all students. This is a simple filter on the subject column. Let's run this code now. The execution plan shows that this filter was pushed down to the data source on HDFS, so only one partition was read for this operation. However, the operation read the data source again and recomputed the total score, because nothing was cached. In the next video, we will cache this data for future analytics.