11. Ratings Histogram Walkthrough

UNDERSTANDING THE RATINGS COUNTER CODE

By Frank Kane

Activity

  • For this Activity, we will use RatingsCounter.scala from the SparkScalaCourse package

  • Select Run Configuration to run RatingsCounter.scala

  • Make sure RatingsCounter is selected in the Name section

  • The script goes through 100,000 movie ratings and counts how many times each score (1 to 5 stars) appears, giving the distribution of ratings

Import What We Need

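    package com.sundogsoftware.spark

    // Core Spark classes such as SparkContext
    import org.apache.spark._
    // SparkContext helpers; mainly needed on older Spark versions, where RDD implicits lived here
    import org.apache.spark.SparkContext._
    // log4j, used to reduce Spark's console logging
    import org.apache.log4j._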

Set Up Our Context

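    // Create a SparkContext that runs locally, under the application name "RatingsCounter"
    val sc = new SparkContext("local[*]", "RatingsCounter")
    // In "local[*]", the [*] tells Spark to use as many worker threads as the machine
    // has CPU cores, so the distributed processing is spread across all of them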

Load The Data
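A minimal sketch of this step, assuming Spark is on the classpath and that the MovieLens 100K ratings file lives at the hypothetical path `data/ml-100k/u.data` (adjust it to your setup). `sc.textFile` returns an RDD with one String element per line of the file; it is lazy, so nothing is actually read until an action runs. The object name `LoadRatingsSketch` and the helper `loadLines` are illustrative names, not part of the course code.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object LoadRatingsSketch {
  // Hypothetical default location of the ratings file; change it to match your machine.
  val DefaultRatingsPath = "data/ml-100k/u.data"

  // textFile gives an RDD[String] with one element per line of the file.
  // It is lazy: Spark reads nothing until an action (such as count) runs.
  def loadLines(sc: SparkContext, path: String = DefaultRatingsPath): RDD[String] =
    sc.textFile(path)

  def main(args: Array[String]): Unit = {
    // Quiet Spark's INFO logging so only errors reach the console
    Logger.getLogger("org").setLevel(Level.ERROR)

    val sc = new SparkContext("local[*]", "RatingsCounter")
    try {
      val lines = loadLines(sc)
      // count() is an action, so this is the point where the file is actually read
      println(s"Loaded ${lines.count()} lines")
    } finally {
      sc.stop() // release the local Spark resources
    }
  }
}
```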

Extract (Map) The Data We Care About
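Each line of the MovieLens 100K ratings file holds four tab-separated fields: user ID, movie ID, rating and timestamp. The only value needed for the histogram is the rating, the third field (index 2). Below is a plain-Scala sketch, runnable without Spark, of the per-line function; `extractRating` is a hypothetical name and the sample lines are made up. In Spark, the same function would be passed to `map` on the RDD of text lines, producing exactly one rating per input line.

```scala
object ExtractRatingSketch {
  // Assumed line layout: userID <TAB> movieID <TAB> rating <TAB> timestamp
  // Splits on tabs and keeps field index 2, the rating, as a String.
  def extractRating(line: String): String = line.split("\t")(2)

  def main(args: Array[String]): Unit = {
    // Synthetic sample lines, not taken from the real data set
    val sampleLines = Seq("1\t10\t4\t881250949", "2\t20\t5\t881250950", "3\t10\t4\t881250951")
    // A local map behaves like RDD.map here: one output element per input element
    val ratings = sampleLines.map(extractRating)
    println(ratings) // prints List(4, 5, 4)
  }
}
```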

Perform An Action: Count By Value
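`countByValue()` is an action: it triggers the actual computation and returns to the driver program a `Map` from each distinct value to the number of times it occurs. Because the whole map is brought back to the driver, it is only appropriate when there are few distinct values; with five possible ratings that is the case here. The sketch below imitates the same result on a local `Seq` using `groupBy`, purely to illustrate the semantics (it is not how Spark implements the operation); `countByValueLocal` is a hypothetical name and the input is made up.

```scala
object CountByValueSketch {
  // Local stand-in for RDD.countByValue(): groups identical elements together
  // and maps each distinct element to how many times it occurred.
  def countByValueLocal[T](items: Seq[T]): Map[T, Long] =
    items.groupBy(identity).map { case (value, occurrences) => value -> occurrences.size.toLong }

  def main(args: Array[String]): Unit = {
    // Made-up ratings for illustration only
    val ratings = Seq("4", "5", "4", "3", "4")
    val counts = countByValueLocal(ratings)
    println(counts("4")) // prints 3
  }
}
```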

Sort and Display The Results
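The map returned by `countByValue()` has no particular order, so it is turned into a sequence of (rating, count) pairs and sorted by rating before printing. A minimal sketch, assuming the ratings are kept as single-character Strings (so sorting them as text yields 1 through 5 in order); the counts below are invented, and `sortByRating` is a hypothetical name. It accepts a general `scala.collection.Map`, the map type Spark's `countByValue()` returns.

```scala
object SortResultsSketch {
  // Converts the unordered map into (rating, count) pairs sorted by rating (the first element).
  def sortByRating(results: collection.Map[String, Long]): Seq[(String, Long)] =
    results.toSeq.sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Invented counts for illustration only
    val results = Map("5" -> 2L, "3" -> 1L, "4" -> 3L)
    // Prints (3,1), (4,3) and (5,2), each on its own line
    sortByRating(results).foreach(println)
  }
}
```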

It's Just That Easy.
