42. [Activity] Using DataSets instead of RDD's
Activity
Looking At The Code
// Read in each rating line and extract the movie ID; construct an RDD of Movie objects.
val lines = spark.sparkContext.textFile("../ml-100k/u.data").map(x => Movie(x.split("\t")(1).toInt))
// Convert the RDD to a DataSet
import spark.implicits._
val moviesDS = lines.toDS()
// Some SQL-style magic to sort all movies by popularity in one line!
// (desc comes from org.apache.spark.sql.functions.)
val topMovieIDs = moviesDS.groupBy("movieID").count().orderBy(desc("count")).cache()
// Grab the top 10
val top10 = topMovieIDs.take(10)
// Load up the movie ID -> name map
val names = loadMovieNames()
// Print the results
println()
for (result <- top10) {
  // result is just a Row at this point; we need to cast it back.
  // Each row has movieID, count as above.
  println(names(result(0).asInstanceOf[Int]) + ": " + result(1))
}
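To make it clearer what the groupBy / count / orderBy chain computes, here is a minimal sketch of the same logic using ordinary Scala collections rather than Spark. It is only an in-memory analogue, not how Spark executes the query. The rating lines, the placeholder movie names, the `TopMoviesSketch` object, and the `countByPopularity` helper are all made up for illustration; the only thing taken from the listing is that the movie ID is the second tab-separated field.

```scala
// Plain Scala collections only (no Spark); all sample data below is made up.
final case class Movie(movieID: Int)

object TopMoviesSketch {
  // Count how many ratings each movie ID received, then sort by descending count.
  // This mirrors groupBy("movieID").count().orderBy(desc("count")).
  def countByPopularity(movies: Seq[Movie]): Seq[(Int, Long)] =
    movies
      .groupBy(_.movieID)
      .map { case (id, group) => (id, group.size.toLong) }
      .toSeq
      .sortBy { case (id, count) => (-count, id) } // ties broken by ID so the order is deterministic

  def main(args: Array[String]): Unit = {
    // Hypothetical tab-separated rating lines; the movie ID is field 1.
    val lines = Seq(
      "196\t242\t3\t881250949",
      "186\t302\t3\t891717742",
      "22\t242\t1\t878887116",
      "244\t51\t2\t880606923",
      "166\t242\t1\t886397596",
      "298\t302\t4\t884182806"
    )
    val movies = lines.map(x => Movie(x.split("\t")(1).toInt))

    // Placeholder stand-in for the movie ID -> name map.
    val names = Map(242 -> "Movie A", 302 -> "Movie B", 51 -> "Movie C")

    // Analogue of take(2) followed by the name lookup and print.
    for ((id, count) <- countByPopularity(movies).take(2)) {
      println(names(id) + ": " + count)
    }
  }
}
```

Because the sketch works with typed tuples, no cast is needed when printing. In the Spark listing the cast is needed because `take` on the grouped result returns untyped `Row` objects.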