42. [Activity] Using Datasets instead of RDDs
Activity
Datasets are taking over Spark, and in most cases it makes sense to use them instead of RDDs
This is because Datasets are generally faster than RDDs: Spark knows their schema and can optimize the queries you run against them
Import PopularMoviesDatasets.scala from the source folder into the SparkScalaCourse project in the Spark-Eclipse IDE
Open PopularMoviesDatasets.scala and look at the code
Looking At The Code
The first part of the script is actually unchanged from the PopularMovies.scala example
We define the structure of the data as a Movie class, which contains movieID: Int
We parse the input into an RDD of Movie objects and then convert it to a Dataset of Movie objects
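As a rough illustration of these two steps, here is a minimal sketch rather than the actual contents of PopularMoviesDatasets.scala. It assumes the input lines are tab-separated with the movie ID in the second field, and it uses two made-up sample lines instead of a real data file. The object name MovieDatasetSketch and the helper parseMovie are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object MovieDatasetSketch {
  // The structure of the data: a single Int field, as described in the notes.
  final case class Movie(movieID: Int)

  // Assumption: tab-separated input with the movie ID in the second field.
  def parseMovie(line: String): Movie = Movie(line.split("\t")(1).toInt)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("MovieDatasetSketch")
      .master("local[*]")
      .getOrCreate()

    // Brings the encoders into scope that toDS() needs for case classes.
    import spark.implicits._

    // Made-up sample lines standing in for the real ratings file.
    val lines = spark.sparkContext.parallelize(
      Seq("196\t242\t3\t881250949", "186\t302\t3\t891717742"))

    // First an RDD of Movie objects, then a Dataset of Movie objects.
    val moviesDS: Dataset[Movie] = lines.map(parseMovie).toDS()
    moviesDS.printSchema()

    spark.stop()
  }
}
```

The case class is declared outside main so that Spark can derive an encoder for it when toDS() is called.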
Now we do not have to jump through hoops mapping keys and values to get the result that we want
This one line sorts the movies by number of ratings in descending order, so the movies with the most ratings come first
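A one-line count-and-sort of this kind might look like the sketch below. It is an assumption about the shape of the line, not a copy of the course script: groupBy, count, orderBy and desc are standard Spark SQL calls, while the object name TopMoviesSketch, the helper countByPopularity and the sample movie IDs are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.desc

object TopMoviesSketch {
  final case class Movie(movieID: Int)

  // Counts the rows per movieID and sorts by that count, largest first.
  // The result has two columns: movieID (Int) and count (Long).
  def countByPopularity(movies: Dataset[Movie]): DataFrame =
    movies.groupBy("movieID").count().orderBy(desc("count"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("TopMoviesSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample: movie 50 is rated three times, 242 twice, 302 once.
    val moviesDS = Seq(
      Movie(50), Movie(242), Movie(50), Movie(302), Movie(242), Movie(50)
    ).toDS()

    countByPopularity(moviesDS).show()

    spark.stop()
  }
}
```

Compared with the RDD version, there is no manual mapping to (key, value) pairs and no reduceByKey step; the column name does the work.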
To print each result, we cast the movieID to an Int with asInstanceOf[Int] and look it up in names to get the movie name
The movie name is then printed, followed by its number of ratings
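The cast is needed because a field read from a result row by position comes back as Any. The sketch below shows the idea on hand-built rows. It assumes names is a Map[Int, String] from movie ID to title; the object name, the helper formatResult and the placeholder titles are hypothetical.

```scala
import org.apache.spark.sql.Row

object PrintTopMoviesSketch {
  // result(0) has static type Any, so it is cast to Int before the lookup.
  // result(1) is the rating count and is printed as-is.
  def formatResult(names: Map[Int, String], result: Row): String =
    names(result(0).asInstanceOf[Int]) + ": " + result(1)

  def main(args: Array[String]): Unit = {
    // Placeholder titles standing in for the real movie-name lookup.
    val names = Map(50 -> "Movie A", 242 -> "Movie B")

    // Hand-built rows shaped like the sorted result: (movieID, count).
    val topRows = Seq(Row(50, 3L), Row(242, 2L))

    for (result <- topRows) {
      println(formatResult(names, result))
    }
  }
}
```

With these sample rows the loop prints "Movie A: 3" and then "Movie B: 2".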
Now run this code and see the output