44. [Activity] Using MLLib to Produce Movie Recommendations

Activity

  • So the first thing, I have don is to modify the data set a little bit

  • So I have gone into the u.data file for ml-100k folder, and added three entries for a fictitious user.

    0    50    5    881250949
    0    172    5    881250949
    0    133    1    881250949
  • What these ratings mean basically is that I've created a new user id zero, which is going to represent me

  • This particular user zero user loves Star Wars which happens to be Id 50 and loves The Empire Strikes Back which is 172, five star ratings on both, but hates the movie Gone With the Wind which is rating 1

  • Import MovieRecommendationsALS.scala from sourcefolder into SparkScalaCourse in Spark-Eclipse IDE

  • Open MovieRecommendationsALS.scala and look at the code

Looking At The Code

    import org.apache.spark.mllib.recommendation._
  • The MLLib package was imported to our Scala file

    // Build the recommendation model using Alternating Least Squares
    println("\nTraining recommendation model...")

    val rank = 8
    val numIterations = 20

    val model = ALS.train(ratings, rank, numIterations)
  • The recommendation model was built by using ALS, and setting the rank and numIterations by our specified number

    val userID = args(0).toInt

    println("\nRatings for user ID " + userID + ":")

    val userRatings = ratings.filter(x => x.user == userID)

    val myRatings = userRatings.collect()

    for (rating <- myRatings) {
        println(nameDict(rating.product.toInt) + ": " + rating.rating.toString)
    }
  • We will take the userId that we want based on the command line argument

  • The ratings for all the movies that the userID have rated will be printed out

    println("\nTop 10 recommendations:")

    val recommendations = model.recommendProducts(userID, 10)
    for (recommendation <- recommendations) {
        println( nameDict(recommendation.product.toInt) + " score " + recommendation.rating )
    }
  • Next, the model will recommend the top ten movies for the userID based on ALS

  • The only difference here is put an argument of 0 when you run the scala code

  • Now, run it and see the output

  • However, there's an issue. Each time you run the model, it will display different results

But The Results Aren't Really That Great.

  • Very sensitive to the paremeters chosen. Takes more work to find optimal parameters for a data set than to run the recommendations

    • Can use "train/test" to evaluate various permutations of parameters

    • But what is a "good recommendation" anyway ?

  • I'm not convinced it's even working properly.

    • Puttin your faith in a black box is dodgy.

    • We'd get better results using our movie similarity results instead, to find similar moves to moves each user liked.

    • Complicated isn't always better.

  • Never blindly trust results when analyzing big data

    • Small problems in algorithms become big ones

    • Very often, quality of your input data is the real issue.

MLLIB Is Still Really Useful, Though.

Last updated