Apache 2.0 Spark with Scala

CtrlK

43. Introducing MLLib

INTRODUCING MLLIB

Movie recommendations using Spark's machine learning library

MLLIB Capabilities

Feature extraction
- Term Frequency /Inverse Document Frequency useful for search
Basic statistics
- Chi-squared test, Pearson or Spearman correlation, min, max, mean, variance
Linear regression, logistic regression
Support Vector Machines
Naives Bayes Classifier
Decision trees
K-Means clustering
Principal component analysis, singular value decomposition
Recommendations using Alternating Least Squares

Special MLLIB Data Types

Vector (dense or sparse)
LabeledPoint
Rating

For More Depth

I really like "Advanced Analytics with Spark" from O'Reilly

Let's Make Some Movie Recommendations

    val data = sc.textFile("../ml-100k/u.data")

    val ratings = data.map( x => x.split('/t').map( x => Rating(x(0).toInt, x(1).toInt, x(2).toDouble)).cache()

    val rank = 8
    val numIterations = 20
    val model = ALS.train(ratings, rank, numIterations)

    val recommendations = model.recommendProducts(userID, 10)

IT'S JUST THAT EASY ... Let's run it.

Previous42. [Activity] Using DataSets instead of RDD's Next44. [Activity] Using MLLib to Produce Movie Recommendations

Last updated 7 years ago