13. Key /Value RDD's, and the Average Friends by Age example
KEY /VALUE RDD'S
And the "friends by age" example
RDD'S Can Hold Key /Value Pairs
For example: number of friends by age
Key is age, value is number of friends
Instead of just a list of ages or a list of # of friends, we can store (age, # friends) etc...
Creating A Key /Value RDD
Nothing special in Scala, really
Just map pairs of data into the RDD using tuples. For example.
Voila, you now have a key /value RDD.
OK to have tuples or other objects as values as well.
Spark Can Do Special Stuff With Key /Value Data
reduceByKey(): combine values with the same key using some function. rdd.reduceByKey((x, y) => x + y) adds them up.
groupByKey(): Group values with the same key
sortByKey(): Sort RDD by key values
keys(), values() - Create an RDD of just the keys, or just the values
You Can Do SQL-Style Joins On Two Key /Value-RDD's
join, rightOuterJoin, leftOuterJoin, cogroup, subtractByKey
We 'll look at an example of this later.
Mapping Just The Values Of A Key /Value RDD?
With Key /Value data, use mapValues() and flatMapValues() if your transformation doesn't affect the keys.
It's more efficient
Friends By Age Example
Input Data: ID, name, age, number of friends
0, Will, 33, 385
1, Jean-Luc, 33, 2
2, Hugh, 55, 22
3, Deanna, 40, 465
4, Quark, 68, 21
Parsing (Mapping) The Input Data
Output is key/value pairs of (age, numFriends):
33, 385
33, 2
55, 221
40, 465
...
Count Up Sum Of Friends And Number Of Entries Per Age
Compute Averages
Collect And Display The Results
Last updated