27. Superhero Degrees of Seperation: Accumulators, and Implementing BFS in Spark
FRAMING BFS AS A SPARK PROBLEM
Implementing BFS In Spark
Represent each line as a node with connections, a color a distance.
For example:
becomes
Our initial condition is that a node is infinitely distant (9999) and white
Map Function To Convert Marvel-Graph.txt To BFS Nodes
Iteratively Process The RDD
Just like each step of our BFS example...
Go through, looking for gray nodes to expand
Color nodes we're done with black
Update the distances as we go
A BFS Iteration As A Map And Reduce Job
The mapper:
Creates new nodes for each connection of gray nodes, with a distance incremented by one, color gray, and no connections
Colors the gray node we just processed black
Copies the node itself into the results.
The reducer:
Combines together all nodes for the same hero ID
Preserves the shortest distance, and the darkest color found.
Preserves the list of connections from the original node.
How Do We Know When We'Re Done?
An accmulator allows many executors to increment a shared variable
For example:
sets up a shared accumulator named "Hit Counter" with an initial value of 0.
For each iteration, if the character we're interested in is hit, we increment the hitCounter accumulator
After each iteration, we check if hitCounter is greater than one- if so, we're done.
Off To The Code
Last updated