50. GraphX, Pregel, and Breadth-First-Search with Pregel.
Analyzing and transversing graphs
Using Spark GraphX
GRAPHX
Not that kind of graph...
Graphs like oursocial network of superheroes - graphs in the computer science / network sense.
However, it's only useful for specific things.
It can measure things like "connectedness", degree distribution, average path length, triangle counts - high level measures of a graph
It can count triangles in the graph, and apply the PageRank algorithm to it.
It can also join graphs together and transform graphs quickly
For things like our "degrees of seperation" example, you won't find built-in support. But it does support the Pregel API for transversing a graph...
Introduces VertexRDD and EdgeRDD, and the Edge data type
Otherwise, GraphX code looks like any other Spark code for the most part
Creating Vertex RDD's
Creating Edge RDD's
Creating A Graph
Doing Stuff
Top 10 most-connected heroes:
Breadth-First Search With Pregel
Before, we actually implemented BFS search in Spark the hard and manual way.
Now with Pregel, that gives us an general algorithm for actually doing iterations through a graph in a general manner and we can actually implement breadth first search using the Pregel algorithm, in a little bit of a more simple way than we did it before
How Pregel Works
Vertices send messages to other vertices (their neighbours)
The graph is processed in iterations called supersteps
Each superstep does the following:
Messages from the previous iteration are received by each vertex
Each vertex runs a program to transform itself
Each vertex sends messages to other vertices
BFS: Initialization
Sending Messages
Preserving The Minimum Distance At Each Step
Pregel's vertex program will preserve the minimum distance between the one it receives and what it has:
(id, attr, msg) => math.min(attr, msg)
Its reduce operation preserves the minimum distance if multiple messages are received for the same vertex:
(a,b) => math.min(a,b)
Putting It All Together
Let's Do It
Last updated