# 27. Superhero Degrees of Seperation: Accumulators, and Implementing BFS in Spark

## FRAMING BFS AS A SPARK PROBLEM

### Implementing BFS In Spark

* Represent each line as a node with connections, a color a distance.
* For example:

  ```
    5983 1165 3836 4361 1282
  ```
* becomes

  ```
    (5983, (1165, 3836, 4361, 1282), 9999, WHITE)
  ```
* Our initial condition is that a node is infinitely distant (9999) and white

### Map Function To Convert Marvel-Graph.txt To BFS Nodes

```
    def convertToBFS(line: String): BFSNode = {
        val fields = line.split("\\s+")
        val heroID = fields(0).toInt

        var connections: ArrayBuffer(Int) = ArrayBuffer()
        for (connections <- 1 to (fields.length - 1)){
            connections += fields(connection).toInt
        }

        var color: String = "WHITE"
        var distance: Int = 9999

        if (heroID == startCharacterID){
            color = "GRAY"
            distance = 0
        }

    }
```

### Iteratively Process The RDD

* Just like each step of our BFS example...
* Go through, looking for gray nodes to expand
* Color nodes we're done with black
* Update the distances as we go

### A BFS Iteration As A Map And Reduce Job

* The mapper:
  * Creates new nodes for each connection of gray nodes, with a distance incremented by one, color gray, and no connections
  * Colors the gray node we just processed black
  * Copies the node itself into the results.
* The reducer:
  * Combines together all nodes for the same hero ID
  * Preserves the shortest distance, and the darkest color found.
  * Preserves the list of connections from the original node.

### How Do We Know When We'Re Done?

* An accmulator allows many executors to increment a shared variable
* For example:

  ```
    var hitCOunter: LongAccumulator("Hit Counter")
  ```
* sets up a shared accumulator named "Hit Counter" with an initial value of 0.
* For each iteration, if the character we're interested in is hit, we increment the hitCounter accumulator
* After each iteration, we check if hitCounter is greater than one- if so, we're done.

### Off To The Code


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://alvintoh.gitbook.io/apache-2-0-spark-with-scala/section-4-advanced-examples-of-spark-programs/27.-superhero-degrees-of-seperation-accumulators-and-implementing-bfs-in-spark.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
