Apache 2.0 Spark with Scala

CtrlK

12. Spark Internals

SPARK INTERNALS

What actually happenend?

An Execution Plan Is Created From Your RDD'S

This start from textFile() -> map() -> countByValue()

The Job Is Broken Into Stages Based On When Data Needs To Be Organized

Stage 1 -> textFile(), map()
Stage 2 -> countByValue()

Each Stage Is Broken Into Tasks (Which May Be Distributed Across A Cluster)

Finally The Tasks Are Scheduled Across Your Cluster And Executed

Previous11. Ratings Histogram Walkthrough Next13. Key /Value RDD's, and the Average Friends by Age example

Last updated 7 years ago