12. Spark Internals
What actually happened?
An Execution Plan Is Created From Your RDDs
This starts with textFile() -> map() -> countByValue()
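For concreteness, here is a minimal PySpark sketch of that chain, modeled on the ratings histogram from the previous section (the file path is hypothetical). The key point: textFile() and map() are lazy transformations, and only the countByValue() action makes Spark build and run an execution plan.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
sc = SparkContext(conf=conf)

# textFile() and map() are transformations: nothing executes yet,
# Spark only records the lineage of each RDD.
lines = sc.textFile("file:///SparkCourse/ml-100k/u.data")  # hypothetical path
ratings = lines.map(lambda x: x.split()[2])

# countByValue() is an action: it triggers a job, and only now does
# Spark turn the recorded lineage into an execution plan.
result = ratings.countByValue()

for rating, count in sorted(result.items()):
    print(f"{rating}: {count}")
```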
The Job Is Broken Into Stages Based On When Data Needs To Be Organized (Shuffled)
Stage 1 -> textFile(), map()
Stage 2 -> countByValue()
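You can see where that stage boundary falls by inspecting an RDD's lineage. The sketch below rebuilds the count with reduceByKey(), an equivalent chain kept as an RDD (countByValue() itself returns a plain Python dict, so there is nothing left to inspect), and prints it with toDebugString(); each new indentation level in the output marks a shuffle, which is exactly where Spark cuts the job into a new stage.

```python
# Equivalent to countByValue(), but kept as an RDD so the lineage
# can be inspected; builds on the `ratings` RDD from the sketch above.
counts = ratings.map(lambda r: (r, 1)).reduceByKey(lambda a, b: a + b)

# toDebugString() prints the RDD lineage; each indentation level
# marks a shuffle boundary -- the point where a new stage begins.
print(counts.toDebugString().decode())
```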
Each Stage Is Broken Into Tasks (Which May Be Distributed Across A Cluster)
Finally The Tasks Are Scheduled Across Your Cluster And Executed
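Within each stage, Spark launches one task per partition of the RDD, so the partition count sets the upper bound on parallelism. A small sketch, continuing from the same RDDs as above:

```python
# One task per partition per stage: the partition count bounds how
# many tasks the scheduler can run in parallel across the cluster.
print(lines.getNumPartitions())

# repartition() shuffles the data into more partitions, giving the
# scheduler more tasks to spread across executors:
wider = lines.repartition(8)
print(wider.getNumPartitions())  # 8
```

While a job runs, the Spark UI (by default at http://localhost:4040) shows the stages and their tasks as they are scheduled and executed.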