15. Filtering RDDs, and the Minimum Temperature by Location Example
FILTERING RDDs
And the weather data examples.
Filter() Removes Data From Your RDD
filter() takes a function that returns a Boolean and keeps only the elements for which that function returns true.
For example, to filter out entries that don't have "TMIN" as the entry type, which is the second element of each parsed tuple:
val minTemps = parsedLines.filter(x => x._2 == "TMIN")
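As a minimal sketch of these keep/discard semantics, the snippet below applies the same predicate to a plain Scala Seq standing in for the RDD, an assumption made so that it runs without Spark. The names FilterSketch, sample and isTmin are illustrative, not from the course code. The predicate decides what is kept, and the input collection is left unchanged, which is also how RDD transformations behave.

```scala
object FilterSketch {
  // (stationID, entryType, temperature) tuples; a local stand-in for parsedLines.
  val sample: Seq[(String, String, Float)] = Seq(
    ("ITE00100554", "TMAX", 18.5f),
    ("ITE00100554", "TMIN", 5.36f),
    ("GM000010962", "PRCP", 32.0f),
    ("EZE00100082", "TMIN", 7.7f)
  )

  // The predicate: returning true means "keep this element".
  def isTmin(entry: (String, String, Float)): Boolean = entry._2 == "TMIN"

  // filter returns a new collection; sample itself is not modified.
  val minTemps: Seq[(String, String, Float)] = sample.filter(isTmin)

  def main(args: Array[String]): Unit = {
    println(s"before: ${sample.size} entries, after: ${minTemps.size} entries")
    minTemps.foreach(println)
  }
}
```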
Minimum Temperature In A Year
This is a snippet of the input data:
ITE00100554, 18000101, TMAX, -75,,, F,
ITE00100554, 18000101, TMIN, -148,,, F,
GM000010962, 18000101, PRCP, 0,,, E,
EZE00100082, 18000101, TMAX, -86,,, E,
EZE00100082, 18000101, TMIN, -135,,, E,
Parse (Map) The Input Data
def parseLine(line: String) = {
  val fields = line.split(",")
  val stationID = fields(0)
  val entryType = fields(2)
  // Conversion formula: tenths of a degree Celsius to degrees Fahrenheit
  val temperature = fields(3).toFloat * 0.1f * (9.0f / 5.0f) + 32.0f
  (stationID, entryType, temperature)
}
val lines = sc.textFile("../1800.csv")
val parsedLines = lines.map(parseLine)
The output is (stationID, entryType, temperature)
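To make the conversion concrete, here is a small worked sketch that runs the parsing logic on one TMIN record from the snippet. It assumes the raw file has no spaces after the commas; ParseSketch is an illustrative name, and the explicit return type is added only for readability. A raw value of -148 is -14.8 degrees Celsius, and -14.8 * 9/5 + 32 is about 5.36 degrees Fahrenheit.

```scala
object ParseSketch {
  // Same logic as parseLine, repeated here so the snippet runs on its own.
  def parseLine(line: String): (String, String, Float) = {
    val fields = line.split(",")
    val stationID = fields(0)
    val entryType = fields(2)
    // Tenths of a degree Celsius to degrees Fahrenheit.
    val temperature = fields(3).toFloat * 0.1f * (9.0f / 5.0f) + 32.0f
    (stationID, entryType, temperature)
  }

  def main(args: Array[String]): Unit = {
    // Assumed raw layout: comma-separated with no spaces.
    val (station, entryType, temp) =
      parseLine("ITE00100554,18000101,TMIN,-148,,,F,")
    println(f"$station $entryType $temp%.2f F")
  }
}
```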
Filter Out All But TMIN Entries
Create (stationID, Temperature) Key/Value Pairs
Find Minimum Temperature By StationID
Collect And Print The Results
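The four steps above can be sketched end to end as follows. This is a hedged, local approximation rather than the course's own code: a plain Scala Seq stands in for the parsedLines RDD so the snippet runs without Spark, and the names MinTempSketch, parsed, stationTemps and minTempsByStation are assumptions. The filter and map calls have the same shape on an RDD; the per-key minimum is written with groupBy and reduce here, whereas on a pair RDD it would typically be reduceByKey((x, y) => math.min(x, y)), followed by collect() to bring the results back to the driver.

```scala
object MinTempSketch {
  // Local stand-in for parsedLines: (stationID, entryType, temperature in F).
  // The second ITE00100554 TMIN row is hypothetical, added so that the
  // reduction has more than one value to compare.
  val parsed: Seq[(String, String, Float)] = Seq(
    ("ITE00100554", "TMAX", 18.5f),
    ("ITE00100554", "TMIN", 5.36f),
    ("ITE00100554", "TMIN", 9.5f),
    ("GM000010962", "PRCP", 32.0f),
    ("EZE00100082", "TMAX", 16.52f),
    ("EZE00100082", "TMIN", 7.7f)
  )

  // 1. Filter out all but TMIN entries.
  val minTemps = parsed.filter(x => x._2 == "TMIN")

  // 2. Create (stationID, temperature) key/value pairs.
  val stationTemps = minTemps.map(x => (x._1, x._3))

  // 3. Find the minimum temperature by stationID.
  //    Local equivalent of reduceByKey((x, y) => math.min(x, y)).
  val minTempsByStation: Map[String, Float] =
    stationTemps.groupBy(_._1).map { case (station, temps) =>
      (station, temps.map(_._2).reduce((x, y) => math.min(x, y)))
    }

  // 4. Print the results, sorted by station ID for stable output.
  def main(args: Array[String]): Unit = {
    minTempsByStation.toSeq.sortBy(_._1).foreach { case (station, temp) =>
      println(f"$station minimum temperature: $temp%.2f F")
    }
  }
}
```

Dropping the entry type in step 2 matters because a key-based reduction works on two-element (key, value) pairs, with the station ID as the key.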