25. [Activity] Find the Most Popular Superhero in a Social Graph

Superhero Social Networks

  • So if a superhero appears with another superhero, these 2 superheros might be friends, so we can represent that relationship on a graph

Input Data Format

  • Marvel-graph.txt contains the superhero id and their association to different superheros on a line

  • Marvel-graph.txt:

      4397 2237 1767 472 4997 5931 6235 1478 1369 806 3994 6232
      3519 4704 2460 763 1602 5306 5358 6121 6160 2459 3173 4963 6166
      3518 5409
  • A hero may span multiple lines

  • Marvel-names.txt contains the superhero id, their superhero name and real life name delimited by a space

  • Marvel-names.txt:

      5300 "SPENCERTRACY"
      5301 "SPERZEL, ANTON"
      5302 "SPETSBURO, GEN.YURI"
      5303 "SPHINX"
      5304 "SPHINX II"
      5305 "SPHINX III"
      5306 "SPIDER-MAN/PETER PAR"
      5307 "SPIDER-MAN III/MARTH"
      5308 "SPIDER-MAN CLONE/BEN"
      5309 "SPIDER-WOMAN/JESSICA"
  • Map input data to (heroID, number of co-occurences) per line

  • Add up co-occurence by heroID using reduceByKey()

  • Flip (map) RDD to (number, heroID) so we can...

  • Use max() on the RDD to find the hero with the most co-occurences

  • Look up the name of the winner and display the result

Off To The Code...

  • Import MostPopularSuperhero.scala from resource folder into Eclipse-Scala IDE in SparkScala folder

  • Import Marvel-graph.txt and Marvel-names.txt from the resource folder into SparkScala folder to be used as the source file for MostPopularSuperhero.scala

  • The delimiter used for parseNames function is '\"'' is a quotation mark

  • countCoOccurences function is used to count the total number of other superhero occurences mapped to a key of the superhero id

  • Run the file and you should see the superhero with the most friends

And The Winner Is...

  • CAPTAIN AMERICA is the most popular superhero with 1933 co-appearances.

Exercise

  • Now you can fiddle around with this code and find out what is the top ten most popular superhero and the top ten least popular superhero

  • I have included my code topTenSuperHeroes.scala inside this folder to find out the top ten superheroes with the most and least friends

Last updated