2013-04-12-Graphs

1 Graphs & Networks    slide

1.1 Midterm Stats    notes

  • min 53.00
  • max 87.00
  • avg 75.53
  • med 77.00
  • stdev 9.28

2 Graphs    slide

  • Can model a surprising number of domains
  • Modeling a problem as a network opens up a large number of algorithms
  • Linear math has many connections to graphs

2.1 Math    notes

  • Data mining theme: get your problem stated as a math problem, and a whole slew of solutions present themselves
  • Linear math is really useful for running equations over all nodes at once, or for simulating movement across the network
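One way to see the link between linear math and graphs: store the graph as an adjacency matrix, and matrix multiplication then "moves" across the network for every node at once. The sketch below is illustrative only; the 4-vertex graph and the helper name `matmul` are assumptions, not part of the lecture.

```python
# Adjacency matrix of a small hypothetical undirected graph:
# edges 0-1, 0-2, 1-2, 2-3. A[i][j] == 1 means i and j are connected.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Entry (i, j) of A*A counts the walks of exactly two edges from i to j.
A2 = matmul(A, A)
print(A2[0][3])  # 1: the only two-step walk from 0 to 3 is 0 -> 2 -> 3
```

The diagonal of `A2` gives each vertex's edge count, since every two-step walk from a vertex back to itself goes out and back along one edge.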

3 Vertices & Edges    slide two_col

Vertex
the interconnected objects, or nodes
Edge
the lines or curves that connect vertices
Graph
Collection of vertices and edges G = (V,E)

img/GraphNodesEdges.gif
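A minimal sketch of G = (V, E) in code, assuming an undirected graph stored as a set of vertices plus a list of vertex pairs. The vertex labels and the helper name `adjacency` are made up for illustration.

```python
# G = (V, E): vertices are any hashable labels, edges are pairs of vertices.
V = {"a", "b", "c", "d"}                                   # hypothetical vertices
E = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]       # hypothetical edges

def adjacency(vertices, edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

G = adjacency(V, E)
print(sorted(G["c"]))  # ['a', 'b', 'd']
```

The adjacency map is usually the more convenient form for algorithms, because "who are my neighbors?" becomes a single lookup.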

3.1 Definitions    notes

  • These are the abstract terms, how do they relate to the real world?

4 Examples    slide

Vertex
User, building, router, product
Edge
Relationship, road, network cable, purchased
Graph
Social Network, physical infrastructure, internet, purchasing history

4.1 Examples    notes

  • Many graphs have assumed edge labels: the edges represent something consistent
  • Some graphs have multiple types of edges: relationship is one of family, friend, co-worker, etc.
  • An edge can be anything that ties two things together: purchase history, e.g., is not a physical connection but an idea

5 Social Networks    slide two_col

  • Edge connecting two people
  • If this is just a line, what information are we missing about how the link was formed?

img/jblomo-linkedin.gif

5.1 Symmetric    notes

  • "Just a line" is symmetric, i.e. "undirected"
  • We're missing information about who invited whom. That relationship is asymmetric, i.e. directed

6 Definitions    slide two_col

Directed
Connections have a direction. Invitations, water pipes, email
Undirected
Connections have no direction. "Friends," walkways on campus, physical wires
Cycle
Path of vertices and edges along which you can travel from a vertex back to itself
Acyclic
A graph without any cycles

img/Directed_acyclic_graph.png

6.1 Modeling    notes

  • Can always model an undirected graph as a directed one by giving every pair of connected nodes two connections, one in each direction
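The conversion can be sketched in a few lines. The function name `to_directed` and the sample "friend" edges are hypothetical.

```python
def to_directed(undirected_edges):
    """Return a directed edge set holding both (a, b) and (b, a) for each undirected edge."""
    directed = set()
    for a, b in undirected_edges:
        directed.add((a, b))
        directed.add((b, a))
    return directed

friends = [("ann", "bob"), ("bob", "cat")]   # hypothetical undirected "friend" edges
arcs = to_directed(friends)
print(sorted(arcs))
# [('ann', 'bob'), ('bob', 'ann'), ('bob', 'cat'), ('cat', 'bob')]
```

Going the other way loses information: collapsing a directed graph to an undirected one forgets who invited whom.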

7 Acyclic?    slide

  • Social network (undirected)
  • Product purchases (directed)
  • Internet links (directed)
  • Class prerequisites (directed)

7.1 Answers    notes

  • Social network: cyclic
  • Product purchases: acyclic
  • Internet links: cyclic
  • Class prerequisites: acyclic
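For the directed cases, "is it acyclic?" can be checked mechanically. Below is a sketch using depth-first search with three vertex states, which is one common approach and not necessarily the one used in class. The course names and page names are invented. It is meant for directed graphs only: an undirected edge stored as two directed edges would count as a cycle here.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {vertex: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for w in graph.get(v, []):
            state = color.get(w, WHITE)
            if state == GRAY:                  # back edge: w is on the current path
                return True
            if state == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

# Hypothetical examples: prerequisites point forward only, web pages link in a loop.
prereqs = {"intro": ["data_structures"], "data_structures": ["algorithms"], "algorithms": []}
links = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(has_cycle(prereqs), has_cycle(links))  # False True
```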

8 Bipartite    slide two_col

  • Graph whose vertices can be divided into two disjoint sets, U and V
  • Vertices in U are only connected to those in V, and vice versa
  • Product purchases: users U, products V

img/Simple-bipartite-graph.svg.png
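Whether a graph is bipartite can be tested by trying to place its vertices into two sets so that no edge stays inside one set. The sketch below does this with a breadth-first search; the function name `two_color` and the sample data are assumptions for illustration.

```python
from collections import deque

def two_color(adj):
    """Try to split vertices into two sets with no edge inside either set.

    adj maps each vertex to its neighbors (undirected). Returns (U, V) or None.
    """
    side = {}
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in side:
                    side[w] = 1 - side[v]      # neighbors go in the opposite set
                    queue.append(w)
                elif side[w] == side[v]:
                    return None                # edge inside one set: not bipartite
    U = {v for v, s in side.items() if s == 0}
    V = {v for v, s in side.items() if s == 1}
    return U, V

# Hypothetical purchases: users u1, u2 and products p1, p2.
purchases = {"u1": {"p1", "p2"}, "u2": {"p2"}, "p1": {"u1"}, "p2": {"u1", "u2"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(two_color(purchases) is not None, two_color(triangle))  # True None
```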

8.1 Recommendations    notes

  • Can model recommendations as link following:
  • From a user, follow to products
  • From products, follow back to other users
  • From other users, follow back to products
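The link-following steps above can be sketched as a scoring loop: each path user → shared product → other user → new product adds one point to the new product. The function name `recommend`, the scoring rule, and the sample purchases are all assumptions; real recommenders weight these paths in many different ways.

```python
from collections import Counter

def recommend(user, bought):
    """Rank products the user has not bought by counting user -> product -> user -> product paths.

    bought maps each user to the set of products they purchased.
    """
    mine = bought[user]
    scores = Counter()
    for other, theirs in bought.items():
        shared = mine & theirs
        if other == user or not shared:
            continue                           # no shared product, so no path to this user
        for product in theirs - mine:
            scores[product] += len(shared)     # one path per shared product
    return scores.most_common()

# Hypothetical purchase history.
bought = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "cat": {"lamp", "rug"},
    "dan": {"pen"},
}
print(recommend("ann", bought))  # [('desk', 2), ('rug', 1)]
```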

9 Measurements    slide

Geodesic distance
Number of edges on the shortest path connecting two vertices
Eccentricity
Largest geodesic distance from v to another
Radius
Minimum eccentricity
Diameter
Maximum eccentricity
Peripheral vertex
Vertex with eccentricity == diameter
Incoming/Outgoing edge count
Number of edges pointing to or from a vertex

9.1 Data Stats    notes

  • Similar to getting distribution stats from initial datasets, these measurements summarize a graph and help you understand it
  • Once you have the incoming/outgoing edge counts, you can use regular stats: what is the distribution of counts?
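A small sketch of the edge-count idea, assuming the graph is a list of directed (source, destination) pairs. The edge list and names are made up.

```python
from collections import Counter

def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for a list of directed (src, dst) edges."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return in_deg, out_deg

links = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]   # hypothetical directed edges
in_deg, out_deg = degree_counts(links)

# Distribution: how many vertices have each incoming-edge count?
vertices = {v for edge in links for v in edge}
in_distribution = Counter(in_deg[v] for v in vertices)
print(sorted(in_distribution.items()))  # [(0, 2), (1, 1), (3, 1)]
```

From here the counts are ordinary numbers, so the usual min, max, mean, and histogram tools apply.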

9.2 Examples    slide center

img/6n-graf.svg.png

9.2.1 Answers    notes

  • Distance between 6 and 5: 2
  • Eccentricity of vertex 2: 3 (in a disconnected graph it is infinity)
  • Radius: 2
  • Diameter: 3
  • Peripheral vertices: 1, 2, 6
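These measurements can be checked with a breadth-first search from every vertex. The edge list below is an assumption: it is the 6-vertex graph that matches the neighbor sets and answers in these notes, since the figure itself is not reproduced here.

```python
from collections import deque

# Assumed edge list for img/6n-graf.svg.png (reconstructed, not taken from the image).
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

adj = {}
for a, b in EDGES:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def distances(source):
    """Geodesic distance from source to every reachable vertex (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricity(v):
    """Largest geodesic distance from v; infinite if some vertex is unreachable."""
    d = distances(v)
    return max(d.values()) if len(d) == len(adj) else float("inf")

ecc = {v: eccentricity(v) for v in adj}
radius = min(ecc.values())
diameter = max(ecc.values())
peripheral = sorted(v for v in ecc if ecc[v] == diameter)
print(distances(6)[5], ecc[2], radius, diameter, peripheral)  # 2 3 2 3 [1, 2, 6]
```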

10 SimRank    slide

  • Vertices are similar if they share similar neighbors
  • SimRank between two vertices is the average of the SimRank between their neighbors, scaled by a damping factor C

img/simrank.png img/simrank-iterative.png

10.1 Recursive    notes

  • This is an iterative and recursive definition
  • Iterative because neighbors are influenced by each other
    • What is your SimRank? Well, what is your SimRank?
    • Values converge after repeated rounds
  • Recursive because you're figuring out SimRank for all neighbors

10.2 Example    slide

img/6n-graf.svg.png

SimRank of vertices 2 and 4

10.2.1 Calculations    notes

  • I(u) = I(2) = 5,3,1
  • I(v) = I(4) = 5,3,6
  • C = 0.6 damping factor: similarity fades over time
  • s0(5,5) = 1
  • s0(3,3) = 1
  • s0(5,3) = 0
  • s0(3,5) = 0
  • s0(1,5) = s0(1,3) = 0, s0(6,1)… = 0
  • s1(2,4) = 0.6/(3*3) * sum(1,1,0,0,0,0,0,0,0), since |I(2)| = |I(4)| = 3 gives nine neighbor pairs
  • ≈ 0.133
  • Next round, we'll need to figure out s1 of 5,3 to calculate update
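A sketch of the round-by-round update, using the standard SimRank normalization by |I(u)|·|I(v)| and treating neighbors as in-neighbors because the graph is undirected. The edge list is assumed from the neighbor sets in these notes, and the function name and parameters are illustrative. It also assumes every vertex has at least one neighbor.

```python
# Assumed edge list for img/6n-graf.svg.png; it gives I(2) = {1, 3, 5} and I(4) = {3, 5, 6}.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

neighbors = {}
for a, b in EDGES:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def simrank(neighbors, C=0.6, iterations=1):
    """Return {(a, b): similarity} after the given number of update rounds."""
    nodes = sorted(neighbors)
    # s0: every vertex is fully similar to itself and not at all to any other.
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                    continue
                # Sum the previous round's similarity over all neighbor pairs.
                total = sum(s[(i, j)] for i in neighbors[a] for j in neighbors[b])
                nxt[(a, b)] = C * total / (len(neighbors[a]) * len(neighbors[b]))
        s = nxt
    return s

s1 = simrank(neighbors, C=0.6, iterations=1)
print(round(s1[(2, 4)], 3), round(s1[(5, 3)], 3))  # 0.133 0.2
```

Calling `simrank(neighbors, iterations=2)` uses the s1 values of pairs such as (5, 3) to update s(2, 4), which is the "next round" step described above.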

11 Random Walk    slide

  • Many algorithms based on concept of randomly deciding:
    • Follow link or not
    • Which link to follow
  • Simulate the decision many times
  • What is the probability you will wind up on u from v?
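A minimal simulation of the idea, assuming a fixed number of decisions per walk and a `follow_prob` of 0.85 (an arbitrary illustrative value, as are the function name and the edge list, which is assumed from the 6-vertex example used earlier in these notes).

```python
import random

# Assumed edge list for the 6-vertex example graph.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

adj = {}
for a, b in EDGES:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

def walk_hits(adj, start, target, steps, trials, follow_prob=0.85, seed=0):
    """Estimate the probability that a walk from start is on target after `steps` decisions."""
    rng = random.Random(seed)              # seeded so runs are repeatable
    hits = 0
    for _ in range(trials):
        v = start
        for _ in range(steps):
            if rng.random() < follow_prob:     # decide whether to follow a link
                v = rng.choice(adj[v])         # decide which link to follow
        hits += (v == target)
    return hits / trials

estimate = walk_hits(adj, start=2, target=4, steps=3, trials=10_000)
print(estimate)
```

More trials tighten the estimate; the same loop with different stopping rules underlies many link-analysis algorithms.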

12 Break    slide

Date: 2013-04-12 13:44:58 PDT

Author: Jim Blomo
