CS224W ML in graphs-study notes

Node features

Importance-based node features & structure-based features

Node degree

Node centrality

Takes the node importance in a graph into account

Eigenvector centrality

We update the following equation in a recursive manner:

$c_v = 1/\lambda \sum_{u\in N(v)} c_u$

that can be rewritten as:

$\lambda \textbf{c} = \textbf{Ac} $

Then, the centrality $c$ is the eigenvector

PageRank is for undirected graph. And the definition is:

$c_v = 1/\lambda \times \sum_{u \rightarrow v} c_u/\textbf{d_u}$

That is, pagerank vector is a stationary distribution for the random walk = principal eigenvector of storychastic adj matrix (connection equals probabilities)

Google proposes solution to solve Spider-traps(all goes to one) and Dead-end (no out)

remove dead-end by adding outs to all nodes

add a random probabiloty

Betweenness centrality

A node is important if it lies on many shortest paths between other nodes （在其他点的最短路径上面）

Closeness centrality

A node is important if it has small shortest path lengths to all other nodes. (离所有点都很近)

Clustering Coefficient

点v的邻点相连的比率, that counts # of triangles (node v, one neighbor of node v, another neighbor of v)

Graphlets

count # of graphlets (rooted and connected sub-graph) and use the # as the vector feature.

Edge Prediction

Katz index: count # of paths of all lengths (computed by powers of adj matrix) between a given pair of nodes:

$S_{v_1v_2} = \sum_{l=1}^{\inf}\beta ^l A^l_{v_1v_2} $, $\beta$ is a discount factor.

（越近，即距离越短的，关系越高）

Then:

$S = \sum \beta^i A^i = (I-\beta A)^{-1} -I $

(According to Geometric series of matrices)

Graph level features

Graph Kernel

Goal: Design graph feature vector, and the feature vector can be calculated for similarity.

Method: bag of *( simply uses * as features for graphs, and we compare the features), where * can be:

node labels
node degree
graphlets
color refinement by recursively updating (WL-test)