Node features
Importance-based node features & structure-based features
Node degree
Node centrality
Takes the importance of a node in the graph into account
Eigenvector centrality
The centrality is defined recursively and updated until convergence:
$c_v = 1/\lambda \sum_{u\in N(v)} c_u$
that can be rewritten as:
$\lambda \textbf{c} = \textbf{Ac} $
Then the centrality vector $\textbf{c}$ is an eigenvector of $\textbf{A}$; the eigenvector for the largest eigenvalue $\lambda_{max}$ gives the eigenvector centrality.
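A minimal power-iteration sketch (not from the notes; assumes an undirected graph given as a numpy adjacency matrix, and the toy graph is made up):

```python
import numpy as np

def eigenvector_centrality(A, num_iters=100):
    """Power iteration: c converges to the leading eigenvector of A."""
    n = A.shape[0]
    c = np.ones(n) / n
    for _ in range(num_iters):
        c = A @ c                   # c_v <- sum of neighbors' scores
        c = c / np.linalg.norm(c)   # normalize (plays the role of 1/lambda)
    return c

# tiny example: path graph 0-1-2; the middle node gets the highest score
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(eigenvector_centrality(A))
```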
PageRank is defined for directed graphs:
$c_v = \sum_{u \rightarrow v} \frac{c_u}{d_u}$, where $d_u$ is the out-degree of $u$.
That is, the PageRank vector is the stationary distribution of a random walk on the graph, i.e., the principal eigenvector of the stochastic adjacency matrix (link weights become transition probabilities).
Google's formulation fixes two problems of the plain random walk: spider traps (all importance gets absorbed by one group of nodes) and dead ends (nodes with no out-links); see the sketch after this list.
- dead ends: teleport to a random node with probability 1 (equivalent to adding out-links from dead ends to all nodes)
- spider traps: at each step, follow a random out-link with probability $\beta$ and teleport to a random node with probability $1-\beta$
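A minimal sketch of PageRank with both fixes, assuming the graph is a 0/1 numpy adjacency matrix with rows as sources (the value of $\beta$ and the toy graph are illustrative):

```python
import numpy as np

def pagerank(A, beta=0.85, num_iters=100):
    """Power iteration with teleportation.

    A[u, v] = 1 if there is an edge u -> v.
    Dead ends teleport with probability 1; otherwise we follow an
    out-link with probability beta and teleport with probability 1 - beta.
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    r = np.ones(n) / n
    for _ in range(num_iters):
        r_new = np.zeros(n)
        for u in range(n):
            if out_deg[u] == 0:                      # dead end: spread uniformly
                r_new += r[u] / n
            else:
                r_new += beta * r[u] * A[u] / out_deg[u]   # follow out-links
                r_new += (1 - beta) * r[u] / n             # random teleport
        r = r_new
    return r

# 0 -> 1 -> 2 -> 0, plus 2 -> 3 where node 3 is a dead end
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
print(pagerank(A))
```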
Betweenness centrality
A node is important if it lies on many shortest paths between other pairs of nodes.
Closeness centrality
A node is important if it has small shortest-path distances to all other nodes.
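Both centralities can be computed directly with networkx, e.g. on a toy graph (the edge list here is made up):

```python
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (2, 3), (1, 3), (3, 4)])

# fraction of shortest paths between other node pairs that pass through each node
print(nx.betweenness_centrality(G))
# reciprocal of the average shortest-path distance to all other nodes
print(nx.closeness_centrality(G))
```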
Clustering Coefficient
The fraction of pairs of $v$'s neighbors that are themselves connected, i.e., it counts the number of triangles through $v$ (node $v$, one neighbor of $v$, another neighbor of $v$), divided by the $\binom{k_v}{2}$ possible neighbor pairs.
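A small sketch that counts connected neighbor pairs directly and checks the result against the networkx built-in (the toy graph is illustrative):

```python
import networkx as nx
from itertools import combinations

def clustering_coefficient(G, v):
    """# of edges among v's neighbors divided by the # of possible neighbor pairs."""
    nbrs = list(G.neighbors(v))
    k = len(nbrs)
    if k < 2:
        return 0.0
    triangles = sum(1 for u, w in combinations(nbrs, 2) if G.has_edge(u, w))
    return triangles / (k * (k - 1) / 2)

G = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3)])
print(clustering_coefficient(G, 0))   # 1/3: one triangle out of 3 neighbor pairs
print(nx.clustering(G, 0))            # networkx built-in gives the same value
```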
Graphlets
Count the number of graphlets (rooted, connected, non-isomorphic subgraphs) that a node participates in and use these counts as the node's feature vector (the graphlet degree vector); a sketch follows.
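A minimal sketch restricted to 2- and 3-node graphlets (orbits: plain edge, end of a 3-node path, middle of a 3-node path, triangle); real graphlet degree vectors go up to 5-node graphlets and use dedicated tools, so this is only illustrative:

```python
import networkx as nx
from itertools import combinations

def gdv_small(G, v):
    """Graphlet degree vector of v for 2- and 3-node graphlets."""
    deg = G.degree(v)
    nbrs = list(G.neighbors(v))
    # triangle orbit: both neighbors of v are connected to each other
    tri = sum(1 for u, w in combinations(nbrs, 2) if G.has_edge(u, w))
    # middle of a 3-node path: a neighbor pair that is NOT connected
    mid = deg * (deg - 1) // 2 - tri
    # end of a 3-node path: v - u - w with w not adjacent to v
    end = sum(G.degree(u) - 1 for u in nbrs) - 2 * tri
    return [deg, end, mid, tri]

G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
print(gdv_small(G, 0))   # [2, 1, 0, 1]
```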
Edge Prediction
Katz index: counts the number of paths of all lengths between a given pair of nodes (path counts come from powers of the adjacency matrix):
$S_{v_1v_2} = \sum_{l=1}^{\infty}\beta^l A^l_{v_1v_2}$, where $0 < \beta < 1$ is a discount factor.
(Shorter paths are weighted more heavily, so pairs that are closer get higher scores.)
Then:
$S = \sum_{i=1}^{\infty} \beta^i A^i = (I-\beta A)^{-1} - I $
(by the geometric series for matrices; the series converges when $\beta < 1/\lambda_{max}(A)$)
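A sketch of the closed form with numpy (the adjacency matrix and $\beta$ are made up; $\beta$ must stay below $1/\lambda_{max}(A)$):

```python
import numpy as np

# closed form: S = (I - beta*A)^{-1} - I
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
beta = 0.1   # discount factor, chosen below 1 / largest eigenvalue of A
S = np.linalg.inv(np.eye(len(A)) - beta * A) - np.eye(len(A))
print(S[0, 3])   # Katz score for the node pair (0, 3)
```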
Graph-level features
Graph Kernel
Goal: design a feature vector for an entire graph, so that the similarity (kernel) between two graphs can be computed from their feature vectors.
Method: bag-of-* (simply use counts of * as the graph's features and compare these feature vectors), where * can be:
- node labels
- node degree
- graphlets
- colors from color refinement, obtained by recursively updating node colors (WL test); see the sketch below
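A minimal sketch of color refinement and the resulting bag-of-colors kernel, assuming unlabeled graphs and using Python's hash in place of a proper injective hash function (illustrative only):

```python
import networkx as nx
from collections import Counter

def wl_colors(G, num_iters=3):
    """Color refinement: return a bag-of-colors feature for graph G."""
    colors = {v: 0 for v in G}          # initial color: all nodes identical
    bag = Counter(colors.values())
    for _ in range(num_iters):
        new_colors = {}
        for v in G:
            # signature = own color + sorted multiset of neighbor colors
            signature = (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
            new_colors[v] = hash(signature)   # compress to a new color (collisions ignored)
        colors = new_colors
        bag.update(colors.values())
    return bag

G1, G2 = nx.cycle_graph(4), nx.path_graph(4)
b1, b2 = wl_colors(G1), wl_colors(G2)
# WL kernel value = dot product of the two bag-of-colors vectors
print(sum(b1[c] * b2[c] for c in set(b1) | set(b2)))
```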