GNN
$$
H^{(k+1)} = \text{activation}\left(\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}H^{(k)}W\right) \\
A = \text{adjacency matrix} \\
\tilde{D} = \text{degree matrix of } A+I \\
H^{(0)} = \text{input node features} \\
W = \text{trainable weight matrix}
$$
Reason:
$(A+I)H$ means that each node's new feature is the sum of its own feature (the $+I$ term) and the features of its adjacent nodes (the $A$ term).
$\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}$ is the symmetrically degree-normalized version of $A+I$ (a smoother alternative to the simple row normalization $\tilde{D}^{-1}(A+I)$), so the summed features do not blow up for high-degree nodes.
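To make the propagation rule concrete, here is a minimal NumPy sketch of one layer; the function name `gcn_layer` and the toy graph are my own illustration, not part of the note above.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops (the +I term)
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)   # D^{-1/2} of A + I
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0)              # ReLU as the activation

# Toy usage: a 3-node path graph, 2-dim input features, 4-dim output.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.random.randn(3, 2)   # H^{(0)}: node features
W = np.random.randn(2, 4)    # trainable weights
H1 = gcn_layer(A, H0, W)     # H^{(1)}, shape (3, 4)
```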
GIN
This is one of my favourite papers.
ref: How Powerful are Graph Neural Networks? (ICLR 2019)
Lemma 2. Let $G_1$ and $G_2$ be any two non-isomorphic graphs. If a graph neural network $\mathcal{A} : \mathcal{G} \rightarrow \mathbb{R}^d$ maps $G_1$ and $G_2$ to different embeddings, the Weisfeiler-Lehman graph isomorphism test also decides $G_1$ and $G_2$ are not isomorphic.
Lemma 2 means that the 1-dim WL test is an upper bound on the discriminative power of GNNs: any pair of graphs a GNN can tell apart, the WL test can also tell apart.
Theorem 3. Let $\mathcal{A} : \mathcal{G} \rightarrow \mathbb{R}^d$ be a GNN. With a sufficient number of GNN layers, $\mathcal{A}$ maps any graphs $G_1$ and $G_2$ that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic, to different embeddings if the following condition holds: all operations in the GNN (aggregate, combine, and readout) are injective.
Based on Theorem 3, the paper proposes the simplest GNN model that attains this upper bound: GIN, where aggregate = sum and combine = MLP$\big((1+\epsilon)\,h_v + \text{aggregate}\big)$; $\epsilon$ is either a learnable scalar or simply fixed (GIN-0 sets it to 0).
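A minimal sketch of one GIN layer in NumPy, assuming a 2-layer MLP for the combine step; the function names and dimensions are illustrative only.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP used as the (injective) combine function."""
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2

def gin_layer(A, H, eps, params):
    """One GIN update: h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u)."""
    neighbour_sum = A @ H                   # aggregate = sum over neighbours
    return mlp((1 + eps) * H + neighbour_sum, *params)

# Toy usage: 3-node path graph, 2-dim features, GIN-0 (eps fixed to 0).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
params = (np.random.randn(2, 8), np.zeros(8),   # W1, b1
          np.random.randn(8, 2), np.zeros(2))   # W2, b2
H_next = gin_layer(A, H, eps=0.0, params=params)
```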
GMN
Graph matching: input two graphs $G_1$ and $G_2$, output a similarity score between them.
The only difference from a plain GNN: when we combine information for a node, we add one more cross-graph term:
$\mu_{j\rightarrow i} = a_{j\rightarrow i}(h_{i}^{(t)}- h_{j}^{(t)})$
Here $i$ and $j$ belong to different graphs, and $a_{j\rightarrow i}$ is an attention weight: a softmax, taken over all nodes $j$ of the other graph, of a similarity score between $h_i^{(t)}$ and $h_j^{(t)}$.
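A minimal NumPy sketch of this cross-graph term for all nodes of $G_1$ attending over the nodes of $G_2$; I use a plain dot product as the similarity, which is just one possible choice, and the function name and shapes are illustrative.

```python
import numpy as np

def cross_graph_matching(H1, H2):
    """mu_i = sum_j a_{j->i} (h_i - h_j), with a_{j->i} = softmax_j s(h_i, h_j)."""
    scores = H1 @ H2.T                           # (n1, n2) dot-product similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)            # softmax over graph-2 nodes j
    # Since the weights sum to 1: sum_j a_{j->i}(h_i - h_j) = h_i - sum_j a_{j->i} h_j
    return H1 - a @ H2                           # (n1, d) matching vectors mu

# Toy usage: graphs with 4 and 5 nodes, 8-dim node states at step t.
H1 = np.random.randn(4, 8)
H2 = np.random.randn(5, 8)
mu = cross_graph_matching(H1, H2)   # extra input to the combine step for G1's nodes
```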