GNN
$$
H^{(k+1)} = \text{activation}\left(\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}H^{(k)}W\right) \\
A = \text{adjacency matrix} \\
\tilde{D} = \text{degree matrix of } A+I \\
H^{(0)} = \text{input node features} \\
W = \text{trainable weight matrix}
$$
Reason:
$(A+I)H$ means that each node's new feature is the sum of its own feature (the $+I$ term) and the features of its adjacent nodes (the $A$ term).
$\tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}$ is the symmetrically degree-normalized version of $A+I$ (a smoother alternative to the simple row normalization $\tilde{D}^{-1}(A+I)$), so the summed features do not blow up for high-degree nodes.
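To make the propagation rule concrete, here is a minimal NumPy sketch of one layer; the function name `gcn_layer` and the toy graph are my own illustration, not part of the note above.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops (the +I term)
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)   # D^{-1/2} of A + I
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0)              # ReLU as the activation

# Toy usage: a 3-node path graph, 2-dim input features, 4-dim output.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.random.randn(3, 2)   # H^{(0)}: node features
W = np.random.randn(2, 4)    # trainable weights
H1 = gcn_layer(A, H0, W)     # H^{(1)}, shape (3, 4)
```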
GIN
This is one of my favourite papers.
ref: How Powerful are Graph Neural Networks? (ICLR 2019)
Lemma 2. Let $G_1$ and $G_2$ be any two non-isomorphic graphs. If a graph neural network $\mathcal{A} : \mathcal{G} \rightarrow \mathbb{R}^d$ maps $G_1$ and $G_2$ to different embeddings, the Weisfeiler-Lehman graph isomorphism test also decides $G_1$ and $G_2$ are not isomorphic.
Lemma 2 means that the 1-dim WL test is an upper bound on the discriminative power of GNNs: any pair of graphs a GNN can tell apart, the WL test can also tell apart.
Theorem 3. Let $\mathcal{A} : \mathcal{G} \rightarrow \mathbb{R}^d$ be a GNN. With a sufficient number of GNN layers, $\mathcal{A}$ maps any graphs $G_1$ and $G_2$ that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic, to different embeddings if the following condition holds: all operations in the GNN (aggregate, combine, and readout) are injective.
Based on Theorem 3, the paper proposes the simplest GNN model that attains this upper bound: GIN, where aggregate = sum and combine = MLP$\big((1+\epsilon)\,h_v + \text{aggregate}\big)$; $\epsilon$ is either a learnable scalar or simply fixed (GIN-0 sets it to 0).
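A minimal sketch of one GIN layer in NumPy, assuming a 2-layer MLP for the combine step; the function names and dimensions are illustrative only.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP used as the (injective) combine function."""
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2

def gin_layer(A, H, eps, params):
    """One GIN update: h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u)."""
    neighbour_sum = A @ H                   # aggregate = sum over neighbours
    return mlp((1 + eps) * H + neighbour_sum, *params)

# Toy usage: 3-node path graph, 2-dim features, GIN-0 (eps fixed to 0).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
params = (np.random.randn(2, 8), np.zeros(8),   # W1, b1
          np.random.randn(8, 2), np.zeros(2))   # W2, b2
H_next = gin_layer(A, H, eps=0.0, params=params)
```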
GMN
Graph matching: input two graphs $G_1$ and $G_2$, output a similarity score between them.
The only difference from a plain GNN: when we combine information for a node, we add one more cross-graph term:
$\mu_{j\rightarrow i} = a_{j\rightarrow i}(h_{i}^{(t)}- h_{j}^{(t)})$
Here $i$ and $j$ belong to different graphs, and $a_{j\rightarrow i}$ is an attention weight: a softmax, taken over all nodes $j$ of the other graph, of a similarity score between $h_i^{(t)}$ and $h_j^{(t)}$.
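A minimal NumPy sketch of this cross-graph term for all nodes of $G_1$ attending over the nodes of $G_2$; I use a plain dot product as the similarity, which is just one possible choice, and the function name and shapes are illustrative.

```python
import numpy as np

def cross_graph_matching(H1, H2):
    """mu_i = sum_j a_{j->i} (h_i - h_j), with a_{j->i} = softmax_j s(h_i, h_j)."""
    scores = H1 @ H2.T                           # (n1, n2) dot-product similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)            # softmax over graph-2 nodes j
    # Since the weights sum to 1: sum_j a_{j->i}(h_i - h_j) = h_i - sum_j a_{j->i} h_j
    return H1 - a @ H2                           # (n1, d) matching vectors mu

# Toy usage: graphs with 4 and 5 nodes, 8-dim node states at step t.
H1 = np.random.randn(4, 8)
H2 = np.random.randn(5, 8)
mu = cross_graph_matching(H1, H2)   # extra input to the combine step for G1's nodes
```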