Frustrations while implementing Color-GCN in PyTorch

Dropout & Long

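While implementing GCN in PyTorch, one error had me stuck for a whole day, and Google had no relevant answer:

fused_dropout not implemented for 'long'

After digging around on my own, I found that the problem was not the dropout value at all: the input features were long. It turns out that torch.tensor([integer]) infers int64 (Long) for Python integers rather than int32 (the capitalized torch.Tensor constructor would have produced float32 instead). What a strange feature. Casting the features to float before they reach dropout makes the error go away.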
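To make the dtype issue concrete, here is a minimal sketch of what seems to have happened and one way around it. The variable names are made up for illustration; the only assumption is that the features were built from Python integers.

```python
import torch
import torch.nn.functional as F

# Python integers are inferred as int64 (Long) by torch.tensor.
int_features = torch.tensor([1, 2, 3])

# The capitalized constructor uses the default dtype (float32) instead.
legacy_features = torch.Tensor([1, 2, 3])

# Cast to float before the features reach any dropout layer.
float_features = int_features.float()
out = F.dropout(float_features, p=0.5, training=True)
```

Checking `features.dtype` right before the first layer is a cheap way to catch this early.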

Cuda device

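RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat2' in call to _th_mm

I had called cuda() on the data, the features, and the model, and it still failed. After an hour of pondering, this detective cracked another case:

model.cuda() presumably calls cuda() on every Tensor registered on the model (its parameters and buffers). In rough pseudocode:

for var in model:
    var.cuda()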

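But here is the catch!

What if Var is not a Tensor but a list? (Don't ask why it would be a list; lists are a newbie's favorite thing.)

Then Var.cuda() does nothing for it, so at initialization you have to loop over every tensor in the list yourself and call tensor.to(device) on each one.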
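A small sketch of the pitfall and one possible fix, under some assumptions: the class and attribute names are hypothetical, and `.double()` stands in for `.cuda()` so the example runs without a GPU (both go through the same traversal of registered parameters and buffers). Registering the tensors as buffers is an alternative to moving each list element by hand.

```python
import torch
import torch.nn as nn

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # A plain Python list is not registered, so .cuda()/.to()/.double() skip it.
        self.masks = [torch.ones(4), torch.ones(4)]

class WithBuffers(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Registered buffers follow the module when it is moved or cast.
        for i in range(2):
            self.register_buffer(f"mask_{i}", torch.ones(4))

    @property
    def masks(self):
        return [getattr(self, f"mask_{i}") for i in range(2)]

plain = WithPlainList().double()   # stand-in for .cuda()
fixed = WithBuffers().double()

plain_dtypes = [m.dtype for m in plain.masks]  # list tensors were left behind
fixed_dtypes = [m.dtype for m in fixed.masks]  # buffers were converted
```

If the list holds trainable tensors rather than constants, `nn.ParameterList` plays the same role for parameters.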

Floating point exception (core dumped)

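Floating point exception (core dumped)

From what I found online, this error means that some step of the computation divided by zero (typically an integer division by zero in native code, since floating-point division by zero just yields inf or nan).

The painful part is that nothing tells you which line is at fault. Worse, Python (or maybe DGL, or PyTorch) has the annoying "feature" that the process dies on this exception before anything gets printed, so you cannot even locate it with simple print statements.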
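One thing that may help in this situation, which I did not try at the time, is Python's standard faulthandler module: it can dump the Python traceback when the process receives a fatal signal such as SIGFPE. The sketch below is only a suggestion; the log path is a hypothetical choice.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "fatal_signal_traceback.log")

# The file must stay open for as long as the handler is enabled.
log_file = open(log_path, "w")

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python traceback to the file.
faulthandler.enable(file=log_file)
```

The same handler can be switched on without touching the code by running the script with `python -X faulthandler`, in which case the traceback goes to stderr.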

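In the end I stepped through the code, rather clumsily, with IPython's embed(), and found the cause: if a dgl.Graph has no edges (G.number_of_edges() == 0), DGL's own GATConv (Graph Attention Convolution) breaks. So I added a dirty check for this case that directly produces a zero vector when it holds. (Strictly speaking the output should be the input feature itself, but in this work, graph coloring, a graph without edges is useless anyway, so it simply becomes 0.)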
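The check looks roughly like the sketch below. This is not my exact code: `EdgeGuard`, `out_shape`, and the stub graph are hypothetical names for illustration, and the wrapper only assumes that the graph object exposes `number_of_edges()` and that the wrapped layer is called as `conv(graph, feat)`.

```python
import torch
import torch.nn as nn

class EdgeGuard(nn.Module):
    """Hypothetical wrapper: skip the wrapped graph conv when the graph has no edges."""

    def __init__(self, conv, out_shape):
        super().__init__()
        self.conv = conv
        # Per-node output shape, e.g. (num_heads, out_feats) for an attention layer.
        self.out_shape = tuple(out_shape)

    def forward(self, graph, feat):
        if graph.number_of_edges() == 0:
            # No edges: return zeros on the same device and dtype as the input.
            return feat.new_zeros((feat.shape[0],) + self.out_shape)
        return self.conv(graph, feat)

class _EmptyGraph:
    """Stand-in for a DGL graph with no edges (only for this demo)."""

    def number_of_edges(self):
        return 0

def _conv_that_must_not_run(graph, feat):
    raise RuntimeError("conv should be skipped for edgeless graphs")

guard = EdgeGuard(_conv_that_must_not_run, out_shape=(2, 8))
out = guard(_EmptyGraph(), torch.randn(5, 16))
```

In the real model the wrapped layer would be the GATConv instance, and `out_shape` would have to match whatever that layer returns per node.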

DGL: Trying to create tensor with negative dimension

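Peeling it back layer by layer, I found:

When PyTorch's unique function runs in its GPU version, the result is, of all things, a vector with dim=-251 (why, I still don't know).

Oddly, calling the same function on the GPU from a standalone .py file works fine. With the same input on the same device, it only returns a negative-dimension vector inside this project.

Temporary workaround: change the function that calls it (in /dgl/core.py) to run on the CPU:

# before
unique_val = F.asnumpy(F.unique(sorted_val))
# after: run unique on the CPU
unique_val = F.asnumpy(F.unique(sorted_val.cpu()))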
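The same idea in plain PyTorch, for code outside DGL, might look like this. `unique_via_cpu` is a hypothetical helper, not part of DGL or PyTorch, and the sample values are made up.

```python
import torch

def unique_via_cpu(values: torch.Tensor) -> torch.Tensor:
    """Run torch.unique on a CPU copy and return the result on the original device."""
    return torch.unique(values.cpu()).to(values.device)

# Falls back to the CPU when no GPU is available, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sorted_val = torch.tensor([1, 1, 2, 5, 5, 7], device=device)
unique_val = unique_via_cpu(sorted_val)
```

The round trip through host memory costs a device synchronization, so this is a stopgap rather than a fix for the underlying bug.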

NEWAY
