Fix the gradient of gather_nd #9200
Conversation
```cpp
// Overload atomicAdd to work for signed int64 on all architectures
static inline __device__ void atomicAdd(int64_t *address, int64_t val) {
  atomicAdd(reinterpret_cast<unsigned long long*>(address),
            static_cast<unsigned long long>(val));  // NOLINT
}
```
Are you sure this works for negative values?
It should be safe if CUDA uses two's complement to implement the signed long long.
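For intuition, here is a minimal self-contained check (my own example, not part of the PR) of why the cast is value-preserving on two's-complement hardware: unsigned and signed addition produce the same bit pattern, so routing negative values through the unsigned overload still yields the correct signed sum.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Under two's complement, the sum computed in uint64_t has the same bit
  // pattern as the signed sum, so casting back recovers the correct value
  // even when one operand is negative.
  int64_t a = -31231236374787LL;
  int64_t b = 2123162361283621LL;
  uint64_t sum_bits = static_cast<uint64_t>(a) + static_cast<uint64_t>(b);
  assert(static_cast<int64_t>(sum_bits) == a + b);
  return 0;
}
```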
```python
data = mx.nd.array([2123162361283621, -31231236374787,
                    -112372937128970, -1378278798172378], dtype=dtype)
idx = mx.nd.array([[0, 0, 0, 0]], dtype='int32')
assert (mx.nd.scatter_nd_acc(data, idx, shape=(1,)).asnumpy()[0] == data.asnumpy().sum())
```
@piiswrong I've added another test case for the signed int64 case.
```cpp
                       const DType* data,
                       const IType* indices,
                       mshadow::Stream<cpu> *s) {
  for (int i = 0; i < N; i++) {
```
This is single-threaded. Can we use #pragma omp critical or #pragma omp atomic for the cpu kernel?
Yes, we can use OpenMP. Let me give it a try.
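For illustration, a minimal 1-D sketch of that idea (hypothetical function name, compiled with `-fopenmp`); as the thread notes further down, this approach was eventually abandoned:

```cpp
// Minimal illustration (not the merged code): parallelize the outer loop and
// guard the accumulation with an atomic, since duplicate indices collide.
void scatter_acc_omp(float* out, const float* data, const int* idx, int n) {
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    #pragma omp atomic
    out[idx[i]] += data[i];
  }
}
```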
src/operator/tensor/indexing_op.cu
```cpp
NNVM_REGISTER_OP(scatter_nd)
.set_attr<FCompute>("FCompute<gpu>", ScatterNDForward<gpu>);

NNVM_REGISTER_OP(scatter_nd_acc)
```
The string acc looks ambiguous. I thought it stood for accurate at first, but later realized it means accumulate. It's named scatter_nd_add in TF, where there are also scatter_nd_sub, scatter_nd_mul, and scatter_nd_div. Shall we also call it scatter_nd_add to be precise?
There is a slight difference between scatter_nd_add and scatter_nd_acc. In scatter_nd_add, the results are added to another array, while in scatter_nd_acc, the values are added to an all-zero array. The number of arguments is different for these two ops.
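To make the difference concrete, a toy 1-D sketch with hypothetical helper names (neither MXNet nor TF code):

```cpp
#include <cstddef>
#include <vector>

// scatter_nd_acc-style: output starts from zeros, so the op only needs
// (data, idx, output shape).
std::vector<float> scatter_nd_acc_1d(const std::vector<float>& data,
                                     const std::vector<size_t>& idx,
                                     size_t out_size) {
  std::vector<float> out(out_size, 0.0f);  // all-zero array
  for (size_t i = 0; i < idx.size(); ++i) out[idx[i]] += data[i];
  return out;
}

// scatter_nd_add-style (as in TF): an extra `ref` argument receives the
// updates, hence the different argument count.
void scatter_nd_add_1d(std::vector<float>* ref,
                       const std::vector<float>& data,
                       const std::vector<size_t>& idx) {
  for (size_t i = 0; i < idx.size(); ++i) (*ref)[idx[i]] += data[i];
}
```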
src/operator/tensor/indexing_op.cu
```cpp
                       mshadow::Stream<gpu> *s) {
  using namespace mshadow::cuda;
  int ngrid = std::min(kMaxGridNum, (N + kBaseThreadNum - 1) / kBaseThreadNum);
  ScatterNDAccForwardImplKernel
```
Does Kernel::Launch not fit here?
It does not fit due to the atomicAdd.
Why does atomicAdd prevent Kernel::Launch from being used?
Okay, I can still use Kernel::Launch, but only for the GPU kernel.
src/operator/tensor/indexing_op.cu
```cpp
template<typename DType, typename IType>
__global__ void ScatterNDAccForwardImplKernel(int N, int M, int K,
                                              const mshadow::Shape<10> strides,
```
```cpp
  }
  for (int j = 0; j < K; ++j) {
#pragma omp atomic
    out[offset + j] += data[i * K + j];
```
You can consolidate this with the GPU kernel by using #if __CUDACC__ / #else in the header file, since this line is the only difference between the CPU and GPU kernels. Then, in the FCompute function, you can use Kernel::Launch for both CPU and GPU kernels. That would make the implementation less verbose.
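A sketch of the suggested consolidation, assuming MXNet's mxnet_op::Kernel<Op, xpu>::Launch convention (Op::Map is invoked once per i in [0, N)); the functor name is hypothetical, the guard and index layout are illustrative, and the snippet presumes the mshadow/mxnet headers:

```cpp
// Hypothetical shared functor: the accumulation statement is the only line
// that differs between the CPU and GPU builds.
struct ScatterNDAccMapSketch {
  template <typename DType, typename IType>
  MSHADOW_XINLINE static void Map(int i, DType* out, const DType* data,
                                  const IType* indices, int N, int M, int K,
                                  const mshadow::Shape<10> strides) {
    int offset = 0;
    for (int j = 0; j < M; ++j) {
      offset += strides[j] * static_cast<int>(indices[j * N + i]);
    }
    for (int j = 0; j < K; ++j) {
#ifdef __CUDA_ARCH__  // defined only when compiling device code
      atomicAdd(out + offset + j, data[i * K + j]);
#else
      #pragma omp atomic
      out[offset + j] += data[i * K + j];
#endif
    }
  }
};

// A single FCompute could then dispatch both backends:
//   mxnet_op::Kernel<ScatterNDAccMapSketch, xpu>::Launch(
//       s, N, out, data, indices, N, M, K, strides);
```

Note that, as the thread discovers below, the CPU branch is the problematic part: #pragma omp atomic does not accept class types such as half_t, which is why that case ended up being handled separately.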
@reminisce I've specialized the implementation of half_t, and now it passes the tests.
It's very strange. The CI test fails on all Windows machines.
@reminisce I find I cannot use omp atomic. Also, using omp critical would remove any parallelism. I've reverted to the original version.
What is the error when using omp atomic?
@reminisce I think it's caused by
I see. Is this a runtime error? If it's only float16 that's not supported, I suggest we use
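A plausible reconstruction of the failure (my reading of the thread, not the actual CI error): #pragma omp atomic only accepts built-in scalar lvalues, so it cannot compile when DType is mshadow's half_t, a class type with an overloaded operator+=.

```cpp
// Stand-in for mshadow::half::half_t: a class type with a user-defined +=.
struct half_like {
  unsigned short bits_;
  half_like& operator+=(const half_like&) { return *this; }  // dummy body
};

void demo(float* f, half_like* h, half_like v) {
  #pragma omp atomic
  *f += 1.0f;     // OK: float is a built-in scalar type

  // #pragma omp atomic
  // *h += v;     // ill-formed: omp atomic rejects class types like this
}
```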
@piiswrong @reminisce Can it be merged?
```python
assert (mx.nd.scatter_nd(data, idx, shape=(2, 2)).asnumpy() == [[0, 0], [2, 3]]).all()
assert (mx.nd.scatter_nd_acc(y, idx, shape=data.shape).asnumpy() == data.grad.asnumpy()).all()
for dtype in ['int32', 'int64', 'float16', 'float32', 'float64']:
```
It seems that only int64 has been tested for scatter_nd_acc in the same-index case. Could you confirm?
```python
    data_npy = np.random.randint(0, 10, (100,))
    data = mx.nd.array(data_npy, dtype=dtype)
    idx = mx.nd.zeros(shape=(1, 100), dtype='int32')
    assert (mx.nd.scatter_nd_acc(data, idx, shape=(1,)).asscalar() == data_npy.sum())
```
@reminisce I've added another test for all the dtypes.
Should I merge it in?
rename to _backward_gather_nd
@piiswrong I've renamed accordingly.
src/operator/tensor/indexing_op.cc
```
all other entries in output are 0.

WARNING!!! If the indices have duplicates, the result will be non-deterministic and
```
This looks ugly. The standard warning format is
`.. Warning:: xxx`
Commit history (squashed):

* try to implement scatter_nd_acc (fix fix fix update only support real_type update update try to fix update fix update revise test fix lint)
* fix
* mark line as no lint
* fix test
* revise test
* fix test case
* revise
* remove openmp
* update
* update
* update
* update test
* Revert "update test" (reverts commit 3eb3ac6)
* Revert "update" (reverts commit a28fa53)
* Revert "update" (reverts commit e99ffd0)
* Revert "update" (reverts commit 399ba02)
* add atomic and specialize the behavior of half_t
* use "!" instead of not
* add test
* fix test
* fix test
* fix test
* rename to backward_gather_nd
* fix
* fix
* fix doc
Description
Add _backward_gather_nd, which accumulates the values when the indices are the same. Should solve #9172
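A tiny worked example (my own, for illustration) of why the backward pass must accumulate: if gather_nd reads the same element twice, that element's gradient is the sum of both upstream gradients.

```cpp
#include <cstdio>

int main() {
  // Forward: y = gather_nd(x, idx) with idx = [0, 0] reads x[0] twice.
  // Backward: dL/dx[0] is the SUM of both upstream gradients, so a plain
  // scatter (last write wins) would be wrong; _backward_gather_nd accumulates.
  float dy[2] = {1.5f, 2.5f};
  int idx[2] = {0, 0};
  float dx[1] = {0.0f};
  for (int i = 0; i < 2; ++i) dx[idx[i]] += dy[i];
  std::printf("dx[0] = %.1f\n", dx[0]);  // prints 4.0
  return 0;
}
```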
Checklist
* Passed code style checking (make lint)

Changes
Comments
I use atomicAdd to implement the GPU operator. The current CPU implementation does not use OpenMP. Also, int8 and uint8 are not supported.