fix race when temp space is used in copy & fix instance overwrite in g2c#8867
fix race when temp space is used in copy & fix instance overwrite in g2c#8867piiswrong merged 5 commits intoapache:masterfrom ZiyueHuang:race_and_g2c
Conversation
|
https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/comm.h#L187, |
| # construct the module | ||
| # map the ctx_group attribute to the context assignment | ||
| group2ctxs={'dev1':mx.cpu(), 'dev2':[mx.gpu(i) for i in range(num_gpus)]} | ||
| group2ctxs={'dev1':[mx.cpu()]*num_gpus, 'dev2':[mx.gpu(i) for i in range(num_gpus)]} |
There was a problem hiding this comment.
This change is just for better understandability.
src/ndarray/ndarray.cc
Outdated
| template<typename from_xpu, typename to_xpu> | ||
| void CopyFromToImpl(const NDArray& from, const NDArray& to, RunContext rctx) { | ||
| void CopyFromToImpl(const NDArray& from, const NDArray& to, | ||
| RunContext rctx, std::vector<Resource> requested) { |
There was a problem hiding this comment.
const reference for the vector?
src/ndarray/ndarray.cc
Outdated
| // important: callback must always capture by value | ||
| int a = from.ctx().dev_mask(); | ||
| const auto from_ctx = from.ctx(); | ||
| int a = from_ctx.dev_mask(); |
There was a problem hiding this comment.
Nit : const a and const b
There was a problem hiding this comment.
Please avoid using auto for simple types
| std::vector<Engine::VarHandle> mutable_vars(1, to.var()); | ||
|
|
||
| std::vector<Resource> requested; | ||
| if (a == gpu::kDevMask && from_stype != to_stype) { |
There was a problem hiding this comment.
Accordding to original codes,
- std::vector<Resource> requested;
- if (is_same<from_xpu, mshadow::gpu>::value && from_stype != to_stype) {
- requested.push_back(ResourceManager::Get()->Request(from_ctx,
- ResourceRequest(ResourceRequest::kTempSpace)));
- }
Seems that whether temp space is used is irrelevant with the context of b ?
There was a problem hiding this comment.
Oh right. No need to request temp space if cast_storage happens on CPU.
tests/python/unittest/test_module.py
Outdated
| assert(np.all(mod1_input_grads[1].asnumpy() == mod2_input_grads[1].asnumpy())) | ||
|
|
||
| check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)}) | ||
| check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)}, [mx.cpu(1), mx.cpu(2)]) |
There was a problem hiding this comment.
Nit: I think explicitly mentioning optional arg names (grad_ctxs) when passing optional args is a good practice since API may change in the future
…g2c (apache#8867) * fix race when temp space is used in copy * fix instance overwrite in g2c * example of g2c * address comments
…g2c (apache#8867) * fix race when temp space is used in copy * fix instance overwrite in g2c * example of g2c * address comments
…g2c (apache#8867) * fix race when temp space is used in copy * fix instance overwrite in g2c * example of g2c * address comments
…g2c (apache#8867) * fix race when temp space is used in copy * fix instance overwrite in g2c * example of g2c * address comments
Description
var of temp space should be in mutable_vars in engine.
[{}] * ctx_lenis actually one dict.cc @eric-haibin-lin
Checklist
Essentials
make lint)Changes
Comments