Conversation
CC @tqchen if you have bandwidth
Force-pushed from 9fc3389 to ad97a7e
contrib/tvmop/basic/ufunc.py (Outdated)

    return b, c

    def reduce_axes(X, axes, reducer):
Can we add some comments to elaborate on the idea, e.g., the meaning of axes? Also, can we move it somewhere else so that other operators can reuse it?
Yes. Added in ufunc.py
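For readers following along, here is a minimal NumPy sketch of what a `reduce_axes` helper of this shape might do. The function name and parameters come from the quoted diff, but the exact semantics of `axes` are an assumption (here, a per-dimension reduce flag), not the PR's actual TVM code:

```python
import numpy as np

def reduce_axes(X, axes, reducer=np.add):
    # Assumed semantics: `axes` holds one flag per dimension of X;
    # a True flag means that dimension is reduced (collapsed to size 1).
    out = X
    for dim, is_reduced in enumerate(axes):
        if is_reduced:
            out = reducer.reduce(out, axis=dim, keepdims=True)
    return out

x = np.arange(24).reshape(2, 3, 4)
y = reduce_axes(x, [False, True, False])  # reduce only axis 1
```

Keeping the reduced dimensions as size 1 (rather than dropping them) makes the output broadcast-compatible with the input, which is convenient for gradient computations.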
contrib/tvmop/basic/ufunc.py (Outdated)

    return s, [A, B, C]

    def assign_by_req(a, req):
Shall we use the existing contrib/tvmop/utils.py or create a contrib/tvmop/basic/common.py?
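For context, the `req` dispatch being factored out mirrors MXNet's write-request convention (`kWriteTo` overwrites the destination, `kAddTo` accumulates into it). A hypothetical NumPy sketch of that behavior, not the PR's actual TVM implementation:

```python
import numpy as np

# Assumed mirror of MXNet's OpReqType dispatch:
# kWriteTo overwrites the destination, kAddTo accumulates into it.
K_WRITE_TO = "kWriteTo"
K_ADD_TO = "kAddTo"

def assign_by_req(dst, src, req):
    if req == K_WRITE_TO:
        dst[...] = src       # overwrite destination in place
    elif req == K_ADD_TO:
        dst[...] += src      # accumulate into destination
    return dst

a = np.ones(3)
assign_by_req(a, np.full(3, 2.0), K_ADD_TO)    # a is now [3., 3., 3.]
b = np.zeros(2)
assign_by_req(b, np.ones(2), K_WRITE_TO)       # b is now [1., 1.]
```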
src/operator/contrib/tvmop/ufunc.cc (Outdated)

    funcname += "req_";
    MXNET_ASSIGN_REQ_SWITCH(req[k], req_type, {
      if (req_type == kWriteTo) {
        funcname += "kWriteTo";
src/operator/contrib/tvmop/ufunc.cc (Outdated)

    // dispatch by backward
    std::vector<int> ov, iv;
    const TBlob& ograd = inputs[0], igrad = outputs[k];
    bool flag = ograd.size(0) != igrad.size(0);
Better to use an int and explicitly assign the value.
What about expanding it into an if-else?
src/operator/contrib/tvmop/ufunc.cc (Outdated)

    }
    TShape oshape(ov.begin(), ov.end()), ishape(iv.begin(), iv.end());
    TBlob ograd_tvm(ograd.reshape(oshape).dltensor());
    TBlob igrad_tvm(igrad.reshape(ishape).dltensor());
Please add some comments to elaborate on the idea.
    std::vector<int> ov, iv;
    const TBlob& ograd = inputs[0], igrad = outputs[k];
    bool flag = ograd.size(0) != igrad.size(0);
    for (int i = 0; i < ndim; ++i) {
If my understanding is correct, there seems to be an assumption that ograd.ndim == igrad.ndim, which is not necessarily true. I think you need to prepend axes to igrad if igrad.ndim < ograd.ndim and then use the logic here.
Yes, igrad.ndim == ograd.ndim is assumed.
@yzhliu suggests padding the input to 5 dims, the largest possible dim supported by this op. The padding will 1) reduce the number of kernels (by a factor of 5) and 2) handle the igrad.ndim < ograd.ndim issue, but there may be a loss in performance.
I think prepending axes to igrad to bring it up to ograd.ndim requires more kernels, but the performance is better. It is a tradeoff.
Please correct me if my understanding is wrong, but don't you still need kernels generated for ndim < 5, since you collapse consecutive dimensions over which the reduction is performed? For example, given a 5-d shape (2, 3, 4, 5, 6) with reduction on axes (1, 2), the tblob is first reshaped into (2, 12, 30) and then reduced on axis=1. In this case, do you need a kernel generated for 3-d shapes?
I think we can pad the shape after dimension collapse. In this case, the tblob will be reshaped into (2, 12, 30, 1, 1) and then reduced on axes [1, 3].
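The collapse-then-pad scheme discussed above can be checked numerically. This NumPy sketch reproduces the example from the thread: reducing a (2, 3, 4, 5, 6) tensor over axes (1, 2) directly, via the collapsed (2, 12, 30) shape, and via the padded (2, 12, 30, 1, 1) shape all yield the same values:

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5 * 6, dtype=np.float64).reshape(2, 3, 4, 5, 6)

# Direct reduction over axes (1, 2): result has shape (2, 5, 6).
direct = x.sum(axis=(1, 2))

# Collapse consecutive reduce/keep runs: (3, 4) -> 12, (5, 6) -> 30,
# then reduce on axis 1 of the 3-d view.
collapsed = x.reshape(2, 12, 30).sum(axis=1)            # shape (2, 30)

# Pad with trailing size-1 axes to 5 dims so a single 5-d kernel
# can serve this case, reducing on axes (1, 3) as in the thread.
padded = x.reshape(2, 12, 30, 1, 1).sum(axis=(1, 3))    # shape (2, 30, 1)
```

All three paths agree element-wise once flattened, which is why a single maximum-dimension kernel can cover the lower-dimensional cases.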
I see. I am in favor of the approach with fewer kernels generated. We can revisit the performance concern if that turns out to be an issue.
I pushed a new version, where the inputs and outputs are padded to 5 dims.
Description
Use TVM to implement vadd backward with broadcast.
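As background for reviewers: the backward pass of a broadcast add reduces the output gradient over the broadcast axes of each input. A NumPy sketch of the assumed semantics (the function name and signature here are illustrative, not the PR's TVM code):

```python
import numpy as np

def vadd_backward(ograd, ishape):
    # Gradient of broadcast add w.r.t. an input of shape `ishape`:
    # sum ograd over every axis introduced or expanded by broadcasting.
    igrad = ograd
    # Sum away axes that broadcasting prepended to the input.
    for _ in range(ograd.ndim - len(ishape)):
        igrad = igrad.sum(axis=0)
    # Sum (keeping dims) over axes where the input had size 1.
    for ax, size in enumerate(ishape):
        if size == 1 and igrad.shape[ax] != 1:
            igrad = igrad.sum(axis=ax, keepdims=True)
    return igrad

og = np.ones((2, 3, 4))
g1 = vadd_backward(og, (3, 4))      # prepended axis summed away
g2 = vadd_backward(og, (2, 1, 4))   # size-1 axis summed with keepdims
```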
Changes
Comments
I think the code for these may be further reused in the future. It would be great if we could have a consistent interface for TVM ops that hides the dispatch of things like req and backward.
Thanks to @yzhliu and @junrushao1994 for the brilliant "compressed bit string" idea.