Optimize transpose operator with MKL-DNN #14545
pengzhao-intel merged 16 commits into apache:master
Conversation
@mxnet-label-bot add [MKLDNN, Backend, pr-awaiting-review]
```cpp
CHECK_EQ(inputs.size(), 1U);
CHECK_EQ(outputs.size(), 1U);
if (SupportMKLDNNTranspose(param, inputs[0])) {
```
How about moving `CHECK_EQ(req, kWriteTo) << "Transpose does not support inplace";` here? Then we have the opportunity to provide fallback support instead of crashing.
I'm afraid there is no way to fall back. The check is copied from the original implementation:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op-inl.h#L311
I will move the check here so that the error happens at an earlier stage.
```cpp
 public:
  MKLDNNTransposeForward(const TransposeParam& param,
                         const OpReqType &req,
```
Seems req is redundant.
```cpp
if (data.IsMKLDNNData()) {
  this->data_->set_data_handle(data.GetMKLDNNData()->get_data_handle());
} else {
  this->data_->set_data_handle(data.data().dptr<float>());
```
It seems this code can be reused for other dtypes, so we'd better avoid explicitly using float here. How about using a template dtype, or simply using `this->data_->set_data_handle(data.GetMKLDNNData()->get_data_handle());`? The latter should always work, even if data isn't MKLDNN data.
Will change that, although almost all MKL-DNN fp operators are restricted by the `data.dtype() == mshadow::kFloat32` check. We need to revisit those checks one day if we want to support fp64.
pengzhao-intel left a comment
Do we need special test cases for MKL-DNN transpose?
```cpp
/*!
 * \file mkldnn_transpose.cc
 * \brief
 * \author
```
```cpp
                             const NDArray &data) {
  auto data_ndim = data.shape().ndim();

  if (data_ndim > 4 || data.dtype() != mshadow::kFloat32)
```
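The guard above can be sketched as a simple predicate. This is a hypothetical numpy analogue for illustration, not the MXNet code; the function name `support_mkldnn_transpose` mirrors the C++ helper but the numpy types are my own stand-ins:

```python
import numpy as np

def support_mkldnn_transpose(data):
    # Mirrors the C++ guard: use the MKL-DNN path only for arrays with
    # at most 4 dimensions and float32 dtype; otherwise fall back.
    return data.ndim <= 4 and data.dtype == np.float32

# float32, 2-D: eligible for the MKL-DNN path.
assert support_mkldnn_transpose(np.zeros((2, 3), dtype=np.float32))
# float64: falls back to the default implementation.
assert not support_mkldnn_transpose(np.zeros((2, 3), dtype=np.float64))
```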
Does transpose work for INT8?
It should work, but it's not tested or verified. So here I limited the dimensionality and data type, just like what we did for other MKL-DNN operators. BTW, if we want to use INT8 transpose in a quantized network, we probably need a quantized transpose operator that accepts and outputs an additional scale argument.
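To illustrate the point about the scale argument, here is a hypothetical sketch (not an implemented MXNet operator): an INT8 transpose would only permute the int8 payload, while the quantization scale passes through unchanged.

```python
import numpy as np

def quantized_transpose(q_data, scale, axes):
    """Hypothetical quantized transpose: permute the int8 payload and
    carry the quantization scale through unchanged."""
    assert q_data.dtype == np.int8
    return np.ascontiguousarray(q_data.transpose(axes)), scale

q = np.arange(6, dtype=np.int8).reshape(2, 3)
out, s = quantized_transpose(q, 0.05, (1, 0))
assert out.shape == (3, 2) and s == 0.05
```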
```cpp
    dst_fmt.layout_desc.blocking.strides[1][axes[i]] = 1;

    total_stride *= shape[axes[i]];
  }
```
Please add an explanation of the logic behind setting these indices.
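For reference, the index logic can be sketched in numpy. This is a conceptual analogue of the stride-setting loop, not the actual MKL-DNN code: the destination memory keeps the source's logical dims but gets its strides assigned according to the axes permutation, so a plain reorder (copy) performs the transpose.

```python
import numpy as np

def transpose_via_strides(src, axes):
    """Transpose by copying into a buffer whose strides follow the
    axes permutation -- the same trick as the dst_fmt stride loop."""
    out_shape = tuple(src.shape[a] for a in axes)
    dst = np.empty(out_shape, dtype=src.dtype)
    # Walk axes from innermost outwards, accumulating total_stride,
    # and assign it to logical axis axes[i] -- as in the C++ loop.
    strides = [0] * src.ndim
    total_stride = 1
    for i in range(src.ndim - 1, -1, -1):
        strides[axes[i]] = total_stride
        total_stride *= src.shape[axes[i]]
    # A view of dst with the source's shape and the permuted strides:
    # copying src into it is exactly the "reorder" MKL-DNN performs.
    view = np.lib.stride_tricks.as_strided(
        dst, shape=src.shape,
        strides=tuple(s * src.itemsize for s in strides))
    view[...] = src
    return dst

x = np.arange(24).reshape(2, 3, 4)
assert np.array_equal(transpose_via_strides(x, (2, 0, 1)),
                      x.transpose(2, 0, 1))
```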
Normal use cases should be covered by
@pengzhao-intel It mitigates the performance issue of transpose in #14496 but doesn't help #14563. Possibly #14563 is not caused by a transpose regression. @apeforest @fhieber @samskalicky
It's really good! Regarding #14563, does the transpose run into MKLDNN?
samskalicky left a comment
LGTM, agree with @TaoLv on the summary of the related issues above.
@TaoLv can you retrigger the build? One test is failing; it seems to be a flaky test, as the same test is failing on master as well.
Really great improvement. Merging now.
* add mkldnn transpose
* general transpose
* support mkldnn format
* fix lint
* address comments
* add unit test
* add comments
* retrigger CI
Description
Take shapes from GluonNLP BERT base model as an example:
Before optimization:
After optimization:
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments