[MXNET-1426] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan by wkcn · Pull Request #16234 · apache/mxnet

wkcn · 2019-09-22T02:46:53Z

Description

Hi there.
This PR fixes the wrong results of sum(inf, inf) and mean(inf, inf).

Test Case:

import mxnet as mx
import numpy as np

def test(x):
    print('data', x.asnumpy())
    print('mean/sum', mx.nd.mean(x).asnumpy(), mx.nd.sum(x).asnumpy())
    print('argmin/argmax', mx.nd.argmin(x).asnumpy(), mx.nd.argmax(x).asnumpy())
    print('min/max', mx.nd.min(x).asnumpy(), mx.nd.max(x).asnumpy())
    print('-----')

x = mx.nd.array([np.inf, np.inf, 1])
test(x)

x = mx.nd.array([-np.inf, -np.inf, 1])
test(x)

x = mx.nd.array([np.inf, -np.inf, 1])
test(x)

x = mx.nd.array([np.nan, -np.inf, 1])
test(x)

x = mx.nd.array([np.nan, np.nan])
test(x)

x = mx.nd.array([np.nan, 1])
test(x)

The wrong results with mxnet_mkl-1.6.0b20191015-py2.py3-none-manylinux1_x86_64:

data [inf inf  1.]
mean/sum [nan] [nan]
argmin/argmax [2.] [0.]
min/max [1.] [inf]
-----
data [-inf -inf   1.]
mean/sum [nan] [nan]
argmin/argmax [0.] [2.]
min/max [-inf] [1.]
-----
data [ inf -inf   1.]
mean/sum [nan] [nan]
argmin/argmax [1.] [0.]
min/max [-inf] [inf]
-----
data [ nan -inf   1.]
mean/sum [nan] [nan]
argmin/argmax [1.] [2.]
min/max [-inf] [1.]
-----
data [nan nan]
mean/sum [nan] [nan]
argmin/argmax [0.] [0.]
min/max [inf] [-inf]
-----
data [nan  1.]
mean/sum [nan] [nan]
argmin/argmax [1.] [1.]
min/max [1.] [1.]

The correct results with this PR:

data [inf inf  1.]
mean/sum [inf] [inf]
argmin/argmax [2.] [0.]
min/max [1.] [inf]
-----
data [-inf -inf   1.]
mean/sum [-inf] [-inf]
argmin/argmax [0.] [2.]
min/max [-inf] [1.]
-----
data [ inf -inf   1.]
mean/sum [nan] [nan]
argmin/argmax [1.] [0.]
min/max [-inf] [inf]
-----
data [ nan -inf   1.]
mean/sum [nan] [nan]
argmin/argmax [0.] [0.]
min/max [nan] [nan]
-----
data [nan nan]
mean/sum [nan] [nan]
argmin/argmax [0.] [0.]
min/max [nan] [nan]
-----
data [nan  1.]
mean/sum [nan] [nan]
argmin/argmax [0.] [0.]
min/max [nan] [nan]

If we modify the test function to use NumPy instead:

def test(x):
    x = x.asnumpy()
    print('data', x)
    print('mean/sum', np.mean(x), np.sum(x))
    print('argmin/argmax', np.argmin(x), np.argmax(x))
    print('min/max', np.min(x), np.max(x))
    print('-----')

Here are the NumPy results:

data [inf inf  1.]
mean/sum inf inf
argmin/argmax 2 0
min/max 1.0 inf
-----
data [-inf -inf   1.]
mean/sum -inf -inf
argmin/argmax 0 2
min/max -inf 1.0
-----
data [ inf -inf   1.]
mean/sum nan nan
argmin/argmax 1 0
min/max -inf inf
-----
data [ nan -inf   1.]
mean/sum nan nan
argmin/argmax 0 0
min/max nan nan
-----
data [nan nan]
mean/sum nan nan
argmin/argmax 0 0
min/max nan nan
-----
data [nan  1.]
mean/sum nan nan
argmin/argmax 0 0
min/max nan nan
-----
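The NumPy results above follow a simple rule for the reductions that pick an element: once any element is NaN, min/max return NaN and argmin/argmax return the index of the first NaN; otherwise ties keep the first index. The pure-Python sketch below reproduces that rule and checks it against NumPy on the six inputs from the test case. It is only an illustration of the target semantics, not the mshadow code in reduce_with_axis.h, and the helper names are hypothetical.

```python
import math

import numpy as np


def _nan_aware_arg(values, better):
    """Index chosen by `better`, except that the first NaN always wins."""
    if len(values) == 0:
        raise ValueError("attempt to get arg of an empty sequence")
    best = 0
    for i in range(1, len(values)):
        if math.isnan(values[best]):
            break  # a NaN was already found; nothing can replace it
        v = values[i]
        # Strict comparison keeps the first index when values tie.
        if math.isnan(v) or better(v, values[best]):
            best = i
    return best


def nan_aware_argmax(values):
    return _nan_aware_arg(values, lambda a, b: a > b)


def nan_aware_argmin(values):
    return _nan_aware_arg(values, lambda a, b: a < b)


def nan_aware_max(values):
    """NaN-propagating max: the element selected by nan_aware_argmax."""
    return values[nan_aware_argmax(values)]


# The six inputs from the test case above, compared with NumPy.
cases = [
    [np.inf, np.inf, 1.0],
    [-np.inf, -np.inf, 1.0],
    [np.inf, -np.inf, 1.0],
    [np.nan, -np.inf, 1.0],
    [np.nan, np.nan],
    [np.nan, 1.0],
]
for xs in cases:
    arr = np.array(xs)
    assert nan_aware_argmin(xs) == np.argmin(arr)
    assert nan_aware_argmax(xs) == np.argmax(arr)
```

Checking "is the current best already NaN?" before each comparison is what makes the first NaN sticky; an ordinary `>` comparison alone would never select a NaN, because every comparison with NaN is false.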

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Add isnan_typed and isinf_typed in mshadow
Remove isnan_typed in src/operator/mshadow_op.h
Update mshadow/extension/reduce_with_axis.h to support NaN
Add related Python test cases
Update the Julia test case to be consistent with the Julia built-in functions argmin and argmax.
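Why can `sum([inf, inf, 1])` come out as `nan` at all? One plausible mechanism, assuming the sum reduction uses Kahan (compensated) summation, is that the compensation step computes `(t - dst) - y`, which becomes `inf - inf = nan` as soon as the running total is infinite, and that NaN then poisons every later step. The pure-Python sketch below shows this failure mode and an `isinf` guard of the kind that a helper like isinf_typed makes possible. The function names are hypothetical and this is not the C++ code from this PR.

```python
import math


def kahan_sum(values):
    """Plain Kahan (compensated) summation."""
    total = 0.0
    residual = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - residual
        t = total + y
        # Once t is infinite, (t - total) is inf - inf = nan.
        residual = (t - total) - y
        total = t
    return total


def kahan_sum_inf_guard(values):
    """Kahan summation that drops the compensation once the total is infinite."""
    total = 0.0
    residual = 0.0
    for v in values:
        y = v - residual
        t = total + y
        if math.isinf(t):
            residual = 0.0  # nothing left to compensate; keep the inf intact
        else:
            residual = (t - total) - y
        total = t
    return total


print(kahan_sum([math.inf, math.inf, 1.0]))
print(kahan_sum_inf_guard([math.inf, math.inf, 1.0]))
```

With the guard, same-signed infinities survive (`inf`, or `-inf` for the second test case), while `inf + (-inf)` and any NaN input still produce `nan`, which matches the NumPy results listed above. The mean then follows as the sum divided by the element count.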

3rdparty/mshadow/mshadow/base.h

julia/test/unittest/ndarray.jl

iblislin

The Julia part looks fine to me.

3rdparty/mshadow/mshadow/extension/reduce_with_axis.h

wkcn · 2019-09-23T02:03:52Z

Hi @marcoabreu @access2rohit, could you please help review?
The PR fixes the wrong result of sum, mean, argmin and argmax when inputs contain inf or NaN.
Thank you!

julia/test/unittest/ndarray.jl

tests/python/unittest/test_operator.py

3rdparty/mshadow/mshadow/base.h

julia/test/unittest/ndarray.jl

wkcn · 2019-10-09T13:40:31Z

Hi @reminisce and @haojin2, could you please help review?

This PR makes the following functions consistent with NumPy: sum, mean, argmin, argmax.

Thank you!

wkcn · 2019-10-16T03:30:54Z

Hi @eric-haibin-lin, could you please help review?
This is a bug: these operators output wrong results, or results inconsistent with those of NumPy.

Thank you so much!

eric-haibin-lin · 2019-10-17T04:34:07Z

Would you mind also adding the results from before this fix?

wkcn · 2019-10-17T07:34:52Z

Hi @eric-haibin-lin, I have updated the test results : )

szha · 2019-10-21T04:18:10Z

cc @reminisce

wkcn · 2019-11-10T02:58:42Z

Ping : )

marcoabreu · 2019-11-10T18:26:02Z

3rdparty/mshadow/mshadow/base.h

+  }
+  template<>
+  MSHADOW_XINLINE bool IsNan(volatile mshadow::half::half_t val) {
+    return (val.half_ & 0x7fff) > 0x7c00;


Can you turn these magic values into constants with documentation? While I understand 0x7fff, 0x7c00, for example, looks quite arbitrary.

Hi @marcoabreu, I added two constants, MSHADOW_HALF_SIGN_BIT and MSHADOW_HALF_EXPONENT_BITS, in 3rdparty/mshadow/mshadow/half.h and replaced these two magic values.
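For reference, the two magic values encode the IEEE 754 binary16 layout: 1 sign bit (0x8000), 5 exponent bits (0x7c00) and 10 mantissa bits (0x03ff). With the sign bit masked off, a value is infinite when the pattern equals the exponent bits exactly, and NaN when the pattern is greater than that (all exponent bits set, nonzero mantissa). The Python sketch below checks this rule against NumPy's float16 for all 65,536 bit patterns. The values 0x7fff and 0x7c00 come from the diff above; treating MSHADOW_HALF_SIGN_BIT as 0x8000, so that masking with its complement yields 0x7fff, is an assumption about half.h, and the Python names are hypothetical.

```python
import numpy as np

# Hypothetical Python mirrors of the mshadow constants (values assumed, see above).
HALF_SIGN_BIT = 0x8000       # sign bit of an IEEE 754 binary16 value
HALF_EXPONENT_BITS = 0x7C00  # all five exponent bits set
HALF_ABS_MASK = 0xFFFF & ~HALF_SIGN_BIT  # 0x7FFF: clears the sign bit


def is_nan_bits(bits):
    """NaN: exponent bits all set and mantissa nonzero, for either sign."""
    return (bits & HALF_ABS_MASK) > HALF_EXPONENT_BITS


def is_inf_bits(bits):
    """Inf: exponent bits all set and mantissa zero, for either sign."""
    return (bits & HALF_ABS_MASK) == HALF_EXPONENT_BITS


def half_bits(x):
    """Raw 16-bit pattern of x after rounding it to float16."""
    return int(np.array([x], dtype=np.float16).view(np.uint16)[0])


# Exhaustively compare the bit tests with NumPy's own classification.
all_bits = np.arange(1 << 16, dtype=np.uint32).astype(np.uint16)
values = all_bits.view(np.float16)
nan_ok = all(is_nan_bits(int(b)) == bool(np.isnan(v)) for b, v in zip(all_bits, values))
inf_ok = all(is_inf_bits(int(b)) == bool(np.isinf(v)) for b, v in zip(all_bits, values))
print(nan_ok, inf_ok)
```

The largest finite float16 value, 65504, has the pattern 0x7bff, one below the exponent bits, which is why a single `>` comparison against 0x7c00 is enough to separate NaN from every finite value and from infinity.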

marcoabreu · 2019-11-10T18:28:52Z

tests/python/unittest/test_ndarray.py

                                                         %(ndarray_ret.shape, numpy_ret.shape)
-            err = np.square(ndarray_ret - numpy_ret).mean()
-            assert err < 1E-4
+            if check_dtype:


Could you elaborate on why you're introducing so much branching into a test? If the results are inconsistent, we should rather improve the test than skip the checks. I'd love to have more detail.

Hi @marcoabreu, here is the explanation.

So much branching
We need to test all the reduce operators (min, max, argmin, argmax, sum, and mean) when the inputs contain -inf, +inf, or nan.

Skipping the checks
I replaced the old check with a new one. : )
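The replacement check itself is not quoted in this thread, but the reason the old one had to go is easy to demonstrate: `np.square(ndarray_ret - numpy_ret).mean()` evaluates `inf - inf` to `nan` even when both results are the same `inf`, and `nan < 1E-4` is False, so a correct result fails the test. Below is a minimal sketch of an inf/NaN-aware comparison, assuming `np.allclose` with `equal_nan=True` (which treats NaNs in matching positions as equal and same-signed infinities as equal). The function names and tolerances are hypothetical.

```python
import numpy as np


def old_check(ndarray_ret, numpy_ret):
    """The mean-squared-error check removed from test_ndarray.py."""
    with np.errstate(invalid="ignore"):  # silence the inf - inf warning
        err = np.square(ndarray_ret - numpy_ret).mean()
    return bool(err < 1e-4)


def new_check(ndarray_ret, numpy_ret, rtol=1e-4, atol=1e-4):
    """Elementwise comparison that accepts matching NaNs and same-signed infs."""
    return bool(np.allclose(ndarray_ret, numpy_ret, rtol=rtol, atol=atol,
                            equal_nan=True))


same_inf = np.array([np.inf])
print(old_check(same_inf, same_inf), new_check(same_inf, same_inf))
```

An elementwise comparison also avoids a subtler problem of the mean-squared check: a single mismatched element can be diluted by many matching ones.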

marcoabreu · 2019-11-10T18:29:30Z

I'll merge after the feedback has been addressed :) Sorry for the delay


wkcn commented Sep 22, 2019

3rdparty/mshadow/mshadow/base.h (outdated)

wkcn commented Sep 22, 2019

3rdparty/mshadow/mshadow/base.h

wkcn changed the title ~~Fix the wrong result of sum(inf, inf) and mean(inf, inf)~~ [MXNET-1426] Fix the wrong result of sum(inf, inf) and mean(inf, inf) Sep 22, 2019

wkcn requested a review from iblislin as a code owner September 22, 2019 09:10

wkcn changed the title ~~[MXNET-1426] Fix the wrong result of sum(inf, inf) and mean(inf, inf)~~ [MXNET-1426] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan Sep 22, 2019

wkcn commented Sep 22, 2019

julia/test/unittest/ndarray.jl (outdated)

wkcn added Bug Operator labels Sep 22, 2019

iblislin approved these changes Sep 22, 2019

wkcn commented Sep 22, 2019

3rdparty/mshadow/mshadow/extension/reduce_with_axis.h

wkcn added the pr-awaiting-review PR is waiting for code review label Sep 23, 2019

access2rohit reviewed Sep 23, 2019

julia/test/unittest/ndarray.jl (outdated)

access2rohit reviewed Sep 23, 2019

julia/test/unittest/ndarray.jl (outdated)

access2rohit reviewed Sep 23, 2019

julia/test/unittest/ndarray.jl (outdated)

access2rohit reviewed Sep 23, 2019

tests/python/unittest/test_operator.py (outdated)

access2rohit reviewed Sep 23, 2019

3rdparty/mshadow/mshadow/base.h

wkcn requested a review from szha as a code owner September 24, 2019 03:10

iblislin reviewed Sep 24, 2019

julia/test/unittest/ndarray.jl (outdated)

access2rohit approved these changes Sep 25, 2019

wkcn added 6 commits October 17, 2019 17:13

fix meansum nan

1245499

remove print in testcase

100d9c1

update to avoid assignment

4aa2dc6

update

1d557e0

fix argmin and argmax, update julia unittest

1d79447

update argmin/argmax docs in julia bindings

9b744d5

wkcn added 7 commits October 17, 2019 17:13

debug

66280f1

update

848c57b

update test

4de6bac

fix sum merge

9c3a72c

update testcase

9f2e4fc

update including sign

714951c

fix allclose

6666bd6

wkcn force-pushed the fix_meansum_nan branch from 7e98af9 to 6666bd6 Compare October 17, 2019 09:14

ci

dca0c2a

Merge branch 'master' into fix_meansum_nan

acf48f4

marcoabreu suggested changes Nov 10, 2019

wkcn added 3 commits November 12, 2019 09:39

use constants

9183cb3

Merge branch 'fix_meansum_nan' of github.com:wkcn/incubator-mxnet into fix_meansum_nan

a0c81ec

Merge branch 'master' into fix_meansum_nan

a5a9fb4

marcoabreu approved these changes Nov 12, 2019

fix build for isinf and isnan

3b51b79

wkcn added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Nov 12, 2019

wkcn added 2 commits November 12, 2019 14:53

ci

c8761ad

ci

3a2f062

wkcn merged commit 52716de into apache:master Nov 12, 2019

wkcn mentioned this pull request Nov 22, 2019

[Backport][v1.6.x] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan #16884

Merged

wkcn mentioned this pull request Jan 6, 2020

mxnet.ndarray.from_numpy() throws error for float16 dtype #17218

Open

DickJC123 mentioned this pull request Jan 15, 2020

Fix flakey test_ndarray.py:test_reduce #17312

Merged

wkcn mentioned this pull request Apr 5, 2020

[Bug Fix] Fix the wrong result of mx.np.mean and mx.np.sum when the input contains np.inf #17975

Open


6 participants
