This repository was archived by the owner on Nov 17, 2023. It is now read-only.

topk regression in v1.5 #15703

@leezu

Description


https://github.com/dmlc/gluon-nlp/blob/v0.7.1/scripts/word_embeddings/evaluate_pretrained.py stopped working with MXNet v1.5 due to out-of-memory errors. The script runs fine with 16 GB of GPU memory (p3.2xlarge) under MXNet v1.4, but under MXNet v1.5 it runs out of GPU memory even with 32 GB of GPU memory (p3dn.24xlarge).

Environment info (Required)

----------Python Info----------
Version      : 3.7.3
Compiler     : GCC 5.4.0 20160609
Build        : ('default', 'Jun 13 2019 13:24:27')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.0.3
Directory    : /home/ubuntu/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/ubuntu/.local/lib/python3.7/site-packages/mxnet
Commit Hash   : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
Library      : ['/home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
----------System Info----------
Platform     : Linux-4.4.0-1088-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-31-153
release      : 4.4.0-1088-aws
version      : #99-Ubuntu SMP Thu Jul 4 14:25:53 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2699.535
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.08
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0023 sec, LOAD: 0.4837 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1444 sec, LOAD: 0.3984 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2908 sec, LOAD: 0.4350 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0080 sec, LOAD: 0.1284 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0034 sec, LOAD: 0.2064 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0067 sec, LOAD: 0.0290 sec.

Minimum reproducible example

import argparse
import mxnet as mx


class ExampleBlock(mx.gluon.HybridBlock):
    def __init__(self, idx_to_vec, **kwargs):
        super().__init__(**kwargs)

        self.k = 1
        self.eps = 1E-10

        self._vocab_size, self._embed_size = idx_to_vec.shape

        idx_to_vec = mx.nd.L2Normalization(idx_to_vec, eps=self.eps)
        with self.name_scope():
            self.weight = self.params.get_constant('weight', idx_to_vec)

    def hybrid_forward(self, F, words1, words2, words3, weight):  # pylint: disable=arguments-differ
        # Embed all three word batches in a single lookup.
        words123 = F.concat(words1, words2, words3, dim=0)
        embeddings_words123 = F.Embedding(words123, weight, input_dim=self._vocab_size,
                                          output_dim=self._embed_size)
        # Cosine similarity of each embedded word against the whole
        # (L2-normalized) vocabulary.
        similarities = F.FullyConnected(embeddings_words123, weight, no_bias=True,
                                        num_hidden=self._vocab_size, flatten=False)
        # Map cosine similarities to [0, 1]
        similarities = (similarities + 1) / 2

        sim_w1w4, sim_w2w4, sim_w3w4 = F.split(similarities, num_outputs=3, axis=0)

        sim = (sim_w2w4 * sim_w3w4) / (sim_w1w4 + self.eps)

        # Zero out the similarity at each query word's own index
        # (one_hot with on_value=0, off_value=1 acts as a mask).
        for words in [words1, words2, words3]:
            sim = sim * F.one_hot(words, self.weight.shape[0], 0, 1)

        # topk is the operator suspected of the v1.5 memory regression.
        pred_idxs = F.topk(sim, k=self.k)
        return pred_idxs


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch-size', type=int, default=1024)
    args = parser.parse_args()

    ctx = mx.gpu(0)
    idx_to_vec = mx.nd.zeros(shape=(111066, 300))
    block = ExampleBlock(idx_to_vec)
    block.initialize(ctx=ctx)
    block.hybridize()
    words = [mx.nd.zeros((args.batch_size, ), ctx=ctx) for i in range(3)]
    block(*words)
    mx.nd.waitall()

Alternatively

  • git clone https://github.com/dmlc/gluon-nlp
  • cd gluon-nlp/
  • git checkout v0.7.1
  • python3 ./scripts/word_embeddings/evaluate_pretrained.py --embedding-name fasttext --embedding-source wiki.simple --gpu 0 --similarity-datasets --eval-batch-size 1024
