This repository was archived by the owner on Nov 17, 2023. It is now read-only.

topk regression in v1.5 #15703

@leezu

Description


https://github.com/dmlc/gluon-nlp/blob/v0.7.1/scripts/word_embeddings/evaluate_pretrained.py stopped working with MXNet v1.5 due to out-of-memory errors. The script runs fine with 16 GB of GPU memory (p3.2xlarge) under MXNet v1.4, but under MXNet v1.5 it runs out of GPU memory even with 32 GB of GPU memory (p3dn.24xlarge).

Environment info (Required)

----------Python Info----------
Version      : 3.7.3
Compiler     : GCC 5.4.0 20160609
Build        : ('default', 'Jun 13 2019 13:24:27')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.0.3
Directory    : /home/ubuntu/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/ubuntu/.local/lib/python3.7/site-packages/mxnet
Commit Hash   : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
Library      : ['/home/ubuntu/.local/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
----------System Info----------
Platform     : Linux-4.4.0-1088-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-31-153
release      : 4.4.0-1088-aws
version      : #99-Ubuntu SMP Thu Jul 4 14:25:53 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2699.535
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.08
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0023 sec, LOAD: 0.4837 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1444 sec, LOAD: 0.3984 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2908 sec, LOAD: 0.4350 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0080 sec, LOAD: 0.1284 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0034 sec, LOAD: 0.2064 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0067 sec, LOAD: 0.0290 sec.

Minimum reproducible example

import argparse
import mxnet as mx


class ExampleBlock(mx.gluon.HybridBlock):
    def __init__(self, idx_to_vec, **kwargs):
        super().__init__(**kwargs)

        self.k = 1
        self.eps = 1E-10

        self._vocab_size, self._embed_size = idx_to_vec.shape

        idx_to_vec = mx.nd.L2Normalization(idx_to_vec, eps=self.eps)
        with self.name_scope():
            self.weight = self.params.get_constant('weight', idx_to_vec)

    def hybrid_forward(self, F, words1, words2, words3, weight):  # pylint: disable=arguments-differ
        # Embed all three word batches in a single lookup.
        words123 = F.concat(words1, words2, words3, dim=0)
        embeddings_words123 = F.Embedding(words123, weight, input_dim=self._vocab_size,
                                          output_dim=self._embed_size)
        # Cosine similarity of each embedded word against the whole
        # (L2-normalized) vocabulary.
        similarities = F.FullyConnected(embeddings_words123, weight, no_bias=True,
                                        num_hidden=self._vocab_size, flatten=False)
        # Map cosine similarities to [0, 1]
        similarities = (similarities + 1) / 2

        sim_w1w4, sim_w2w4, sim_w3w4 = F.split(similarities, num_outputs=3, axis=0)

        sim = (sim_w2w4 * sim_w3w4) / (sim_w1w4 + self.eps)

        # Zero out the similarity at each query word's own index
        # (one_hot with on_value=0, off_value=1 acts as a mask).
        for words in [words1, words2, words3]:
            sim = sim * F.one_hot(words, self.weight.shape[0], 0, 1)

        # topk is the operator suspected of the v1.5 memory regression.
        pred_idxs = F.topk(sim, k=self.k)
        return pred_idxs


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch-size', type=int, default=1024)
    args = parser.parse_args()

    ctx = mx.gpu(0)
    idx_to_vec = mx.nd.zeros(shape=(111066, 300))
    block = ExampleBlock(idx_to_vec)
    block.initialize(ctx=ctx)
    block.hybridize()
    words = [mx.nd.zeros((args.batch_size, ), ctx=ctx) for i in range(3)]
    block(*words)
    mx.nd.waitall()

Alternatively

  • git clone https://github.com/dmlc/gluon-nlp
  • cd gluon-nlp/
  • git checkout v0.7.1
  • python3 ./scripts/word_embeddings/evaluate_pretrained.py --embedding-name fasttext --embedding-source wiki.simple --gpu 0 --similarity-datasets --eval-batch-size 1024
