Add contrib.rand_zipfian#9747
Conversation
python/mxnet/ndarray/contrib.py
Outdated
| except ImportError: | ||
| pass | ||
|
|
||
| __all__ = [] |
There was a problem hiding this comment.
need to include it in all
| true_classes = true_classes.as_in_context(ctx).astype('float64') | ||
| expected_count_true = ((true_classes + 2.0) / (true_classes + 1.0)).log() / log_range | ||
| # cast sampled classes to fp64 to avoid interget division | ||
| sampled_cls_fp64 = sampled_classes.astype('float64') |
There was a problem hiding this comment.
why is the output always float64?
There was a problem hiding this comment.
64-bit is adopted because this sampler is usually used for extremely large number of classes. Returned samples are always actually always in int64. The fp64 here is used to calculate the probability of a particular classes. (Limited precision of fp32 treat 50M - 1 and 50M - 2 as the same number, yielding nan when taking the log)
python/mxnet/symbol/contrib.py
Outdated
| <NDArray 3x2 @cpu(0)> | ||
| """ | ||
| assert(isinstance(true_classes, Symbol)), "unexpected type %s" % type(true_classes) | ||
| if ctx is None: |
python/mxnet/ndarray/contrib.py
Outdated
| list of NDArrays | ||
| A 1-D `int64` `NDArray` for sampled candidate classes, a 1-D `float64` `NDArray` for \ | ||
| the expected count for true classes, and a 1-D `float64` `NDArray` for the \ | ||
| expected count for sampled classes. |
There was a problem hiding this comment.
We need to write the docstring as:
Returns
--------
samples : NDArray
A 1-D `int64` `NDArray` for sampled candidate classes
exp_count_true : NDArray
...
exp_count_sample : NDArray
...
python/mxnet/ndarray/contrib.py
Outdated
| # cast sampled classes to fp64 to avoid interget division | ||
| sampled_cls_fp64 = sampled_classes.astype('float64') | ||
| expected_count_sampled = ((sampled_cls_fp64 + 2.0) / (sampled_cls_fp64 + 1.0)).log() / log_range | ||
| return [sampled_classes, expected_count_true, expected_count_sampled] |
There was a problem hiding this comment.
No need to return a list here.
python/mxnet/ndarray/contrib.py
Outdated
| __all__ = [] | ||
| __all__ = ["rand_log_uniform"] | ||
|
|
||
| def rand_log_uniform(true_classes, num_sampled, range_max, ctx=None): |
There was a problem hiding this comment.
I think it should not be called as rand_log_uniform because LogUniform has a specific meaning. Should be called something like rand_zipfian, or log_uniform_candidate_sampler like in TF.
python/mxnet/ndarray/contrib.py
Outdated
| sampled_classes = (rand.exp() - 1).astype('int64') % range_max | ||
|
|
||
| true_classes = true_classes.as_in_context(ctx).astype('float64') | ||
| expected_count_true = ((true_classes + 2.0) / (true_classes + 1.0)).log() / log_range |
There was a problem hiding this comment.
I think it should be expected_count_true = ((true_classes + 2.0) / (true_classes + 1.0)).log() / log_range * num_sampled. Otherwise it should be called something like prob_true_class.
There was a problem hiding this comment.
You are right, I should either multiply it by num_sampled or change the name. Will do an update.
| [ 0.12453879] | ||
| <NDArray 1 @cpu(0)> | ||
| >>> exp_count_sample | ||
| [ 0.22629439 0.12453879 0.12453879 0.12453879] |
There was a problem hiding this comment.
The example output looks suspicious as it does not sum up to 1.
There was a problem hiding this comment.
Sorry I've misunderstood the term. It should be correct.
There was a problem hiding this comment.
I feel it's suspicious at first glance because the exp_count of 1 is larger than the exp_count of 3. However, the sampling result show that 3 is much more often then 1. We need to sample multiple times and test if the empirical expectation matches the true expectation.
There was a problem hiding this comment.
It's just a coincident for the first 5 samples. If I sample 50 times, it returns:
1
3
3
3
2
0
0
0
0
1
3
1
1
3
0
2
0
4
0
3
1
3
1
2
2
1
1
2
0
1
0
2
0
0
0
0
0
0
4
1
1
4
0
4
2
0
0
2
1
0
0's = 19
1's = 12
2's = 8
3's = 7
4's = 4
python/mxnet/ndarray/contrib.py
Outdated
| __all__ = ["log_uniform_candidate_sampler"] | ||
|
|
||
| # pylint: disable=line-too-long | ||
| def log_uniform_candidate_sampler(true_classes, num_sampled, range_max, ctx=None): |
There was a problem hiding this comment.
should it go under contrib.random? since other sampling methods in random just have the distribution as name, should we follow the same convention?
There was a problem hiding this comment.
Name changed to rand_zipfian to follow the convention. Extra namespaces such as contrib.random might over-complicate APIs since there are just a few operators in nd.contrib.
python/mxnet/symbol/contrib.py
Outdated
| sampled_cls_fp64 = sampled_classes.astype('float64') | ||
| expected_prob_sampled = ((sampled_cls_fp64 + 2.0) / (sampled_cls_fp64 + 1.0)).log() / log_range | ||
| expected_count_sampled = expected_prob_sampled * num_sampled | ||
| return [sampled_classes, expected_count_true, expected_count_sampled] |
There was a problem hiding this comment.
Good catch, I forgot to update this
* draft * move to contrib * rename op * CR comments * Update contrib.py * Update contrib.py * Update random.py * update example in the doc * update example in symbol doc * CR comments * update op name * update op name * update op name in test * update test * Update contrib.py
* draft * move to contrib * rename op * CR comments * Update contrib.py * Update contrib.py * Update random.py * update example in the doc * update example in symbol doc * CR comments * update op name * update op name * update op name in test * update test * Update contrib.py
* draft * move to contrib * rename op * CR comments * Update contrib.py * Update contrib.py * Update random.py * update example in the doc * update example in symbol doc * CR comments * update op name * update op name * update op name in test * update test * Update contrib.py
* draft * move to contrib * rename op * CR comments * Update contrib.py * Update contrib.py * Update random.py * update example in the doc * update example in symbol doc * CR comments * update op name * update op name * update op name in test * update test * Update contrib.py
Description
Add log-uniform distribution sampler similar to
https://www.tensorflow.org/api_docs/python/tf/nn/log_uniform_candidate_sampler
Note that tf implementation supports sampling w/o replacement, which is not available in this PR.
@sxjscience
Checklist
Essentials
make lint)Changes
Comments