[MXNET-854] SVRG Optimization in Python Module API #12376
sandeep-krishnamurthy merged 1 commit into apache:master
Conversation
Thanks @StephanieYuan - welcome to the MXNet community! @piiswrong @eric-haibin-lin @anirudhacharya @Roshrini @vandanavk - you may be interested in taking a look at this.
sandeep-krishnamurthy
left a comment
A few high-level comments for now. I will dive deeper by building this change and playing around with MNIST.
Thanks for doing the benchmarks and adding the results. Looks promising and very useful.
It would be very helpful to benchmark on MNIST as depicted in the paper and check whether we can reproduce its results, to verify the correctness of the implementation.
```python
Examples
--------
>>> # An example of declaring and using SVRGModule.
>>> mod = mod = SVRGModule(symbol=lro, data_names=['data'], label_names=['lin_reg_label'], update_freq=2)
```
Please correct this example - the `mod = mod =` assignment is duplicated.
```python
if isinstance(update_freq, int):
    self.update_freq = update_freq
else:
    raise TypeError("update_freq must be an integer")
```
Can you make the error message more useful so the user understands the issue? For example: "update_freq in SVRGModule must be an integer. Example: 2. Given update_freq = %s" % update_freq
| """Internal function to reset binded state.""" | ||
| super(SVRGModule, self)._reset_bind() | ||
| self._mod_aux._reset_bind() | ||
|
|
```python
self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
toc = time.time()
self.logger.info('Epoch[%d] Time cost=%.3f', epoch, (toc - tic))
print('Epoch[%d] Time cost=%.3f', epoch, eval_metric.get())
```
Please use the logger instead of print everywhere, e.g. self.logger.info.
```python
@mx.optimizer.register
class SVRGOptimizer(mx.optimizer.Optimizer):
    """SVRGOptimizer is a wrapper class for two optimizers: one for accumulating full gradients and the other
```
This is mainly useful with SGD, right? Why allow any optimizer to be passed in here?
In the previous benchmarks, SVRG was used with SGD. Meanwhile, I did some exploration of NAG + SVRG and Adam + SVRG; I think it would be valuable to benchmark those combinations with SVRG too.
```python
@@ -0,0 +1,84 @@
# Licensed to the Apache Software Foundation (ASF) under one
```
Tests should go under tests/. Maybe like below?
- tests/python/train/test_contrib_svrg.py: full end-to-end training and testing. See test_mlp.py as an example. Please note that it should be fast (< 2 min at most?); if it takes longer than that, we need to move it to the nightly tests.
- tests/python/unittest/test_contrib_svrgmodule.py / ..._svrgoptimizer.py etc.
```python
mod.forward_backward(data_batch=batch)
mod.update()
mod.update_metric(metrics, batch.label)
print('Epoch[%d] Time cost=%.3f', e, metrics.get())
```
```python
:return: an instance of mx.io.NDArrayIter
:return: an instance of mx.mod.svrgmodule for performing SVRG optimization
"""
mx.random.seed(42)
```
- Please do not use a fixed seed. You can see other tests under tests/.
- Also, what are we asserting here?
```python
return di, mod


# run as a script
if __name__ == "__main__":
```
Should it be as below? Please see other tests.
```python
if __name__ == '__main__':
    import nose
    nose.runmodule()
```
Moved svrg_optimization to python/contrib package.
@StephanieYuan The original version of SVRG could be time-consuming due to the computation of the full gradient at the beginning of each epoch. Could you also include the cheap version in your implementation:
vandanavk
left a comment
In general, please add documentation for the input parameters of functions and a description of return variables wherever missing.
Please run pylint on these files using ci/other/pylintrc in the incubator-mxnet folder.
Something like this:
```shell
pylint --rcfile=ci/other/pylintrc --ignore-patterns=".*\.so$,.*\.dll$,.*\.dylib$" python/mxnet/contrib/svrg_optimization/*.py example/svrg_module/*.py tests/python/unittest/test_contrib_svrg*
```
```python
parser.add_argument('-b', dest="batch_size", help="define the batch size for training", type=int,
                    default=100, required=False)
parser.add_argument('-m', dest='metrics', help="create eval metric", type=str, required=False)
parser.add_argument('--gpus', type=str, help='list of gpus to run, e.g. 0 or 0,2,5. empty means using cpu')
```
Default is mx.cpu() as defined in line 35.
@xcgoner Thank you for the pointer! I'd like to try out the improvements in the later iterations of this project.
Can you update the API docs here - module.md and optimization.md - with references to the SVRG module?
```python
@mx.optimizer.register
class SVRGOptimizer(mx.optimizer.Optimizer):
    """SVRGOptimizer is a wrapper class for two optimizers: self.aux_opt for accumulating full gradients and
```
Please write the doc string in the following format to maintain uniformity - https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L445-L488
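A hedged sketch of what a NumPy-style docstring in that format might look like for the wrapper (the class body, parameter names, and defaults below are illustrative stand-ins, not the PR's actual mxnet code):

```python
class SVRGOptimizer(object):
    """A wrapper around two optimizers used by SVRGModule.

    Accumulates full gradients with an auxiliary optimizer and applies
    SVRG-corrected updates with a default optimizer.

    Parameters
    ----------
    default_optimizer : str
        Name of the optimizer used for the actual parameter updates,
        e.g. 'sgd'.
    learning_rate : float, optional
        Step size for the default optimizer.
    """

    def __init__(self, default_optimizer='sgd', learning_rate=0.01):
        self.default_optimizer = default_optimizer
        self.learning_rate = learning_rate
```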
```python
import numpy as np


def set_up():
```
SVRGModule lacks test coverage. Can you add tests for module.save_checkpoint, module.load, module.forward, and module.predict?
```markdown
## SVRG Optimization in Python Module


### Problem Description
SVRG stands for Stochastic Variance Reduced Gradient, which was first introduced in the paper _Accelerating Stochastic
```
This README should be removed from here and the SVRG description and characteristics should go into the documentation of the SVRG optimizer.
```markdown
* test_svrg_module.py: unittests for SVRGModule API
* test_svrg_optimizer.py: unittests for SVRGOptimizer API


### Examples:
```
Add these to the README of the examples folder.
```markdown
* examples/svrg_module/example_api_train.py: a demo script for training a SVRGModule using intermediate level api and high level api
* examples/svrg_module/example_inference.py: a demo script for SVRGModule inference


#### Benchmarks:
```
As the benchmark graphs are present in the examples folder, move these lines to the README in the examples folder.
```markdown
#### Testing:


Unit Tests:
```
Remove these lines about the unit tests.
```markdown
data_reader.py to download the data.


This example trains a linear regression model using SVRGModule. Logs of the training results can be found in
experiments.log
```
Where is experiments.log?
It will be generated when the train script is run.
Run the following command to get the coverage results:
```shell
nosetests -v --with-coverage tests/python/unittest/test_contrib_svrg_*
```
```
mxnet/contrib/svrg_optimization/__init__.py          3    0   100%
mxnet/contrib/svrg_optimization/svrg_module.py     175   92    47%
mxnet/contrib/svrg_optimization/svrg_optimizer.py   38    1    97%
```
As you can see, svrg_module.py coverage is deficient. Please increase the coverage for that file: for example, the update_full_grads method is not tested, and you have overridden the fit method of the base class Module, so please add tests for both.
Also, I am still not sure the organization of the documentation for this new module is right. Marking @aaronmarkham to review the doc files.
tests/python/unittest/test_module.py
Outdated
```python
assert mod.get_outputs()[0].shape == dshape


hdshape = (3, 4, 7)
```
nit: unnecessary blank line
aaronmarkham
left a comment
Without trying it out, I just have a couple of suggestions for updates.
To be clear, at a minimum, you need to update api/python/index.md and add this module in the contrib section.
```markdown
@@ -0,0 +1,26 @@
## SVRG Module Example
```
Introduce what it is as well. From your other document:
"SVRGModule is an extension to the Module API that implements SVRG (Stochastic Variance Reduced Gradients) optimization ..."
```markdown
## SVRG Module Example


#### API Usage Example
SVRGModule provides both high-level and intermediate-level APIs while minimizing the changes with Module API.
```
Link back to the docs - this should link to https://github.com/apache/incubator-mxnet/blob/master/docs/api/python/index.md#contrib-package, where you should add an entry for this module. Since this comes from the example folder, outside of the docs, use a full static link.
```
@@ -1,188 +0,0 @@
{
```
Why delete this file and legacy_ndarray.v0?
It was done by mistake; I'll add them back when I push more svrg_module test coverage. Thank you for the heads up!
@StephanieYuan From your design document, I understand that SVRGOptimizer cannot be used with the Module API, which is why you wrote the SVRGModule API, which uses SVRGOptimizer under the hood instead of the usual SGD or other optimizers. Can you explain why in greater detail? If this is indeed the case, then the SVRGOptimizer class needs to be declared private, with a leading underscore in its name; otherwise it will be very misleading to users who might try to use SVRGOptimizer with the plain vanilla Module API. It should also be made clear in the docstring for SVRGOptimizer that this optimizer is to be used with SVRGModule alone. You might also want to consider moving the SVRGOptimizer to the
Great progress here. Thanks everyone.
@anirudhacharya - I would still prefer keeping the SVRG optimizer in contrib and allowing it to evolve. As mentioned in the comments above and in the design doc, other combinations like Adam+SVRG could also be used but were not benchmarked. I would set the default optimizer for SVRGModule to SVRGOptimizer, but not remove the option altogether and make SVRGOptimizer internal. Thoughts?
Thanks everyone for the feedback and input! I've rebased and resolved the conflicts with master. I have also added more unittests for SVRGModule, especially for verifying the SVRG gradient calculations, and the test coverage is now:

@sandeep-krishnamurthy I'm working on creating the MNIST example, but I expect it will take a few more days as I have other tasks in the queue. Maybe I can include the example in my next PR, on SVRG in distributed mode?

@anirudhacharya SVRGOptimizer is designed to be used with SVRGModule alone. The actual SVRG optimization logic is implemented in SVRGModule; SVRGOptimizer assists the full-gradient accumulation in the KVStore. I've made SVRGOptimizer a private class and documented the reasoning behind the design in greater detail in docs/api/python/contrib/svrg_optimization. Hopefully that clarifies things a bit.

@aaronmarkham I've added a pointer to svrg_optimization in index.md and refined the docs in example/svrg_module/README.
sandeep-krishnamurthy
left a comment
LGTM. Left two minor comments; please fix. Thank you for your contributions.
This PR is ready to go after approval from @aaronmarkham for documentation and @anirudhacharya for his code review comments.
```python
mod.init_params()
mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.1})
mod.update()
mod.save_checkpoint('test', 0, save_optimizer_states=True)
```
Recommended to use tempfile. This avoids race conditions in file access from other tests, and cleanup is easy. Something like:
```python
import tempfile
import os
....
tmp = tempfile.mkdtemp()
tmpfile = os.path.join(tmp, 'svrg_test')
mod.save_checkpoint(tmpfile, .....)
```
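A self-contained sketch of the tempfile pattern the reviewer suggests; since the `mod` object is not available here, `save_checkpoint` below is a hypothetical stand-in that just writes a placeholder params file:

```python
import os
import tempfile

def save_checkpoint(prefix, epoch):
    # Stand-in for mod.save_checkpoint: writes "<prefix>-<epoch>.params".
    path = "%s-%04d.params" % (prefix, epoch)
    with open(path, "wb") as f:
        f.write(b"checkpoint")
    return path

# Each test gets its own directory, avoiding races with other tests;
# TemporaryDirectory removes everything automatically on exit.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "svrg_test")
    path = save_checkpoint(prefix, 0)
    assert os.path.exists(path)
```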
tests/python/unittest/test_module.py
Outdated
```python
if __name__ == '__main__':
    import nose
    nose.runmodule()
```
(no newline at end of file)

nit: add a new line at the end of the file (PEP 8)
@sandeep-krishnamurthy it now looks better after SVRGOptimizer was made private. Including SVRGOptimizer in
anirudhacharya
left a comment
This is how the documentation for this module renders right now: http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-12376/17/api/python/contrib/svrg_optimization.html
- Private members like _SVRGOptimizer should not show up in the documentation.
- The description for the SVRGModule is not rendering. cc: @aaronmarkham
@anirudhacharya I'll fix it soon. Is there a way I can see the rendered view before it gets pushed?
@StephanieYuan some of the issues are with the content of the docs, and a few are with how the content renders. @aaronmarkham might be able to help with how the docs render. You will probably have to build the docs and deploy them locally.
I just updated the how-to guide for building the docs.
I'm not sure why those private members are showing up; Sphinx should recognize them as such if they're defined that way. Be sure to adjust settings.ini when testing the docs build, otherwise you'll be waiting forever each time while Scala builds.
@aaronmarkham Thanks for the pointer!
anirudhacharya
left a comment
In the examples folder, could you please add the code you used to generate the benchmarking graphs for the linear regression example? Since you are including the graphs in the examples folder, you should also include the code that can reproduce those results.
@anirudhacharya Sounds good! It's a Jupyter notebook; I'll add it to the example folder.
sandeep-krishnamurthy
left a comment
Thanks for the awesome work. Having docs, an example Jupyter notebook, and good test coverage is super useful.
(apache#12376) Implemented a python SVRGModule for performing SVRG optimization logic. This version supports single-machine SVRG with a single CPU, a single GPU, and multiple GPUs.
Description
SVRG stands for Stochastic Variance Reduced Gradient, an optimization technique that complements SGD. It was introduced in the 2013 paper Accelerating Stochastic Gradient Descent using Predictive Variance Reduction.
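As a sketch of the technique described above (a minimal NumPy toy for least squares; all names here are illustrative and not the PR's API): SVRG keeps a snapshot of the weights, computes the full gradient at that snapshot once per epoch, and corrects each stochastic gradient with the snapshot's per-sample gradient, which drives the gradient variance to zero as the iterates converge.

```python
import numpy as np

def svrg_least_squares(X, y, lr=0.1, epochs=100, seed=0):
    """Minimize 0.5 * ||Xw - y||^2 / n with SVRG updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    def grad(w, i):
        # Gradient of the i-th sample's squared-error loss.
        return (X[i] @ w - y[i]) * X[i]

    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot, computed once per epoch.
        mu = X.T @ (X @ w_snap - y) / n
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced update: g_i(w) - g_i(w_snap) + mu
            w = w - lr * (grad(w, i) - grad(w_snap, i) + mu)
    return w

# Tiny example: recover w_true = [1, -2] from noiseless linear data.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
y = X @ np.array([1.0, -2.0])
w = svrg_least_squares(X, y)  # w should be close to [1.0, -2.0]
```

Because the correction term vanishes at the optimum, SVRG achieves a linear convergence rate on strongly convex problems, unlike plain SGD with a constant step size.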
Key Characteristics of SVRG:
SVRG Module should:
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments