[C++] Improve inference script to support benchmark on Imagenet #15164
pengzhao-intel merged 12 commits into apache:master from
Conversation
@wuxun-zhang ping me by this ID :)

@wuxun-zhang Thanks! That's a great addition! @mxnet-label-bot Add [pr-awaiting-review, C++]

@wuxun-zhang could you rebase the code?
Force-pushed from 1b4784f to 19ced82
Now CI has passed, could you please take a look at this PR again? @pengzhao-intel @ZhennanQin @TaoLv @ciyongch Thanks
| [ResNet152-V2](#8) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.65% | 76.36% |
| [Inception-BN](#9) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 72.28% | 72.20% |
The following performance numbers are collected by using Skylake 6148 with 20 physical cores.

Could you refresh the data on C5.18xlarge?

Thanks for your suggestion. Already done.
@szha is it ok to change the file name from inception_inference.cpp to imagenet_inference.cpp to cover more models? Any backward compatibility issue?

@anirudh2290 would you mind reviewing this PR?
 */
void Predictor::InitParameters() {
  std::vector<mx_uint> data_shape;
  for (index_t i = 0; i < input_shape.ndim(); i++) {

  // initializer to call
  Xavier xavier(Xavier::uniform, Xavier::avg, 2.0f);
  index_t i = 0;
  for (auto& name : net.ListArguments()) {
  }

  i = 0;
  for (auto& name : net.ListAuxiliaryStates()) {
  // initializer to call
  Xavier xavier(Xavier::uniform, Xavier::avg, 2.0f);
  index_t i = 0;
This is for indexing the inferred shape of each argument (see line 334), which is needed for the Xavier initializer. The same applies to the `i = 0` at line 339.

Quite strange usage. If so, please declare `i` in the for-loop header.
  args_map[name] = tmp_arr.Copy(global_ctx);
}

i = 0;
// infer and create ndarrays according to the given input ndarrays.
net.InferExecutorArrays(global_ctx, &arg_arrays, &grad_arrays, &grad_reqs, &aux_arrays, args_map,
                        std::map<std::string, NDArray>(), std::map<std::string, OpReqType>(),
double get_msec() {
  struct timeval time;
  gettimeofday(&time, NULL);
This is a Linux function. If you want to support this example on Windows, you need to define a Windows code path with a macro. It's not mandatory, but maybe you can add a comment about this.

Maybe I can use std::chrono::system_clock::now() to cover both Windows and Linux.
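A sketch of the portable timer discussed above, using std::chrono so it compiles on both Linux and Windows. Note this sketch uses steady_clock rather than system_clock, since the value only feeds elapsed-time measurements and steady_clock is monotonic.

```cpp
#include <chrono>

// Portable millisecond timer: an alternative to the POSIX-only
// gettimeofday(). Returns a millisecond count suitable for computing
// elapsed time as get_msec() - start.
double get_msec() {
    using namespace std::chrono;
    return duration_cast<duration<double, std::milli>>(
               steady_clock::now().time_since_epoch())
        .count();
}
```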
std::map<std::string, NDArray> aux_map;
Symbol net;
Executor *executor;
Shape input_shape;
Add a trailing underscore (`_`) suffix to these private member variables.
// infer and create ndarrays according to the given input ndarrays.
net.InferExecutorArrays(global_ctx, &arg_arrays, &grad_arrays, &grad_reqs, &aux_arrays, args_map,
                        std::map<std::string, NDArray>(), std::map<std::string, OpReqType>(),
                        aux_map);
for (auto& i : grad_reqs) i = OpReqType::kNullOp;
In the InferExecutorArrays function, if a user doesn't specify an OpReqType for an argument, it is assigned a default value: kNullOp for data and label, kWriteTo for the others. However, we don't need to consider updating gradients in a forward-only pass, so we reassign kNullOp for all arguments.
  return false;
}

std::vector<index_t> shape_vec {input_shape[1], input_shape[2], input_shape[3]};
What if input_shape is not 4-D? Better not to hard-code the indices here; derive the shape values from input_shape instead.
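One way to address the concern above is to copy every axis after the batch axis, whatever the rank. This is a hypothetical sketch; plain unsigned vectors stand in for mxnet::cpp::Shape here.

```cpp
#include <cstddef>
#include <vector>

// Generic extraction of the per-example shape: drop only the batch axis
// (axis 0) instead of hard-coding indices 1..3 for a 4-D input.
std::vector<unsigned> example_shape(const std::vector<unsigned>& input_shape) {
    std::vector<unsigned> shape_vec;
    for (std::size_t i = 1; i < input_shape.size(); ++i)
        shape_vec.push_back(input_shape[i]);
    return shape_vec;
}
```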
std::vector<float> dummy_data(input_shape.Size());
std::default_random_engine generator;
std::uniform_real_distribution<float> val(0.0f, 1.0f);
for (int i = 0; i < static_cast<int>(input_shape.Size()); ++i) {
For a large input array, int might not be enough to index the array.
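A minimal sketch of the fix for the indexing concern above: use std::size_t (or MXNet's index_t) so the loop variable can cover element counts beyond INT_MAX. The fixed fill value is an illustration; the real script draws values from a uniform RNG.

```cpp
#include <cstddef>
#include <vector>

// Fill a dummy-data buffer using a size_t loop index, which is wide
// enough for any element count std::vector can hold.
void fill_dummy(std::vector<float>* data, float value) {
    for (std::size_t i = 0; i < data->size(); ++i)
        (*data)[i] = value;  // placeholder; the example uses a uniform RNG
}
```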
  }
}
ms = get_msec() - ms;
num_inference_batches = (nBatch == num_inference_batches) ? num_inference_batches : nBatch;

Isn't this the same as `num_inference_batches = nBatch;`?
num_inference_batches = (nBatch == num_inference_batches) ? num_inference_batches : nBatch;

auto args_name = net.ListArguments();
std::cout << "INFO:" << "Dataset for inference: " << dataset_ << std::endl
  dst_vec.push_back(elem);
  while (*p_next) {
    if (!bFloat) {

int index = 1;
while (index < argc) {
  if (strcmp("--symbol_file", argv[index]) == 0) {
    index++;
std::string input_shape("3 224 224");
int seed = 48564309;
int shuffle_chunk_seed = 3982304;
int data_nthreads = 60;
If this is not the recommended default num_threads, please change it to a proper one.

Actually, the num_threads for data decoding is determined by two parameters: one is data_nthreads, and the other is the user-defined OMP_NUM_THREADS environment variable, via a call to omp_get_num_procs(); please refer to iter_image_recordio_2.cc#L146. So here I want to set a large number to make use of all cores. Also, data_nthreads = 60 aligns with the current Python implementation.
# under the License.

# Downloading the data and model
mkdir -p model

Can we change this to only download the files when they don't already exist in the specified directory?

Of course. I used wget -nc below to avoid re-downloading the dataset and models.
# specific language governing permissions and limitations
# under the License.

# Downloading the data and model
./inception_inference --symbol "./model/Inception-BN-symbol.json" --params "./model/Inception-BN-0126.params" --synset "./model/synset.txt" --mean "./model/mean_224.nd" --image "./model/dog.jpg"
```
Alternatively, The script [unit_test_inception_inference.sh](<https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/unit_test_inception_inference.sh>) downloads the pre-trained **Inception** model and a test image. The users can invoke this script as follows:
For a quickly inference test, users can directly run [unit_test_imagenet_inference.sh](<https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/unit_test_imagenet_inference.sh>) by using the below command. This script will automatically download the pre-trained **Inception-Bn** and **resnet50_v1_int8** model and **validation dataset** which are required for inference.

"For a quickly inference test" -> "For a quick inference test"
}
virtual void InitQuantizedBias(NDArray* arr) {
  (*arr) = (int8_t)0;
}

Doesn't operator= only support float currently?

Thanks for your comments. Currently, operator= does not support int8. If I add support for int8, there will be ambiguity due to implicit type conversion. For example, when I try to use (*arr) = 0 to initialize an NDArray, the compiler doesn't know which operator= should be called. So, as a work-around, I will temporarily assign a float value to an int8 NDArray.
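The ambiguity described above can be illustrated with a hypothetical stand-in class (FakeNDArray here, not the real NDArray). With only a float overload, `arr = 0` resolves cleanly; adding an int8_t overload as well would make the call ambiguous, since the literal 0 converts implicitly to both float and int8_t.

```cpp
#include <cstdint>

// Hypothetical stand-in showing why only one scalar operator= is kept.
// Uncommenting the int8_t overload would make `arr = 0` fail to compile
// with an ambiguous-overload error.
struct FakeNDArray {
    float stored = -1.0f;
    FakeNDArray& operator=(float v) {
        stored = v;  // an int8 array would quantize this float fill value
        return *this;
    }
    // FakeNDArray& operator=(int8_t v);  // would create ambiguity
};
```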
Force-pushed from 3f3efd1 to 7fde2f2
@anirudh2290 @ZhennanQin @ciyongch please review again if all concerns are resolved :)

| [ResNet152-V2](#8) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.65% | 76.36% |
| [Inception-BN](#9) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 72.28% | 72.20% |
The following performance numbers are collected by using C5.18xlarge.

Suggested change: "collected by using C5.18xlarge" -> "collected by using AWS EC2 C5.18xlarge"
This example demonstrates an image classification workflow with pre-trained models using the MXNet C++ API. The script now also supports inference with quantized CNN models generated by Intel® MKL-DNN (see this [quantization flow](https://github.com/apache/incubator-mxnet/blob/master/example/quantization/README.md)). By using the C++ API, the latency of most models gets a **2%~17%** speedup compared with the current Python implementation. The tables below show the accuracy and latency of several CNN models.
The following models have been tested on Linux systems, and 50,000 images were used to collect the following accuracy numbers.

I would expect C++ inference to have the same accuracy as Python inference, but the data here seems different from https://github.com/apache/incubator-mxnet/tree/master/example/quantization. I guess these two tables might use different versions of MXNet or GluonCV, but it's still confusing.

Actually, I have tested the accuracy for C++ and Python locally and there is no difference. I think the numbers on that page may be too old and need to be updated.
# Running inference on imagenet.
if [ "$(uname)" == "Darwin" ]; then
  echo ">>> INFO: FP32 real data"
  echo ">>> INFO: FP32 dummy data"
  DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}:../../../lib ./imagenet_inference --symbol_file "./model/Inception-BN-symbol.json" --batch_size 1 --num_inference_batches 500 --benchmark
else
  echo ">>> INFO: FP32 real data"
lib_name=$(ls -a ../../../lib | grep -oE 'mkldnn' | tail -1)
if [[ -n ${lib_name} ]] && [[ 'mkldnn' =~ ${lib_name} ]]; then
  echo ">>> INFO: INT8 dummy data"
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../../lib ./imagenet_inference --symbol_file "./model/resnet50_v1_int8-symbol.json" --batch_size 1 --num_inference_batches 500 --benchmark
Do we have INT8 real-data inference?

Currently I cannot find anywhere to download the INT8 params file, which is required for real-data inference. So I just added a dummy-data inference here for the INT8 resnet50 model.
@TaoLv I have updated the accuracy numbers in https://github.com/apache/incubator-mxnet/tree/master/example/quantization. There is no difference in Top-1 accuracy between C++ and Python. Please help review again. Thanks.

@wuxun-zhang Thank you for addressing the comments. The question is whether we want to maintain two INT8 accuracy tables in different places. @pengzhao-intel what do you think?

Agree, so add a link pointing the accuracy numbers to the quantization README and the performance data to perf.md.
## [imagenet_inference.cpp](<https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/imagenet_inference.cpp>)

This example demonstrates an image classification workflow with pre-trained models using the MXNet C++ API. The script now also supports inference with quantized CNN models generated by Intel® MKL-DNN (see this [quantization flow](https://github.com/apache/incubator-mxnet/blob/master/example/quantization/README.md)). By using the C++ API, the latency of most models gets a **2%~17%** speedup compared with the current Python implementation. The tables below show the accuracy and latency of several CNN models.

Don't mention the exact numbers (2%~17%) since they may change with the evolution of the Python and C++ interfaces.
After talking offline, I think we can leave the performance numbers of CLX (c5.12xlarge/24xlarge) on this page :)
Force-pushed from b01653b to b265dca
Seems all checks have passed but the job is hanging now.

Thanks for the nice PR, merging now.
Description
This PR improves the C++ inference script. Users can now run benchmarks with FP32/INT8 models on ImageNet. There is also a slight latency improvement of C++ over Python.
The table below shows the latency improvement of C++ over Python for several CNN models.
(Numbers updated based on C5.18xlarge)
@pengzhao-intel @ZhennanQin @TaoLv @ciyongch Please help to review. Thanks.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments