Dynamic Library Loading Support#15489
Conversation
cc55e0d to
94fda6e
Compare
671df6f to
ddd0e59
Compare
448e950 to
c490c23
Compare
|
@mxnet-label-bot add [backend, C API, pr-work-in-progress] |
src/common/library.cc
Outdated
| return nullptr; | ||
| } | ||
| #else | ||
| handle = dlopen(path, RTLD_LAZY); |
There was a problem hiding this comment.
Where are we calling the 'dlclose()'?
How are we checking that we are not calling "dlopen" multiple times for the same library? If we are allowing calling multiple dlopen for the same file, it should match with the same number of dlclose.
There was a problem hiding this comment.
Thanks @leleamol, per our offline discussion we will take the following actions:
- Create a map to track file handles and paths to the libraries when opened
- We'll only allow dlopen to be called once, but if the library is "reloaded" by the user, we'll skip dlopen and go through the same "library registration process" again.
- Add code to call dlclose for each library opened in the Engine descructors for each of the engine types in:
https://github.com/apache/incubator-mxnet/blob/eab6da6208fdc7b68437dadc6d7d58ee11c2f528/src/engine/naive_engine.cc#L70
https://github.com/apache/incubator-mxnet/blob/3a4698058840ff79e58709b6135c8fb757a99606/src/engine/threaded_engine.h#L327
| LocalFree(err_msg); | ||
| } | ||
| #else | ||
| *func = dlsym(handle, name); |
There was a problem hiding this comment.
The dlsym can return NULL and it may not necessarily indicate error condition. The recommended error handling is mentioned in this document.
https://linux.die.net/man/3/dlsym
There was a problem hiding this comment.
Thanks @leleamol, per this: https://stackoverflow.com/a/53590728 I think we can just take the shortcut and assume dlsym returning NULL means the symbol is not found. You have to try pretty hard to get gcc to put a function at the global 0 address.
|
@leleamol Can you please review again to see if the implementation of the dlclose matches what you were thinking? @anirudh2290 Thanks for the help with the engine destructor info, can you take a look at the changes to the engines: |
| LocalFree(err_msg); | ||
| } | ||
| #else | ||
| *func = dlsym(handle, name); |
| elif (os.name=='nt'): | ||
| lib = 'mylib.dll' | ||
|
|
||
| fname = mx.test_utils.download('https://mxnet-demo-models.s3.amazonaws.com/lib_binary/'+lib) |
There was a problem hiding this comment.
Unfortunately, this test introduces a security vulnerability. Please replace it as soon as possible as this poses a security threat to our system as well as our users since this allows arbitrary code execution. I'd appreciate it if you would provide a replacement within the next 24h. #15755 is the corresponding revert PR.
There was a problem hiding this comment.
I suggested before to build this stub during the build process instead of loading from S3. See previous comment above: https://github.com/apache/incubator-mxnet/pull/15489/files#r308932209
|
Thanks @marcoabreu for the catch. As we discussed offline, the problem is not with the feature in this PR but with how test_library_loading is pulling from the S3 bucket in the CI. We'll work to build the library as part of the build so that it can be used internally in testing the CI without security risks. We'll move the test pulling from the S3 bucket to the examples directory so that it doesnt run in the CI. We may need some help @marcoabreu to get this working in the CI, since we'll need to build this library and then find the path where it lives during the test so we can pass the path to the In the meantime, we'll make these changes and then resubmit this PR. |
Could you elaborate? |
|
@marcoabreu we added an example directory at: example/lib_api. Currently the test.py includes a test that uses the locally built library (built using the included Makefile). But this only provides an example for how to build a library. We'll create another example python script that pulls from the S3 bucket and loads that one. But it will just be an example, not a unittest that will run in the CI. Do you have other ideas/suggestions on how we can improve testing or provide more examples? |
|
No pulling of any prebuilt binaries due to the same reasons, please. I understand you trying to make it more convenient, but I think that we're stepping over a line in that case. Give instructions how to compile the example and the user should be fine. Just to confirm: we will have a unit test running in ci at the end of the day, correct? |
|
@marcoabreu I think we're in agreement here. We will be distributing some pre-built libraries in an S3 bucket as examples. We will provide documented examples for how to pull and load these binaries. But these will not be tests that will be automatically run. They will be examples. Yes, we will have the unit test running in the CI that uses a library that is built in the CI. |
|
I have a suggestion. We can check the |
|
While that eliviates the problem from a ci perspective (if the bucket would be owned by the CI owners), it's still a red flag for our users. I'm afraid that companies prefer to stay away from the execution of an arbitrary native library that is downloaded from the internet - no matter whether it is backed by a hash or not. It's inevitable in some cases, but we want to reduce the possiblity of compliance scanners flagging us and thus making the usage of the MXNet more complicated than it has to be. I think the only acceptable way is if the library is getting produced as part of the build process that is entirely retrieved from source. These are also the tenants Apache employs - Apache project must not make any binary releases in any form. Thus let's best stick to that as well :) |
Description
This PR is part of a larger effort of adding new functionality to MXNet that was compiled separately from MXNet. See here and here for more info.
This PR is the first step that enables libraries to be dynamically loaded into MXNet at runtime .
Features:
This PR includes the following:
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments