Add operator for dot(dns, csr) = csr #8938
Conversation
benchmark/python/sparse/dot.py
Outdated
| if rhs_density > 1 or rhs_density < 0:
|     raise ValueError("rhs_density has to be between 0 and 1")
| raise ValueError("Value other than csr for lhs not supported")
The check and the error statement don't seem to match?
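For illustration, a minimal sketch of a matched check and message (the helper name is hypothetical; the benchmark script inlines the check):

```python
# Hypothetical helper sketching the fix: the raised message now describes
# the condition that actually failed.
def validate_rhs_density(rhs_density):
    if rhs_density > 1 or rhs_density < 0:
        raise ValueError("rhs_density has to be between 0 and 1")
```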
| * \brief CPU Kernel of PopulateCsrForNNC
| * Parallelization by individual rows
| */
| struct PopulateCsrForNNC {
Can you add brief description on what this kernel is for?
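For context, a rough NumPy sketch of the structure this kernel appears to populate, inferred from the diff: each output row of dot(dns, csr) can only be non-zero in the nnc columns where the csr rhs has at least one non-zero entry, so every row receives the same nnc column indices.

```python
import numpy as np

# Sketch only (the real kernel is a parallel C++ struct): build indptr and
# col_idx for an output csr whose num_rows_l rows all hold the same nnc columns.
def populate_csr_for_nnc(num_rows_l, nnc_idx):
    nnc = len(nnc_idx)                                  # non-zero columns in rhs
    indptr = np.arange(num_rows_l + 1) * nnc            # row i starts at i * nnc
    col_idx = np.tile(np.asarray(nnc_idx), num_rows_l)  # same pattern per row
    return indptr, col_idx
```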
| dispatched = storage_type_assign(&out_stype, kDefaultStorage,
|                                  dispatch_mode, DispatchMode::kFComputeEx);
| }
| if (!dispatched && lhs_stype == kDefaultStorage && rhs_stype == kCSRStorage &&
Is the implementation only available on CPU? No fallback on GPU ctx?
I have added a check for CPU. It will fall back to default storage on GPU.
src/operator/tensor/dot-inl.h
Outdated
| const OpReqType req, NDArray* ret) {
| if (kNullOp == req) return;
| CHECK_EQ(rhs.storage_type(), kCSRStorage);
| if (!rhs.storage_initialized()) return;
Shall we set the result to be ZerosCsrImpl before return?
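A sketch of the suggested behavior in terms of the Python API (illustrative shapes; assumes a build containing this PR):

```python
import mxnet as mx

# With an all-zero csr rhs, the result should be an explicit all-zero csr
# rather than an untouched output array.
rhs = mx.nd.sparse.zeros('csr', (2, 3))
out = mx.nd.dot(mx.nd.ones((4, 2)), rhs)
assert out.stype == 'csr' and (out.asnumpy() == 0).all()
```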
| inline void DotDnsCsrCsrImpl(const OpContext& ctx, const cpu& cpu_dev,
|                              const TBlob& lhs, const NDArray& rhs,
|                              const OpReqType req, NDArray* ret) {
| if (kNullOp == req) return;
Are kAddTo and kWriteInplace not checked?
Thanks for pointing this out. Fixed.
src/operator/tensor/dot-inl.h
Outdated
| return;
| }
|
| dim_t num_threads = mxnet_op::get_num_threads<cpu>(num_rows_l);
src/operator/tensor/dot-inl.h
Outdated
| s, num_rows_l, nnc_idx, indptr_out, col_idx_out, nnc, num_rows_l);
| mxnet_op::Kernel<mxnet_op::set_zero, cpu>::Launch(s, nnz, data_out);
|
| if (nnc == 0) {
Shouldn't nnc never be 0 here?
Why should nnc never be 0? This is possible when the number of non-zero columns in the rhs is zero (a matrix of all zeros). In that case we return the output correctly.
Because you already checked rhs.storage_initialized() in line 922?
I have removed the if and also added some documentation for storage_initialized.
src/operator/tensor/dot-inl.h
Outdated
| // dns, csr -> csr
| if (dev_mask == mshadow::cpu::kDevMask) {
|   dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode,
|                                    DispatchMode::kFComputeEx);
Is the output stype consistent on cpu and gpu? The output stype should be consistent to avoid confusing users (see https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/matrix_op-inl.h#L400-L418)
The only difference is that on GPU, it performs fallback. If the output stype infers sparse, it first produces a dense output, then casts it to sparse. The fallback is already handled in the executor.
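In terms of the Python API, the intended user-visible contract looks roughly like this (illustrative; assumes a build containing this PR):

```python
import mxnet as mx

# The inferred output stype is csr on both contexts; on GPU the executor's
# fallback computes a dense result first and then casts it to csr.
lhs = mx.nd.random.uniform(shape=(4, 5))
rhs = mx.nd.random.uniform(shape=(5, 6)).tostype('csr')
out = mx.nd.dot(lhs, rhs)
print(out.stype)  # 'csr' on cpu; same stype via fallback on gpu
```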
include/mxnet/ndarray.h
Outdated
| void set_fresh_out_grad(bool state) const;
| // returns true if a sparse ndarray's aux_data and storage are initialized
| /*! \brief Returns true if a sparse ndarray's aux_data and storage are initialized
|  * Returns false if the indices array shape is inconsistent
"Returns false if the indices array shape is inconsistent" -> it actually throws an exception without returning false
| : DispatchMode::kFComputeEx;
| dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode,
|                                  dispatch_ex);
| }
Hmm, we should log storage fallback whenever the dispatch mode is dispatch_fallback:
https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/elemwise_op_common.h#L79-L81
Maybe I should move this logic to the common path instead of letting developers specify that in operators
https://github.com/apache/incubator-mxnet/blob/master/src/executor/infer_graph_attr_pass.cc#L45-L54
Yes, moving the logic to the common path would be nice. I see multiple places where we don't have this check. For example, https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/elemwise_unary_op_basic.cc#L65 and https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/elemwise_binary_scalar_op_basic.cc#L68. These also need to be fixed, right?
Yes. We can fix that in a separate PR.
| test_dot_csr(lhs_shape, (lhs_shape[0], 1), 'default', True, lhs_d, rhs_d)  # (vector kernel)
| test_dot_csr(lhs_shape, (lhs_shape[1], rnd.randint(5, 10)), 'default', False, lhs_d, rhs_d)  # test gpu SpMM
| test_dot_csr(lhs_shape, (lhs_shape[0], rnd.randint(5, 10)), 'default', True, lhs_d, rhs_d)  # (scalar kernel)
| test_dot_dns_csr(lhs_shape, (lhs_shape[1], rnd.randint(500, 1000)), lhs_d, lhs_d)
randint(50, 200) is large (and slow) enough for testing. No need to increase the dim to 1000.
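For reference, the suggested change would read (names taken from the diff above):

```python
test_dot_dns_csr(lhs_shape, (lhs_shape[1], rnd.randint(50, 200)), lhs_d, lhs_d)
```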
src/operator/tensor/dot-inl.h
Outdated
| using namespace mshadow::expr;
| using nnvm::dim_t;
|
| /* Initialize data structures */
src/operator/tensor/dot-inl.h
Outdated
| const CType start_idx = i * nnc;
| nnvm::dim_t cur = 0;
| indptr_out[i] = start_idx;
| if (i == static_cast<int>(num_rows_l - 1)) indptr_out[i + 1] = indptr_out[i] + nnc;
As we are adding large array support in the future, it's more appropriate to cast i up to dim_t instead of casting num_rows_l down to int.
@eric-haibin-lin Thank you for reviewing! I have made the necessary changes.

Is the operator documentation not updated?

Added dot(dns, csr) = csr to the operator doc.
* Add operator for dot(dns, csr) = csr
* Fix whitespace
* Add comments
* Add comments and fix error message
* Fixes for dot dns csr
* Fixes
* Remove non required statements
* Add fallback for GPU
* Remove unused if
* Fix comments and casting
* Add operator to the documentation
Description
Adds an operator for dot(dns, csr) = csr. The backward pass falls back to the default implementation.
The performance is better than dot(dns, dns) for sparsity less than 0.5%. Below are the results for tests on c4.8xlarge with OMP_NUM_THREADS set to 32.
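A rough timing sketch of that comparison (not the PR's benchmark script; shapes and density are illustrative):

```python
import time
import numpy as np
import scipy.sparse as sp
import mxnet as mx

density = 0.005  # 0.5% non-zeros in the rhs
lhs = mx.nd.random.uniform(shape=(256, 1024))
rhs_csr = mx.nd.sparse.csr_matrix(
    sp.random(1024, 2048, density=density, format='csr', dtype=np.float32))
rhs_dns = rhs_csr.tostype('default')

def bench(rhs, n=10):
    mx.nd.dot(lhs, rhs)  # warm-up
    mx.nd.waitall()
    start = time.time()
    for _ in range(n):
        mx.nd.dot(lhs, rhs)
    mx.nd.waitall()
    return (time.time() - start) / n

print('dot(dns, csr): %.5fs  dot(dns, dns): %.5fs'
      % (bench(rhs_csr), bench(rhs_dns)))
```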
Checklist
Essentials
* Passed code style checking (make lint)

Changes