Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeming unnecessary CGO depdenency in nvmlreceiver #180

Open
Dylan-M opened this issue Sep 12, 2023 · 5 comments
Open

Seeming unnecessary CGO depdenency in nvmlreceiver #180

Dylan-M opened this issue Sep 12, 2023 · 5 comments

Comments

@Dylan-M
Copy link

Dylan-M commented Sep 12, 2023

Currently, the nvmlreceiver is using a single dependency on a C library here:
https://github.com/GoogleCloudPlatform/opentelemetry-operations-collector/blob/master/receiver/nvmlreceiver/testcudakernel/test_cuda_kernel.cc

Based on my review of, but with limited understanding of the Nvidia libraries in general, this could be replaced with a native golang library.

From: https://docs.nvidia.com/cuda/cublas/index.html#id92

cublasStatus_t cublasXtDgemm(cublasXtHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const double          *alpha,
                           const double          *A, int lda,
                           const double          *B, int ldb,
                           const double          *beta,
                           double          *C, int ldc)

From: https://pkg.go.dev/gorgonia.org/cu/blas#Standard.Dgemm

func (impl *Standard) Dgemm(tA, tB blas.Transpose, m, n, k int, alpha float64, a []float64, lda int, b []float64, ldb int, beta float64, c []float64, ldc int)

These function definitions seem to be identical, with the exception of the cublasXtHandle. That handle wouldn't be needed in the pure golang implementation; so it makes sense for it to be missing.

If I am correct, this would be a nice change. Anything that can strip a collector down to a pure golang implementation is good in my book.

If I am incorrect in my belief that these libraries are functionally the same, I'm happy to be corrected on it.

@braydonk
Copy link
Contributor

Looping in @suffiank for implementation details here.

@Dylan-M
Copy link
Author

Dylan-M commented Sep 21, 2023

Any thoughts? This has come up in discussion again on a slack group.

@braydonk
Copy link
Contributor

braydonk commented Sep 21, 2023

Hi @Dylan-M,

Suffian is quite busy so I took a look. I'm not the primary point of contact for GPU-related stuff so this is best effort, but this is what I found.

I think getting rid of the C code would be good. The library does look like it will do what that little C++ load generator does. This would get rid of C++ code in this codebase which would be cleaner.

However, this is unfortunately not the only CGO dependency in this receiver. The NVML client we use is simply Go bindings for libnvidia-ml. See https://github.com/NVIDIA/go-nvml for more info.
What this means is that we probably can indeed get rid of that C++ code and make our tests a bit easier to work with, but it will not unburden this receiver from CGO entirely.

As such I don't think we're going to be able to prioritize this any time soon, but we welcome a PR to make that change that gets rid of C++ in this codebase. Thank you for raising the issue!

@Dylan-M
Copy link
Author

Dylan-M commented Sep 21, 2023

Thank you for having a look, a coworker also pointed out what you said about linking to that external library; which I had forgotten to look at external deps. Still a change that is worth making, however I fully agree with your assessment and prioritization.

@quentinmit
Copy link
Member

In Ops Agent we run a demo app from the CUDA SDK: https://github.com/GoogleCloudPlatform/ops-agent/blob/master/integration_test/third_party_apps_data/applications/nvml/exercise

But as @braydonk mentions, go-nvml requires cgo for everything, so we can't get GPU metrics without it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants