cuda: add moments #3516

Merged
merged 1 commit into from
Dec 5, 2023

cudawarped
Contributor

@cudawarped cudawarped commented Jun 29, 2023

As requested by @cv3d this PR continues the work of adding cuda::moments started in #3500.

This PR introduces two functions for calculating image moments on the device. The asynchronous cuda::spatialMoments calculates only the spatial moments and leaves the result on the device. The synchronous cuda::moments calculates the spatial moments on the device, downloads the result to the host, and then uses cv::Moments() to calculate the remaining centralized and normalized image moments.
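
To illustrate why only the spatial moments need to come from the device, here is a minimal pure-Python sketch (not the PR's code; the function names `spatial_moments` and `central_and_normalized` are hypothetical) that derives the central and normalized moments from the ten raw moments alone, using the standard textbook identities:

```python
def spatial_moments(img):
    """Raw spatial moments m[(p, q)] = sum of x**p * y**q * I(x, y) for p + q <= 3."""
    return {
        (p, q): sum(x**p * y**q * v
                    for y, row in enumerate(img)
                    for x, v in enumerate(row))
        for p in range(4) for q in range(4 - p)
    }


def central_and_normalized(m):
    """Derive central (mu) and normalized (nu) moments from the raw moments only."""
    cx, cy = m[1, 0] / m[0, 0], m[0, 1] / m[0, 0]  # centroid
    mu = {
        (2, 0): m[2, 0] - cx * m[1, 0],
        (1, 1): m[1, 1] - cx * m[0, 1],
        (0, 2): m[0, 2] - cy * m[0, 1],
        (3, 0): m[3, 0] - 3 * cx * m[2, 0] + 2 * cx * cx * m[1, 0],
        (2, 1): m[2, 1] - 2 * cx * m[1, 1] - cy * m[2, 0] + 2 * cx * cx * m[0, 1],
        (1, 2): m[1, 2] - 2 * cy * m[1, 1] - cx * m[0, 2] + 2 * cy * cy * m[1, 0],
        (0, 3): m[0, 3] - 3 * cy * m[0, 2] + 2 * cy * cy * m[0, 1],
    }
    # nu_pq = mu_pq / m00 ** ((p + q) / 2 + 1)
    nu = {(p, q): v / m[0, 0] ** ((p + q) / 2 + 1) for (p, q), v in mu.items()}
    return mu, nu


# Toy image (an assumption for illustration): a 2x2 block of ones.
img = [[0, 0, 0],
       [0, 1, 1],
       [0, 1, 1]]
m = spatial_moments(img)
mu, nu = central_and_normalized(m)
```

The second step touches only ten numbers, so running it on the host after a download is cheap compared with the per-pixel accumulation.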

The existing cv::moments() function returns the result in double precision due to the magnitude of the calculated image moments. Unfortunately, double precision on Nvidia GPUs is at least 32x slower than single precision (depending on the generation), making this asynchronous device-side version, when run in double precision, only marginally (~3x, depending on CPU/GPU) faster than its CPU counterpart. The user can therefore increase performance by choosing a single precision result and/or reducing the number of spatial image moments to calculate.
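
As a rough illustration of both knobs, the sketch below counts the spatial moments needed per order and shows how large a third-order moment can get. It assumes an all-white 4096x2160 uchar image, which is my assumption rather than the benchmark image, and it only shows the error from storing the final value in single precision, not any error accumulated inside the kernel. The helper names are hypothetical.

```python
import struct


def num_spatial_moments(order):
    """Number of moments m_pq with p + q <= order."""
    return (order + 1) * (order + 2) // 2


def to_float32(value):
    """Round a Python float to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Assumed worst case: every pixel is 255 in a 4096x2160 image.
width, height, max_pixel = 4096, 2160, 255
m30_exact = sum(x**3 for x in range(width)) * height * max_pixel  # exact integer

m30_f64 = float(m30_exact)     # what a double would hold
m30_f32 = to_float32(m30_f64)  # what a float would hold
rel_err_f32 = abs(m30_f32 - m30_exact) / m30_exact

print([num_spatial_moments(n) for n in (1, 2, 3)])
print(m30_exact, m30_f64 == m30_exact, rel_err_f32)
```

For this input the double holds the moment exactly while the float does not, although the float's relative error stays within single precision rounding (at most 2**-24).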

To give an indication of the performance resulting from the various parameter combinations, the table below has some preliminary timings of the CPU (i72700H) version vs the kernel execution time on an RTX 3070 Ti Laptop GPU. The source image was uchar and contained a circle in the centre of radius 0.9*width.

| Size | CPU (us), binaryImage == false | CPU (us), binaryImage == true | GPU moments type | GPU (us), 1st order | GPU (us), 2nd order | GPU (us), 3rd order | Speed up |
|---|---|---|---|---|---|---|---|
| 640x480 | 117 | 472 | float | 4.61 | 3.71 | 3.81 | 31 |
| 640x480 | 117 | 472 | double | 13.63 | 26.05 | 39.62 | 3 |
| 1280x720 | 327 | 1288 | float | 5.95 | 5.09 | 5.09 | 64 |
| 1280x720 | 327 | 1288 | double | 21.25 | 45.73 | 70.79 | 5 |
| 1920x1080 | 572 | 4413 | float | 8.64 | 7.49 | 8.38 | 68 |
| 1920x1080 | 572 | 4413 | double | 55.68 | 128.55 | 204.77 | 3 |
| 4096x2160 | 2699 | 14674 | float | 22.66 | 22.62 | 24.26 | 111 |
| 4096x2160 | 2699 | 14674 | double | 165.89 | 388.74 | 628.62 | 4 |

Note:

  1. The results on the GPU are with binaryImage == false as the performance is not affected by this flag, unlike on the CPU where the performance with binaryImage == true is significantly slower.
  2. The GPU times measure the kernel execution only, not the launch latency.
  3. The speed up column compares the faster CPU version (binaryImage == false) with the kernel execution time when calculating all the spatial image moments.
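
For reference, the speed up column can be reproduced from the table as the CPU time with binaryImage == false divided by the 3rd order GPU kernel time, rounded to the nearest integer. A small sketch using only the figures above:

```python
# (size, GPU moments type, CPU us with binaryImage == false, GPU us for 3rd order)
timings = [
    ("640x480", "float", 117, 3.81),
    ("640x480", "double", 117, 39.62),
    ("1280x720", "float", 327, 5.09),
    ("1280x720", "double", 327, 70.79),
    ("1920x1080", "float", 572, 8.38),
    ("1920x1080", "double", 572, 204.77),
    ("4096x2160", "float", 2699, 24.26),
    ("4096x2160", "double", 2699, 628.62),
]

# Speed up = CPU time / GPU kernel time when all spatial moments are calculated.
speed_ups = [round(cpu_us / gpu_us) for _, _, cpu_us, gpu_us in timings]

for (size, kind, _, _), s in zip(timings, speed_ups):
    print(f"{size:>10} {kind:>6} {s:>4}x")
```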

@cv3d I hope this is what you wanted? Can you please take a look and let me know if this works correctly on the data you are using? Additionally, if you have any comments or suggestions please let me know.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@cudawarped cudawarped force-pushed the cuda_moments branch 2 times, most recently from 55a2fb5 to ba4551c Compare June 29, 2023 10:37
@cv3d cv3d mentioned this pull request Jul 6, 2023
5 tasks
@cv3d
Contributor

cv3d commented Jul 7, 2023

@cudawarped Thanks a lot for your efforts. I have performed preliminary tests, and it seems to satisfy my use case. I might get back to you later with more details or comments once I get a better chance to dig into this further.

@alalek @asmorkalov Can you please consider this PR?

@asmorkalov asmorkalov self-requested a review July 7, 2023 06:03
@asmorkalov asmorkalov self-assigned this Jul 7, 2023
@cv3d
Contributor

cv3d commented Jul 7, 2023

@cudawarped Perhaps an idea for a follow-up PR, but I wonder if we can have something powerful around binary images. For example, with a multi-label image, maybe we can compute the moments of all shapes?

std::vector<cv::Moments> momentsOfLabels(InputArray src, ...);

Actually, we already have cv::cuda::connectedComponents (thanks @stal12), so having cv::cuda::connectedComponentsWithStats would be great, perhaps with expanded ConnectedComponentsTypes to contain more moments.
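
To make the suggestion concrete, here is a minimal pure-Python sketch of what per-label moments could look like. No such function exists in OpenCV; `moments_of_labels` mirrors the hypothetical `momentsOfLabels` above, the background value of 0 is an assumption, and each label is treated as a binary mask.

```python
def moments_of_labels(labels, background=0, order=1):
    """Spatial moments m[(p, q)] with p + q <= order, accumulated per label."""
    out = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == background:
                continue
            m = out.setdefault(label, {})
            for p in range(order + 1):
                for q in range(order + 1 - p):
                    m[p, q] = m.get((p, q), 0) + x**p * y**q
    return out


# Toy label image (an assumption for illustration) with two shapes, 7 and 14.
labels = [
    [0, 7, 7, 0],
    [0, 7, 0, 0],
    [0, 0, 0, 14],
    [0, 0, 14, 14],
]
per_label = moments_of_labels(labels)
centroids = {k: (m[1, 0] / m[0, 0], m[0, 1] / m[0, 0]) for k, m in per_label.items()}
```

A single pass over the image is enough here because every pixel contributes only to the accumulator of its own label.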

If you think this deserves pushing for, then maybe I can open an issue, and tag some people.

@cudawarped
Contributor Author

cudawarped commented Jul 7, 2023

Actually, we already have cv::cuda::connectedComponents (thanks @stal12), so having cv::cuda::connectedComponentsWithStats would be great, perhaps with expanded ConnectedComponentsTypes to contain more moments.

@cv3d I agree, but it usually involves several steps, making it quite slow, i.e.

  1. Getting a list (histogram if you want pixel counts or otherwise) of all the marker labels. CCL usually labels each blob with the lowest image index of a connected region.
  2. Compressing the list.
  3. Relabeling the image using compressed labels. We could stop here for your moments suggestion.
  4. Processing the relabeled image calculating the bounds of each blob.
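
The four steps above can be sketched on the host as follows. This is a plain Python illustration, not NPP or OpenCV code; `compress_labels`, the background value of 0, and the toy label values are all assumptions.

```python
def compress_labels(labels, background=0):
    # Step 1: histogram of the sparse marker labels (pixel count per label).
    counts = {}
    for row in labels:
        for v in row:
            if v != background:
                counts[v] = counts.get(v, 0) + 1

    # Step 2: compress the list, mapping the sorted sparse labels to 1..N.
    mapping = {old: new for new, old in enumerate(sorted(counts), start=1)}

    # Step 3: relabel the image with the compressed labels (background becomes 0).
    relabeled = [[mapping.get(v, 0) for v in row] for row in labels]

    # Step 4: bounds of each blob as (x_min, y_min, x_max, y_max).
    bounds = {}
    for y, row in enumerate(relabeled):
        for x, v in enumerate(row):
            if v == 0:
                continue
            x0, y0, x1, y1 = bounds.get(v, (x, y, x, y))
            bounds[v] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return counts, relabeled, bounds


# Toy CCL output (an assumption): two blobs carrying sparse labels 7 and 14.
labels = [
    [0, 7, 7, 0],
    [0, 7, 0, 0],
    [0, 0, 0, 14],
    [0, 0, 14, 14],
]
counts, relabeled, bounds = compress_labels(labels)
```

Each step is a separate full pass over the image or the label list, which is where the cost mentioned above comes from.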

NPP has several routines to do this, although I have found them to be buggy and very slow. In addition, you need to use their slow labeling routine first before you can use CompressedMarkerLabelsInfo.

It would be interesting but unfortunately I don't have time in the near future to look at this.

@cv3d
Contributor

cv3d commented Jul 25, 2023

@alalek @asmorkalov This PR has been ready for a while now.
Thanks for your consideration~

@cudawarped
Contributor Author

@asmorkalov Thank you for running the CI. Sorry, it looks like I left the wrong sanity check in for the spatial moments test. I'll address this when I can.

@cudawarped
Contributor Author

cudawarped commented Jul 28, 2023

@asmorkalov Fixed the sanity check, ready for your review 🤞

@cudawarped
Contributor Author

@asmorkalov Should this be included in 4.9.0?

@asmorkalov
Contributor

I tested the PR locally with CUDA 11.8 and GF 1080. The Python test fails with the following error:

======================================================================
ERROR: test_moments (test_cudaimgproc.cudaimgproc_test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/projects/Projects/OpenCV/opencv_contrib/modules/cudaimgproc/misc/python/test/test_cudaimgproc.py", line 96, in test_moments
    moments_order = cv.cuda.MomentsOrder_THIRD
AttributeError: module 'cv2.cuda' has no attribute 'MomentsOrder_THIRD'

----------------------------------------------------------------------

@cudawarped
Contributor Author

AttributeError: module 'cv2.cuda' has no attribute 'MomentsOrder_THIRD'

@asmorkalov This should be fixed now. I changed from enum class to enum so that I could use CV_ENUM for clarity in the tests, but forgot to retest in Python, sorry.

Contributor

@asmorkalov asmorkalov left a comment

👍

@asmorkalov asmorkalov merged commit 0bcbc73 into opencv:4.x Dec 5, 2023
10 checks passed