Add caliper annotations to quest_candidates_example #1419

Open
wants to merge 1 commit into
base: develop

bmhan12 (Contributor) commented Sep 19, 2024

This PR:

  • Adds caliper annotations to quest_candidates_example

As part of this, I also re-ran my test scripts with the same setup as before to get average timings (in seconds) for each spatial index. I also added numbers for rzwhippet with 112 threads.

Notably, initialization for both BVH and implicit grid is about an order of magnitude faster than before on HIP and CUDA (see previous PR #1278 for comparison):

| Policy - Spatial Index | Initialize Spatial Index (s) | Query candidates (s) | Write candidate pairs (s) | Total average processing runtime (s) | System Tested On |
|---|---|---|---|---|---|
| Sequential - BVH (36 threads) | 2.527 | 8.701 | 0.377 | 11.605 | rzgenie |
| Sequential - BVH (112 threads) | 1.498 | 5.377 | 0.208 | 7.083 | rzwhippet |
| OpenMP - BVH (36 threads) | 0.580 | 0.517 | 0.340 | 1.437 | rzgenie |
| OpenMP - BVH (112 threads) | 0.483 | 0.262 | 0.248 | 0.993 | rzwhippet |
| CUDA - BVH | 0.071 | 0.056 | 0.013 | 0.139 | rzansel |
| HIP - BVH | 0.158 | 0.098 | 0.319 | 0.575 | rzvernal |
| Sequential - Implicit Grid (36 threads) | 1.266 | 143.984 | 138.798 | 284.048 | rzgenie |
| Sequential - Implicit Grid (112 threads) | 0.748 | 85.579 | 84.509 | 170.836 | rzwhippet |
| OpenMP - Implicit Grid (36 threads) | 0.609 | 5.319 | 5.468 | 11.396 | rzgenie |
| OpenMP - Implicit Grid (112 threads) | 0.329 | 2.257 | 2.436 | 5.022 | rzwhippet |
| CUDA - Implicit Grid | 0.129 | 0.915 | 1.355 | 2.399 | rzansel |
| HIP - Implicit Grid | 0.146 | 3.713 | 3.985 | 7.844 | rzvernal |

Same testing setup as last time, but with caliper:

  • Test command: time ./examples/quest_candidates_example_ex -i ucart23z.cycle_000000.root -q ucart23z_shifted.cycle_000000.root -p <raja policy number> -m <method, either "bvh" or "implicit"> --caliper report
  • HIP command: flux run -N 1 -g 1
  • CUDA command: lrun -n 1 -g 1
  • OpenMP allocation: salloc -N 1 -n 36 for rzgenie, salloc -N 1 -n 112 for rzwhippet
  • ucart23z is an 8,000,000 element mesh, while ucart23z_shifted is the same mesh but shifted slightly.

bmhan12 added the "Quest" label (Issues related to Axom's 'quest' component) on Sep 19, 2024
bmhan12 (Contributor, Author) commented Sep 19, 2024

Here's an example of the CUDA-BVH output with caliper report:

$ lrun -n 1 -g 1 ./examples/quest_candidates_example_ex -i ../../ucart23z.cycle_000000.root  -q ../../ucart23z_shifted.cycle_000000.root -p 2 --caliper report
[INFO] 
     Parsed parameters:
      * First Blueprint mesh to insert: '../../ucart23z.cycle_000000.root'
      * Second Blueprint mesh to query: '../../ucart23z_shifted.cycle_000000.root'
      * Verbose logging: false
      * Spatial method: 'Bounding Volume Hierarchy (BVH)'
      * Resolution: 'Not Applicable'
      * Runtime execution policy: 'cuda'
       
[INFO] Reading Blueprint file to insert: '../../ucart23z.cycle_000000.root'...
 
[INFO] Mesh bounding box is { min:(-1,-1,-1); max:(1,1,1); range:<2,2,2> }.
 
[INFO] Reading Blueprint file to query: '../../ucart23z_shifted.cycle_000000.root'...
 
[INFO] Mesh bounding box is { min:(-0.995,-0.995,-0.995); max:(1.005,1.005,1.005); range:<2,2,2> }.
 
[INFO] Reading in Blueprint files took 5.07 seconds. 
[INFO] Running BVH candidates algorithm in execution Space: [CUDA_EXEC] 
[INFO] 0: Initializing BVH took 0.0708 seconds. 
[INFO] 1: Querying candidate bounding boxes took 0.0557 seconds. 
[INFO] 2: Initializing candidate pairs (on device) took 0.0128 seconds. 
[INFO] 3: Moving candidate pairs to host took 10.5 seconds. 
[INFO] Stats for query
    -- Number of insert-BVH mesh hexes 8,000,000
    -- Number of query mesh hexes 8,000,000
    -- Total possible candidates 64,000,000,000,000
    -- Candidates from BVH query 63,521,199
     
[INFO] Computing candidates took 10.6 seconds. 
[INFO] Mesh had 63,521,199 candidates pairs 
Path                                  Min time/rank Max time/rank Avg time/rank Time %    
quest candidates example                  15.713506     15.713506     15.713506 99.998934 
  load Blueprint meshes                    5.070086      5.070086      5.070086 32.265441 
    load Blueprint hexahedron mesh         4.999803      4.999803      4.999803 31.818169 
  find candidates                         10.631097     10.631097     10.631097 67.655072 
    initializing BVH                       0.070883      0.070883      0.070883  0.451092 
      BVH::initialize                      0.070824      0.070824      0.070824  0.450713 
        LinearBVH::buildImpl               0.070816      0.070816      0.070816  0.450663 
          build_radix_tree                 0.047904      0.047904      0.047904  0.304856 
            RadixTree::allocate            0.018013      0.018013      0.018013  0.114634 
            transform_boxes                0.001531      0.001531      0.001531  0.009742 
            reduce_abbs                    0.006462      0.006462      0.006462  0.041126 
            get_mcodes                     0.000526      0.000526      0.000526  0.003347 
            sort_mcodes                    0.002699      0.002699      0.002699  0.017174 
              array_counting               0.000062      0.000062      0.000062  0.000397 
              raja_stable_sort             0.002626      0.002626      0.002626  0.016711 
            reorder                        0.009216      0.009216      0.009216  0.058650 
            build_tree                     0.000506      0.000506      0.000506  0.003220 
            propagate_abbs                 0.008909      0.008909      0.008909  0.056697 
          LinearBVH::allocate              0.014877      0.014877      0.014877  0.094675 
          emit_bvh_parents                 0.004852      0.004852      0.004852  0.030880 
    query candidates                       0.055728      0.055728      0.055728  0.354644 
      BVH::findBoundingBoxes               0.053715      0.053715      0.053715  0.341837 
        LinearBVH::findCandidatesImpl      0.053567      0.053567      0.053567  0.340893 
          PASS[1]:count_traversal          0.021550      0.021550      0.021550  0.137143 
          exclusive_scan                   0.000110      0.000110      0.000110  0.000698 
          allocate_candidates              0.004724      0.004724      0.004724  0.030064 
          PASS[2]:fill_traversal           0.027167      0.027167      0.027167  0.172889 
    write candidate pairs                  0.012821      0.012821      0.012821  0.081593 
    copy pairs to host                    10.483453     10.483453     10.483453 66.715484 

and an example of the CUDA-Implicit Grid output with caliper report:

$ lrun -n 1 -g 1 ./examples/quest_candidates_example_ex -i ../../ucart23z.cycle_000000.root  -q ../../ucart23z_shifted.cycle_000000.root -p 2 -m implicit --caliper report
[INFO] 
     Parsed parameters:
      * First Blueprint mesh to insert: '../../ucart23z.cycle_000000.root'
      * Second Blueprint mesh to query: '../../ucart23z_shifted.cycle_000000.root'
      * Verbose logging: false
      * Spatial method: 'Implicit Grid'
      * Resolution: '0'
      * Runtime execution policy: 'cuda'
       
[INFO] Reading Blueprint file to insert: '../../ucart23z.cycle_000000.root'...
 
[INFO] Mesh bounding box is { min:(-1,-1,-1); max:(1,1,1); range:<2,2,2> }.
 
[INFO] Reading Blueprint file to query: '../../ucart23z_shifted.cycle_000000.root'...
 
[INFO] Mesh bounding box is { min:(-0.995,-0.995,-0.995); max:(1.005,1.005,1.005); range:<2,2,2> }.
 
[INFO] Reading in Blueprint files took 5.04 seconds. 
[INFO] Running Implicit Grid candidates algorithm in execution Space: [CUDA_EXEC] 
[INFO] 0: Initializing Implicit Grid took 0.13 seconds. 
[INFO] 1: Querying candidate bounding boxes took 0.934 seconds. 
[INFO] 2: Initializing candidate pairs (on device) took 1.36 seconds. 
[INFO] 3: Moving candidate pairs to host took 10.9 seconds. 
[INFO] Stats for query
    -- Number of insert mesh hexes 8,000,000
    -- Number of query mesh hexes 8,000,000
    -- Total possible candidates 64,000,000,000,000
    -- Candidates from Implicit Grid query 63,521,199
     
[INFO] Computing candidates took 13.4 seconds. 
[INFO] Mesh had 63,521,199 candidates pairs 
Path                               Min time/rank Max time/rank Avg time/rank Time %    
quest candidates example               18.438882     18.438882     18.438882 99.999133 
  load Blueprint meshes                 5.042328      5.042328      5.042328 27.345932 
    load Blueprint hexahedron mesh      4.973948      4.973948      4.973948 26.975089 
  find candidates                      13.384182     13.384182     13.384182 72.586102 
    initializing implicit grid          0.129856      0.129856      0.129856  0.704246 
    query candidates                    0.933588      0.933588      0.933588  5.063107 
    write candidate pairs               1.356354      1.356354      1.356354  7.355883 
    copy pairs to host                 10.916162     10.916162     10.916162 59.201351 

kennyweiss (Member) left a comment

Thanks @bmhan12 -- these new numbers are a really nice improvement over the previous ones.

You didn't call it out explicitly, but the improved initialization timings are likely related to your work converting from UNIFIED to DEVICE memory (tracked in #1339).

@@ -434,6 +446,7 @@ template <typename ExecSpace>
axom::Array<IndexPair> findCandidatesBVH(const HexMesh& insertMesh,
                                         const HexMesh& queryMesh)
{
  AXOM_ANNOTATE_BEGIN("initializing BVH");

Minor: Would it make sense to remove the explicit timers now that we have caliper?

Having both will cause the outer wrapper to include timings for the inner one, and in this case, the caliper timings will include the SLIC formatting and logging times.
