
Separate resource limits for each priority class #2601

Merged: 5 commits merged into master on Jun 23, 2023

Conversation

@severinson (Contributor) commented on Jun 22, 2023:

Currently, per-queue resource limits are based on resources allocated to jobs of a particular priority. This causes problems when several priority classes have the same priority, since all jobs with priority classes of equal priority count towards the same limit. In this PR, we change the limits to be based on resources allocated to jobs of a particular priority class. This allows us to apply separate limits to different priority classes even when they have the same priority.

If at some point we want better support for using several priority classes with different priorities, I suggest introducing explicit cumulative limits in a separate PR, similar to those we had previously, but not defined as part of the priority classes (since priority classes may not have unique priorities).
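To make the keying change concrete, here is a minimal Go sketch (not Armada's scheduler code; the priority class names, priorities, and quantities are invented for illustration) showing why allocations keyed by priority collapse equal-priority classes into one shared bucket, while allocations keyed by class name stay separate:

```go
package main

import "fmt"

// PriorityClass pairs a name with a numeric priority. The names and the
// priority value below are assumptions for illustration only.
type PriorityClass struct {
	Name     string
	Priority int32
}

func main() {
	// Two priority classes that share the same priority.
	pcs := []PriorityClass{
		{Name: "armada-default", Priority: 1000},
		{Name: "armada-preemptible", Priority: 1000},
	}

	// Before this PR: allocations keyed by priority, so jobs from both
	// classes accumulate into a single bucket and count towards one limit.
	allocatedByPriority := map[int32]float64{}
	// After this PR: allocations keyed by priority class name, so each
	// class gets its own bucket and can be limited independently.
	allocatedByClassName := map[string]float64{}

	for _, pc := range pcs {
		allocatedByPriority[pc.Priority] += 10
		allocatedByClassName[pc.Name] += 10
	}

	fmt.Printf("by priority: %d bucket(s); by class name: %d bucket(s)\n",
		len(allocatedByPriority), len(allocatedByClassName))
	// Prints: by priority: 1 bucket(s); by class name: 2 bucket(s)
}
```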

@@ -282,10 +283,31 @@ func (q *AggregatedQueueServer) getJobs(ctx context.Context, req *api.StreamingL
})
}

// Map queue names to priority factor for all active queues, i.e.,
@severinson (Contributor, Author) commented:

This is just moved to earlier in the same function.

	// Nodes to be considered by the scheduler.
	lastSeen := q.clock.Now()
	nodes := make([]*schedulerobjects.Node, 0, len(req.Nodes))
-	allocatedByQueueForCluster := make(map[string]schedulerobjects.QuantityByPriorityAndResourceType)
+	allocatedByQueueAndPriorityClassForCluster := make(map[string]schedulerobjects.QuantityByTAndResourceType[string], len(queues))
@severinson (Contributor, Author) commented:

We now track resources by priority class name instead of priority class priority.
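For illustration, the new per-cluster structure nests as queue name → priority class name → resource name → quantity. A sketch with assumed names and simplified value types (the real map values are schedulerobjects.QuantityByTAndResourceType[string] holding Kubernetes resource quantities):

```go
package sketch

// Hypothetical contents of allocatedByQueueAndPriorityClassForCluster, with
// int64 quantities standing in for Kubernetes resource.Quantity values.
// The queue and priority class names are assumptions for illustration.
var allocatedByQueueAndPriorityClass = map[string]map[string]map[string]int64{
	"queue-a": {
		"armada-default":     {"cpu": 4, "memory": 16 << 30},
		"armada-preemptible": {"cpu": 8, "memory": 32 << 30},
	},
}
```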

@@ -39,83 +39,85 @@ func V1ResourceListFromResourceList(rl ResourceList) v1.ResourceList {
return rv
}

-type QuantityByPriorityAndResourceType map[int32]ResourceList
+type QuantityByTAndResourceType[T comparable] map[T]ResourceList
@severinson (Contributor, Author) commented on Jun 22, 2023:

I've replaced QuantityByPriorityAndResourceType with the more generic QuantityByTAndResourceType[int32], which also allows us to use QuantityByTAndResourceType[string] to track resources by priority class name. Based on some microbenchmarks I wrote, the performance of QuantityByPriorityAndResourceType and QuantityByTAndResourceType[int32] is identical.
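As a sketch of how the generic type composes, assuming a simplified ResourceList and a hypothetical AddResourceList helper (the real methods live in resourcelist.go and operate on Kubernetes resource quantities):

```go
package sketch

// ResourceList is a simplified stand-in for schedulerobjects.ResourceList;
// the real type wraps k8s.io/apimachinery resource.Quantity values.
type ResourceList struct {
	Resources map[string]int64
}

// QuantityByTAndResourceType is the generic map from the PR: any comparable
// key type works, so both priorities (int32) and class names (string) fit.
type QuantityByTAndResourceType[T comparable] map[T]ResourceList

// AddResourceList accumulates rl into the bucket for key t.
// (Hypothetical helper, shown only to illustrate usage of the type.)
func (m QuantityByTAndResourceType[T]) AddResourceList(t T, rl ResourceList) {
	bucket, ok := m[t]
	if !ok {
		bucket = ResourceList{Resources: make(map[string]int64)}
		m[t] = bucket
	}
	for name, q := range rl.Resources {
		bucket.Resources[name] += q
	}
}
```

Both behaviours then fall out as instantiations: QuantityByTAndResourceType[int32] reproduces the old by-priority tracking, and QuantityByTAndResourceType[string] enables tracking by priority class name.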

@severinson changed the title from "Fix per-pc resource limits" to "Separate resource limits for each priority class" on Jun 22, 2023.
@codecov bot commented on Jun 22, 2023:

Codecov Report

Patch coverage: 39.53%; project coverage change: +0.02% 🎉

Comparison: base (476bc55) 58.56% vs. head (dbcdb0b) 58.59%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2601      +/-   ##
==========================================
+ Coverage   58.56%   58.59%   +0.02%     
==========================================
  Files         236      236              
  Lines       30380    30365      -15     
==========================================
  Hits        17791    17791              
+ Misses      11221    11207      -14     
+ Partials     1368     1367       -1     
Flag            Coverage Δ
armada-server   58.59% <39.53%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ
internal/armada/server/lease.go                          7.48% <0.00%>   (-0.23%) ⬇️
...ternal/executor/utilisation/cluster_utilisation.go   39.08% <0.00%>   (ø)
internal/scheduler/common.go                            75.77% <ø>       (+9.47%) ⬆️
internal/scheduler/constraints/constraints.go            4.41% <0.00%>   (+0.56%) ⬆️
internal/scheduler/preempting_queue_scheduler.go        65.14% <0.00%>   (ø)
...nternal/scheduler/schedulerobjects/resourcelist.go   69.31% <68.42%>  (ø)
internal/scheduler/context/context.go                   34.26% <78.72%>  (+0.67%) ⬆️
internal/scheduler/scheduling_algo.go                   72.47% <94.73%>  (+0.29%) ⬆️
internal/scheduler/reports.go                           66.58% <100.00%> (ø)


	if err != nil {
		return nil, err
	}
	priorityFactorByActiveQueue := make(map[string]float64, len(activeQueues))
A contributor commented:

Is this actually used anywhere?

@severinson merged commit 807569f into master on Jun 23, 2023, with 25 checks passed.
@severinson deleted the severinson/per-pc-limits branch on Jun 23, 2023 at 09:20.