
KEP-4753: Expose ownerReferences via downward API #4754

Open

ArangoGutierrez wants to merge 1 commit into master from the KEP-4753 branch

Conversation

@k8s-ci-robot added the label cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) on Jul 9, 2024
@k8s-ci-robot added the labels kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory), sig/node (Categorizes an issue or PR as relevant to SIG Node), and size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files) on Jul 9, 2024
@ArangoGutierrez
Author

/cc @thockin

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ArangoGutierrez
Once this PR has been reviewed and has the lgtm label, please assign mrunalp, wojtek-t for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ArangoGutierrez ArangoGutierrez force-pushed the KEP-4753 branch 3 times, most recently from 6ae4faf to b57c505 Compare July 9, 2024 17:11

## Summary

Today when a pod wants to pass its 'onwerReferences' to a new object it manage (a ConfigMap, a Secret, etc), it needs to do it call the API server to get it's own ownerReferences and then pass it to the new object.
Contributor

Suggested change
Today when a pod wants to pass its 'onwerReferences' to a new object it manage (a ConfigMap, a Secret, etc), it needs to do it call the API server to get it's own ownerReferences and then pass it to the new object.
Today when a pod wants to pass its 'ownerReferences' to a new object it manage (a ConfigMap, a Secret, etc), it needs to do it call the API server to get it's own ownerReferences and then pass it to the new object.


- <test>: <link to test coverage>

### Graduation Criteria
Contributor

Given that this is not an API change, you may be able to skip the alpha stage. WDYT @thockin?

Member

It is an API change - a value that used to be rejected is now allowed, so in the event of a rollback, we need to know it is still handled. Ergo "alpha" is required.


## Motivation

Deployments and ReplicaSets can orphan and adopt pods dynamically. Any additional information we provide can change over time. On day 1 a pod named "foobar-deadbeef93-xyz76" is owned by replicaset "foobar-deadbeef93", which is owned by deployment "foobar", but after a node restart the pod name could change to "foobar-deadbeef93-abc12", leaving the object created by "foobar-deadbeef93-xyz76" orphaned and triggering unwanted behavior. Enabling the pod to pass its ownerReferences to the new object it manages will prevent objects from being orphaned by pods that are owned by a higher-level object.
Member

I'm afraid this motivation doesn't make much sense to me. I think you've got some context in your head that is not in the text.

Why does a pod need to know its own ownerRefs? You seem to be implying that the pod will create some other object (e.g. a ConfigMap) and, rather than parent that object itself, it wants to create the ConfigMap and set its ownerRef to the pod's ownerRef? Why?

It seems plausible to me that a pod would be able to know its own ownerRefs, but that use-case eludes me.

Help me understand what a pod will actually DO with this information?

Author

how about now?


## Motivation

DaemonSets and ReplicaSets (via Deployments) can dynamically orphan and adopt pods. Consider a scenario where, on day 1, a pod named foobar-deadbeef93-xyz76 is owned by the ReplicaSet foobar-deadbeef93, which in turn is managed by the Deployment foobar. After a node restart, the pod might be replaced by foobar-deadbeef93-abc12, orphaning the original pod foobar-deadbeef93-xyz76 and potentially causing unwanted behavior. This issue arises when pods create resources like ConfigMaps or CustomResources and these resources are not correctly reassigned to the new pod, leading to orphaned objects. Ensuring that pods can pass their ownerReferences to new objects they manage would prevent this issue, as the ownership hierarchy would be preserved even when pods are recreated.
Member

> After a node restart, the pod might be replaced by foobar-deadbeef93-abc12, orphaning the original pod foobar-deadbeef93-xyz76 and potentially causing unwanted behavior

Can this really happen? Is this not a bug in the ReplicaSet?

Member

This example is confusing because DaemonSet and Deployment are very different - a pod's parent is never a Deployment. I would retool this to JUST talk about DaemonSet

If I am a daemonset pod "foobar-x" on node "x", and I create a ConfigMap that is associated with node "x", then it would be incorrect to set the ownerRef to "foobar-x" - I should set it to the daemonset "foobar". If you set it to the pod, then you can race with garbage-collection.

For a ReplicaSet it's bad because Deployments can replace ReplicaSets, so the proper parent should be the Deployment, which is 2 levels up and is NOT covered by this KEP, right?

```yaml
uid: 6be1683f-da9c-4f68-9440-82376231cfa6
```

If `ownerReferences` is empty or not set, the file will be empty.
Member

What happens if ownerReferences is updated? Is it reflected in the Pod, or does the Pod keep using the original value?
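For orientation, since the KEP's full example is truncated in this excerpt: a minimal sketch of how a pod might consume the proposed field through a downwardAPI volume, assuming the fieldPath is spelled `metadata.ownerReferences` (that spelling is an assumption, not confirmed by the text here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
spec:
  containers:
    - name: main
      image: registry.k8s.io/busybox
      command: ["sh", "-c", "cat /etc/podinfo/ownerReferences; sleep 3600"]
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: "ownerReferences"
            fieldRef:
              # Assumed fieldPath; the exact spelling is defined by the KEP, not here.
              fieldPath: metadata.ownerReferences
```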


e2e testing will consist of the following:
- Create a Deployment and/or DaemonSet
- Verify that the pod created by the Deployment/DaemonSet has the ownerReferences of the Deployment/DaemonSet via ENV VAR
Member

Is this correct? You said before that env variables are out of scope:

> It is out of the scope of this KEP to add the ownerReferences to the pod's environment variables.

Comment on lines +375 to +376
- Feature gate name: `DownwardAPIOwnerReferences`
- Components depending on the feature gate: `kubelet`

Why can't this be a new field in the kubelet configuration?
If the feature (implementation) turns out to be not appropriate, users can simply update the kubelet configuration and restart the kubelet. However, if it is a feature gate, users don't know which components are affected. We don't have documentation elsewhere stating that this gate only affects the kubelet. The best bet for them would be to restart all kube-xxx components, because this gate (as with others) will show up in kube-apiserver --help anyway once it lands.

We have witnessed a lot of "abuse" of feature gates. Can we think twice about this?

Member

A feature gate is a mechanism for introducing new changes while mitigating the risk of breaking existing clusters; gates are not meant to be used as flags. Once they graduate they disappear and the functionality can no longer be disabled ... Are you implying this KEP should add the functionality as opt-in or opt-out via configuration?


I think I know what feature gates were designed for ... My question is: do we want all new changes to be introduced with a feature gate? All changes could break existing clusters. I raised this concern because quite a few new feature gates introduced lately cover trivial changes, e.g. new optimizations to the watch cache, adding new fields to the status of a resource, adding a new entry to a resource annotation ...

Maybe this one is not so trivial; maybe it actually warrants a new feature gate that takes several releases to become GA. My point is about the user-facing changes. For most cluster admins, feature gates are currently applied system-wide, across all components. That is a HEAVY burden. I hope our team will consider this factor.

Member

> do we want all new changes to be introduced with a feature gate?

If you think your change could break a cluster, then YES.

If your change has any aspect of API, then YES.

> new optimizations to the watch cache

Could absolutely break a cluster!

> adding new fields to the status of a resource

Any API field we add goes through a gate so that we can safely do a rollback without breaking or dropping information.

> adding new entry to resource annotation

If this produces a value which version x-1 of Kubernetes would reject, then it needs a gate. Always think about this case:

1. User upgrades cluster from x->y
2. User uses your new API field.
3. Uh oh! Something bad happened.
4. User rolls cluster back to x.

Your API field is still stored in etcd, and may have some lingering external effect - what will happen? What if the user tries to update the object - will it fail validation? Or will the new field just disappear?
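For concreteness, a sketch of what alpha enablement might look like if the gate only affects the kubelet, as the diff above states (the gate name is taken from the diff; whether kubelet is really the only affected component is what this thread is debating):

```yaml
# KubeletConfiguration sketch: enabling the proposed gate on a node.
# Assumes the gate lands as described in the KEP diff above (kubelet only).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DownwardAPIOwnerReferences: true
```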

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
## Summary

Today, when a pod wants to pass its `ownerReferences` to a new object it manages (a ConfigMap, a Secret, etc.), it needs to call the API server to get its own ownerReferences and then pass them to the new object.
The pod then needs GET access on all the intermediate resources, which it may not have, and granting that access is a security risk, as the pod can then read other resources it should not have access to.
Member

The pod only needs GET access on pods, not anything else?
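For reference, a sketch of the RBAC the status quo pushes onto the workload: to read its own ownerReferences today, the pod's ServiceAccount needs at least GET on pods (names below are illustrative):

```yaml
# Minimal grant the workload needs today just to read its own metadata.ownerReferences.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-own-pod        # illustrative name
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-own-pod        # illustrative name
  namespace: default
subjects:
  - kind: ServiceAccount
    name: default           # illustrative; whichever ServiceAccount the pod runs as
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-own-pod
```

With the downward API field proposed by this KEP, none of this RBAC would be needed for that purpose.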





For example, a CustomResource created by a pod managed by a DaemonSet may be unexpectedly garbage collected if the pod is deleted. This can disrupt system behavior, as the necessary resource is lost. By allowing the pod to inherit and pass down ownerReferences, the CustomResource would remain correctly managed by the DaemonSet, avoiding such disruptions.
Member

There's a cause-effect implied here that I don't understand. I think you mean to say that now you have enough information to set the owner of the CR to the DS itself?

### Risks and Mitigations

- Security Risk: Some environments may not want to expose the ownerReferences of a pod to the pod itself.
- Mitigation: This feature could be added as a feature gate, disabled by default.
Member

This is not a mitigation to this KEP. If there is some cluster wherein the admin DOES NOT want pods to be able to know their owners, the mitigation would be to install an admission check (webhook or Validating Admission Policy) to reject pods that try to use this.
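A sketch of that admission-based mitigation using a ValidatingAdmissionPolicy with a CEL check; the fieldPath string `metadata.ownerReferences` is an assumption about how the KEP spells the new selector, and env vars are ignored since the KEP keeps them out of scope:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-ownerref-downward-api       # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    # Reject any pod that mounts the (assumed) metadata.ownerReferences fieldPath.
    - expression: >-
        !has(object.spec.volumes) ||
        object.spec.volumes.all(v,
          !has(v.downwardAPI) || !has(v.downwardAPI.items) ||
          v.downwardAPI.items.all(i,
            !has(i.fieldRef) || i.fieldRef.fieldPath != 'metadata.ownerReferences'))
      message: "exposing ownerReferences via the downward API is not allowed in this cluster"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: deny-ownerref-downward-api
spec:
  policyName: deny-ownerref-downward-api
  validationActions: ["Deny"]
```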

Logs from the pod will show the ownerReferences of the pod:

```bash
$ kubectl logs kubernetes-downwardapi-volume-example
```
Member

This is under-specified.

I don't know if we should dump the WHOLE object, but it's probably fine - can you think of any reason to hide any of these?

YAML is a terrible file format for parsing - I think it would be better to spec this as something like:

"""
The file representation of this fieldref is a JSON list of objects, each of which is a serialized k8s.io/apimachinery/pkg/apis/meta/v1.OwnerReference.
"""

Open question: Should we instead produce a metav1.List? e.g.

```json
{
    "kind": "OwnerReference",
    "apiVersion": "meta/v1",
    "items": [
        { .... }, { ... }
    ]
}
```

IF there was ever a meta/v2, we would need to allow people to request that - does the output need to be self-documenting? I think probably not.
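Under the "JSON list of serialized metav1.OwnerReference objects" option described above, the file content might look like the following; the values are illustrative, reusing the DaemonSet name from the motivation and the uid from the truncated example:

```json
[
  {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "name": "foobar",
    "uid": "6be1683f-da9c-4f68-9440-82376231cfa6",
    "controller": true,
    "blockOwnerDeletion": true
  }
]
```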


If `ownerReferences` is empty or not set, the file will be empty.

It is out of the scope of this KEP to add the `ownerReferences` to the pod's environment variables.
Member

Why? It's always a smell when we do one but not the other. This value could be specced the same as the file - a JSON list.





###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

No, the missing Downward API field will be perceived as a missing Volume, preventing the pod from starting.
Member

Actually, wouldn't it be seen as an unknown field ref?
