Errors in the Akri-on-Krustlet demo #4

Open

rupipal opened this issue Dec 17, 2021 · 10 comments

@rupipal

rupipal commented Dec 17, 2021

Hi,
Though I'd have liked to try the demo (https://github.com/project-akri/akri-on-krustlet/blob/main/demo-krustlet.md) on k3d (earlier I could install Akri on k3d without any major issues, see project-akri/akri#438), I ran into errors installing the Krustlet node itself. Maybe that needs to be taken up with the Krustlet people. So I switched to kind. I can see the krustlet-wasi node in the cluster. However, I seem to have hit an error.
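For context, the kind cluster itself was created the usual way; roughly (cluster1 is the name that shows up in my node listings later):

$ kind create cluster --name cluster1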

/akri$ RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=/.kube/config DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd target/release/agent
akri.sh Agent start

akri.sh KUBERNETES_PORT found ... env_logger::init
[2021-12-17T13:05:31Z INFO akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2021-12-17T13:05:31Z INFO agent::util::registration] internal_run_registration_server - entered
[2021-12-17T13:05:31Z INFO agent::util::config_action] do_config_watch - enter
[2021-12-17T13:05:31Z INFO warp::server] Server::run; addr=0.0.0.0:8080
[2021-12-17T13:05:31Z INFO warp::server] listening on http://0.0.0.0:8080
[2021-12-17T13:05:31Z WARN kube::client] Unsuccessful data error parse: 404 page not found

thread 'tokio-runtime-worker' panicked at 'called Result::unwrap() on an Err value: "404 page not found\n": Failed to parse error data', agent/src/main.rs:88:14
stack backtrace:
0: rust_begin_unwind
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
1: core::panicking::panic_fmt
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
2: core::result::unwrap_failed
at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/result.rs:1617:5
3: <core::future::from_generator::GenFuture as core::future::future::Future>::poll
4: tokio::runtime::task::harness::Harness<T,S>::poll
5: std::thread::local::LocalKey::with
6: tokio::runtime::thread_pool::worker::Context::run_task
7: tokio::runtime::thread_pool::worker::Context::run
8: tokio::macros::scoped_tls::ScopedKey::set
9: tokio::runtime::thread_pool::worker::run
10: tokio::loom::std::unsafe_cell::UnsafeCell::with_mut
11: tokio::runtime::task::harness::Harness<T,S>::poll
12: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
Error: JoinError::Panic(...)

The step before this one showed:
~/akri$ cargo build -p agent --release
Updating git repository https://github.com/kate-goldenring/h2
Updating git repository https://github.com/DazWilkin/openapi-admission-v1
Downloaded crypto-mac v0.8.0
Downloaded darling v0.12.4
Downloaded float-cmp v0.8.0
...
...
...

Compiling kube-runtime v0.59.0
Compiling akri-shared v0.7.11 (/home/rupinder/akri/shared)
warning: irrefutable while let pattern
--> discovery-utils/src/discovery/mod.rs:231:27
|
231 | while let item = uds.accept().map_ok(|(st, _)| unix_stream::UnixStream(st)).await {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(irrefutable_let_patterns)] on by default
= note: this pattern will always match, so the loop will never exit
= help: consider instead using a loop { ... } with a let inside it

Compiling akri-debug-echo v0.7.11 (/home/rupinder/akri/discovery-handlers/debug-echo)
warning: akri-discovery-utils (lib) generated 1 warning
warning: irrefutable while let pattern
--> agent/src/util/registration.rs:189:19
|
189 | while let item = uds.accept().map_ok(|(st, _)| unix_stream::UnixStream(st)).await {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(irrefutable_let_patterns)] on by default
= note: this pattern will always match, so the loop will never exit
= help: consider instead using a loop { ... } with a let inside it

warning: irrefutable while let pattern
--> agent/src/util/device_plugin_builder.rs:143:27
|
143 | while let item = uds.accept().map_ok(|(st, _)| unix_stream::UnixStream(st)).await {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this pattern will always match, so the loop will never exit
= help: consider instead using a loop { ... } with a let inside it

warning: agent (bin "agent") generated 2 warnings
Finished release [optimized] target(s) in 1m 46s

regards
Rupinder

@kate-goldenring
Collaborator

@rupipal great to hear you are trying it out. Based on where the panic occurred, it looks like it is having difficulties finding the Akri Configuration CRD. Just to double check, did you install Akri and the Controller in the previous step? You can confirm that the Configuration CRD has been applied to the cluster via Helm with kubectl get crd configurations.akri.sh. I was able to reproduce the error after deleting the Configuration CRD.
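For example, if the chart installed cleanly, that check should return something along these lines (the timestamp will differ; Akri's instances.akri.sh CRD can be checked the same way):

$ kubectl get crd configurations.akri.sh
NAME                     CREATED AT
configurations.akri.sh   2021-12-17T12:00:00Z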

@rupipal
Author

rupipal commented Dec 18, 2021

Hi @kate-goldenring ,
Thanks for your reply. Yes, that was a slip; I missed that step.

Now this was my kind cluster (cluster-1) to begin with.

$ kubectl get pods -A

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-558bd4d5db-6vgxt 1/1 Running 0 5m23s
kube-system coredns-558bd4d5db-sghvw 1/1 Running 0 5m23s
kube-system etcd-cluster1-control-plane 1/1 Running 0 5m27s
kube-system kindnet-87dsk 1/1 Running 0 5m24s
kube-system kube-apiserver-cluster1-control-plane 1/1 Running 0 5m27s
kube-system kube-controller-manager-cluster1-control-plane 1/1 Running 0 5m27s
kube-system kube-proxy-xhwtk 1/1 Running 0 5m24s
kube-system kube-scheduler-cluster1-control-plane 1/1 Running 0 5m27s
local-path-storage local-path-provisioner-547f784dff-xhjzh 1/1 Running 0 5m23s

Upon starting the Krustlet node, this is what I got.

$ kubectl get pods -A

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-558bd4d5db-6vgxt 1/1 Running 0 18m
kube-system coredns-558bd4d5db-sghvw 1/1 Running 0 18m
kube-system etcd-cluster1-control-plane 1/1 Running 0 18m
kube-system kindnet-87dsk 0/1 CrashLoopBackOff 6 18m
kube-system kindnet-pt888 0/1 Registered 0 10m
kube-system kube-apiserver-cluster1-control-plane 1/1 Running 0 18m
kube-system kube-controller-manager-cluster1-control-plane 1/1 Running 0 18m
kube-system kube-proxy-xhwtk 1/1 Running 0 18m
kube-system kube-scheduler-cluster1-control-plane 1/1 Running 0 18m
local-path-storage local-path-provisioner-547f784dff-xhjzh 1/1 Running 0 18m

The Krustlet node was deployed.

$ kubectl get no
NAME STATUS ROLES AGE VERSION
cluster1-control-plane Ready control-plane,master 85m v1.21.1
krustlet-wasi Ready 77m 1.0.0-alpha.1

The Akri controller gets deployed too.

$ kubectl get pods -A

NAMESPACE NAME READY STATUS RESTARTS AGE
default akri-controller-deployment-776897c88f-464gg 1/1 Running 0 42m
kube-system coredns-558bd4d5db-6vgxt 1/1 Running 0 63m
kube-system coredns-558bd4d5db-sghvw 1/1 Running 0 63m
kube-system etcd-cluster1-control-plane 1/1 Running 0 63m
kube-system kindnet-87dsk 0/1 CrashLoopBackOff 15 63m
kube-system kindnet-pt888 0/1 Registered 0 55m
kube-system kube-apiserver-cluster1-control-plane 1/1 Running 0 63m
kube-system kube-controller-manager-cluster1-control-plane 1/1 Running 0 63m
kube-system kube-proxy-xhwtk 1/1 Running 0 63m
kube-system kube-scheduler-cluster1-control-plane 1/1 Running 0 63m
local-path-storage local-path-provisioner-547f784dff-xhjzh 1/1 Running 0 63m

The gRPC proxy successfully connects with the Akri Agent and the input file seems to be written.

[2021-12-18T16:19:12Z INFO dh_grpc_proxy] gRPC proxy running named as: debugEcho!
[2021-12-18T16:19:12Z INFO dh_grpc_proxy] Turning the server on!
[2021-12-18T16:19:12Z INFO akri_discovery_utils::registration_client] register_discovery_handler - entered
[2021-12-18T16:19:12Z INFO akri_discovery_utils::discovery::server] internal_run_discovery_server - entered
[2021-12-18T16:22:29Z INFO dh_grpc_proxy::discovery_handler] Connection established!
[2021-12-18T16:22:29Z INFO dh_grpc_proxy::discovery_handler] Input file written: {"descriptions":["foo0"]}

However, besides those two kindnet pods not coming up, the broker Wasm Pod doesn't come up either.

$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default akri-controller-deployment-776897c88f-464gg 1/1 Running 0 72m
default wasi-debug-echo 0/1 Registered 0 25m
kube-system coredns-558bd4d5db-6vgxt 1/1 Running 0 93m
kube-system coredns-558bd4d5db-sghvw 1/1 Running 0 93m
kube-system etcd-cluster1-control-plane 1/1 Running 0 93m
kube-system kindnet-87dsk 0/1 CrashLoopBackOff 21 93m
kube-system kindnet-pt888 0/1 Registered 0 86m
kube-system kube-apiserver-cluster1-control-plane 1/1 Running 0 93m
kube-system kube-controller-manager-cluster1-control-plane 1/1 Running 0 93m
kube-system kube-proxy-xhwtk 1/1 Running 0 93m
kube-system kube-scheduler-cluster1-control-plane 1/1 Running 0 93m
local-path-storage local-path-provisioner-547f784dff-xhjzh 1/1 Running 0 93m

I did spend a lot of time looking for any missing steps. Here is where I am now :)

Rupinder

@kate-goldenring
Collaborator

Hi @rupipal, that's definitely a lot of progress. Did you deploy the debug echo discovery handler yaml from this step? Your flow above is very descriptive and that step isn't in there, so I just wanted to check. I would at least expect to see erroring pods, since all that step does is deploy a standard Kubernetes Pod.
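That step boils down to a plain kubectl apply of the discovery handler Pod yaml from the demo repo, roughly (the filename below is only illustrative; use the one linked in the demo):

$ kubectl apply -f wasi-debug-echo-pod.yaml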

@rupipal
Author

rupipal commented Dec 20, 2021

Hi @kate-goldenring

Yes, definitely. That's what causes (wasi-debug-echo 0/1 Registered 0 25m) to show up. So I'm trying to figure out what the id would be for
kubectl describe pod krustlet-wasi-akri-debug-echo--pod

But even at 25m, it doesn't run.
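In case it helps, the exact generated pod name can be found without guessing the id by listing pod names and filtering, e.g.:

$ kubectl get pods -o name | grep debug-echo
$ kubectl describe <pod/name printed by the previous command>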

@kate-goldenring
Collaborator

@rupipal do the logs of the agent show any issue creating the device plugins? Maybe an issue around creating a socket? The Agent may need to be run privileged.
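In case it's useful, a sketch of running the agent privileged while still pointing it at the user's kubeconfig (plain sudo resets the environment, so the variables are passed explicitly via env):

$ sudo env RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=$HOME/.kube/config DISCOVERY_HANDLERS_DIRECTORY=$HOME/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd target/release/agent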

@rupipal
Author

rupipal commented Dec 27, 2021

Hi @kate-goldenring

Here are the logs of the agent. They don't seem to show any issue. I tried with sudo; if I recall correctly, it then starts looking for the kubeconfig under root's home and can't find it there.

$ RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=/.kube/config DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd ./akri/target/release/agent
akri.sh Agent start
akri.sh KUBERNETES_PORT found ... env_logger::init
[2021-12-25T05:14:21Z INFO akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2021-12-25T05:14:21Z INFO agent::util::registration] internal_run_registration_server - entered
[2021-12-25T05:14:21Z INFO agent::util::config_action] do_config_watch - enter
[2021-12-25T05:14:21Z INFO warp::server] Server::run; addr=0.0.0.0:8080
[2021-12-25T05:14:21Z INFO warp::server] listening on http://0.0.0.0:8080
[2021-12-25T05:14:21Z INFO agent::util::config_action] handle_config - watcher started
[2021-12-25T05:22:09Z INFO agent::util::registration] register_discovery_handler - called with register request RegisterDiscoveryHandlerRequest { name: "debugEcho", endpoint: "/home/rupinder/akri/debugEcho.sock", endpoint_type: Uds, shared: true }
[2021-12-25T05:24:25Z INFO agent::util::config_action] handle_config - added or modified Configuration Some("akri-debug-echo")
[2021-12-25T05:24:25Z INFO agent::util::discovery_operator::start_discovery] start_discovery - entered for debugEcho discovery handler

Here are the logs of the gRPC proxy.

/akri-on-krustlet$ RUST_LOG=info DISCOVERY_HANDLER_NAME=debugEcho DISCOVERY_HANDLERS_DIRECTORY=/akri AGENT_NODE_NAME=krustlet-wasi ./target/release/dh-grpc-proxy
[2021-12-25T05:22:09Z INFO dh_grpc_proxy] gRPC proxy running named as: debugEcho!
[2021-12-25T05:22:09Z INFO dh_grpc_proxy] Turning the server on!
[2021-12-25T05:22:09Z INFO akri_discovery_utils::registration_client] register_discovery_handler - entered
[2021-12-25T05:22:09Z INFO akri_discovery_utils::discovery::server] internal_run_discovery_server - entered
[2021-12-25T05:24:25Z INFO dh_grpc_proxy::discovery_handler] Connection established!
[2021-12-25T05:24:25Z INFO dh_grpc_proxy::discovery_handler] Input file written: {"descriptions":["foo0"]}

@kate-goldenring
Collaborator

Looks like the proxy and agent are running correctly. The Wasm debug echo discovery handler is not correctly reading the input file and writing to the output file. Can you share the logs of the debug echo discovery handler that was deployed in this step?
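They should be retrievable with plain kubectl, assuming the wasi-debug-echo pod from your listings is that discovery handler, e.g.:

$ kubectl logs wasi-debug-echo
$ kubectl describe pod wasi-debug-echo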

@rupipal
Author

rupipal commented Feb 3, 2022

Hi @kate-goldenring
Sorry for the long delay in replying; I was unwell.
I re-did all the steps.
As I mentioned earlier, the wasi debug echo discovery handler pod doesn't start running and isn't ready.

kubectl get akrii,pods -o wide

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/akri-controller-deployment-776897c88f-f9wqh 1/1 Running 0 146m 10.244.0.5 cluster1-control-plane
pod/wasi-debug-echo 0/1 Registered 0 118m krustlet-wasi

@kate-goldenring
Collaborator

Commenting here to revive this investigation. I will be unavailable for the next couple of weeks, but I will see if I can find a slot of time to rerun the demo and possibly repro the issue. @rodz in case you have time to debug.

@rupipal
Author

rupipal commented May 10, 2022 via email
