Week notes: sig-arch KEP reading club community meeting, deep-dive into sig-testing infrastructure, learning OWASP Top Ten #26
June 27, 2021
Very quick notes from this previous week:
sig-arch KEP reading club community meeting
I followed up with the recording of the recent (and second) sig-arch KEP reading club community meeting. I couldn’t join it live (but for anyone who wants to join, it happens every second week, on Mondays from 8:30 – 9:30 PM IST. Subscribe to the sig-arch mailing list for the invite!).
The following KEPs were covered during the session:
- KEP-2599: minReadySeconds for StatefulSets
`minReadySeconds` specifies the minimum number of seconds for which a newly created Pod should be ready, without any of its containers crashing, before it is considered `available`. `minReadySeconds` defaults to `0` (which means the pod will be considered `available` as soon as it is `Ready`). `minReadySeconds` is already available as an optional field in the other workload controllers - Deployments, DaemonSets, ReplicaSets and ReplicationControllers. Enabling this option in StatefulSets helps bring StatefulSets on par with these other workload controllers.

The listed drawbacks are ~
- It adds more complexity to the StatefulSet controller, which now has to check Pod availability over a period of time.
- Measuring availability may be hard as there can be clock skew between the master and nodes. However, this feature has already been implemented successfully in the Deployment and DaemonSet controllers.
Other questions raised here were ~
- “Why not move the `minReadySeconds` field into the Pod spec itself?”
- “What will be the consequences of adding `minReadySeconds` in the StatefulSet workload controller first, but moving it into `PodSpec` later?”
- “Which value would take precedence, or would the values of `minReadySeconds` at both the StatefulSet level and the Pod spec level be added together?”
There was some discussion on this, and it was clarified that moving `minReadySeconds` to the Pod spec is beyond the scope of this KEP. The major reasons are listed in the Non-Goals section of the KEP, but in a nutshell, the change required to implement that would be too large an effort.

The discussion brought out more useful points that are currently not reflected in the KEP document, so it was requested to include them now as further improvements.
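To make the behaviour concrete, here is a minimal sketch of a StatefulSet with the field set, written with the Kubernetes Go API types. It assumes the field lands in `StatefulSetSpec` the way the KEP proposes; the names and values are purely illustrative:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	replicas := int32(2)

	// A StatefulSet whose pods must stay Ready for 30 seconds (without any
	// container crashing) before the controller counts them as available.
	sts := appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "web"},
		Spec: appsv1.StatefulSetSpec{
			Replicas:        &replicas,
			ServiceName:     "web",
			MinReadySeconds: 30, // defaults to 0: available as soon as Ready
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "web"},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{"app": "web"},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{Name: "nginx", Image: "nginx:1.21"},
					},
				},
			},
		},
	}

	fmt.Println("minReadySeconds:", sts.Spec.MinReadySeconds)
}
```

With `minReadySeconds: 30`, a replacement pod during a rolling update only counts as available once it has stayed Ready for 30 seconds, which slows rollouts down a little in exchange for more confidence that each pod is actually healthy.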
- KEP-2258: Node service log viewer
The motivation of this KEP is to help k8s cluster administrators troubleshoot issues with control-plane and worker nodes more efficiently. At the moment, this typically requires a cluster admin to SSH into the nodes for debugging. While it is understood that certain issues will require being on the node, issues with kube-proxy or the kubelet (to name a couple) could be solved by checking the logs. So, this KEP proposes adding a node capability to view logs using the existing k8s command line tool, `kubectl`, in order to significantly simplify this basic level of node/control-plane troubleshooting.

The meeting had a discussion about the various technicalities behind implementing this feature on Linux & Windows worker nodes (a brief is in the KEP itself), while noting that it won’t include support for non-systemd Linux distributions.
There are lots of security considerations, points of confusion & open questions at the moment with regard to moving this KEP to a feature implementation.
As discussed, it will be followed up in the upcoming sig-cli biweekly meeting (this coming Wednesday, June 29, 2021, 9:30 PM - 10:00 PM IST).
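For context on what log access looks like today without SSH, here is a rough sketch of hitting the kubelet’s existing `/logs` endpoint (which serves the node’s `/var/log`) through the API server’s node proxy, using client-go. This is not the interface the KEP proposes; the node name is made up, the kubeconfig path is an assumption, and your RBAC must allow `nodes/proxy` access:

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumes the default ~/.kube/config path).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Ask the API server to proxy a request to the kubelet's /logs endpoint,
	// which serves the node's /var/log directory. "my-worker-node" is a
	// hypothetical node name.
	data, err := client.CoreV1().RESTClient().
		Get().
		Resource("nodes").
		Name("my-worker-node").
		SubResource("proxy").
		Suffix("logs").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

The KEP is essentially about giving this kind of access a friendlier, first-class `kubectl` experience, including services that log to journald rather than to plain files.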
I found both of these feature enhancement proposals really interesting (& useful too in many ways, but of course there are lots of questions that might bring blockers or require further iterations of improvement before approval).
deep-dive into sig-testing infrastructure
This was a further step in exploring the Kubernetes testing infrastructure, after I learnt about the CI jobs’ Testgrid.
Over this last week, I went through all of the sig-testing deep dive videos (there were only 3).
I learnt about the lifecycle of a Prow job.
From its different triggering mechanisms,
- trigger using `/test all`-like commands in the PR comments,
- trigger based on time, or periodic job triggers (the `Horologium` component does that),
- trigger based on a PR getting merged, etc.
to different execution platforms,
- Kubernetes Pods
- Knative Build CRDs (& more to come)
then followed by hooks,
- GitHub/GitLab webhooks
- Slack webhooks (& other similar service webhooks)
& then plugins to handle various (incoming) webhooks (or as the docs & the source code term it hooks
),
Plugins:trigger
to handle trigger Prow Job webhookPlugins:configUpdate
to handle triggeres created by config changes, & so to do config updates automatically in returnPlugins:cat
, it add a cat meme in the PR comments, somewhere from internet- (and more could be found here -> prow.k8s.io/command-help)
I then learnt about `plank`, which actually runs the pods for Prow jobs (it initiates them & does the reconciliation, i.e. lifecycle management),
& then `sinker`, which does the garbage collection,
& finally about the different reporting sinks such as the GitHub reporter, Gerrit reporter, PubSub reporter, etc. (which happens through `crier`). This is basically reporting Prow job status back to the external services that triggered the jobs!
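To keep the flow straight in my head, here is a toy sketch of how I currently picture these components fitting together. None of the types or functions below are Prow’s real code or APIs; the job name and states are made up purely for illustration:

```go
package main

import "fmt"

// Hypothetical, simplified model of a Prow job's lifecycle.
type JobState string

const (
	Triggered JobState = "triggered"
	Pending   JobState = "pending"
	Succeeded JobState = "succeeded"
	Cleaned   JobState = "cleaned-up"
)

type ProwJob struct {
	Name  string
	State JobState
}

// hook: an incoming webhook (e.g. a "/test all" PR comment) creates a job.
func hook(name string) *ProwJob {
	return &ProwJob{Name: name, State: Triggered}
}

// plank: picks up triggered jobs, runs a pod for each, and reconciles
// the job state as the pod progresses.
func plank(job *ProwJob) {
	job.State = Pending
	// ... the pod runs the test workload here ...
	job.State = Succeeded
}

// crier: reports the final status back to the service that triggered
// the job (GitHub, Gerrit, PubSub, ...).
func crier(job *ProwJob) {
	fmt.Printf("reporting %q back to GitHub: %s\n", job.Name, job.State)
}

// sinker: garbage-collects finished jobs and their pods after a while.
func sinker(job *ProwJob) {
	job.State = Cleaned
}

func main() {
	job := hook("pull-kubernetes-unit") // hypothetical job name
	plank(job)
	crier(job)
	sinker(job)
	fmt.Println("final state:", job.State)
}
```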
I then also learnt about Pod Utilities that can be added to Prow job pods (created/executed by plank) to take care of basic recurring utility tasks (like source code checkout, log & artifact upload, & so on). I came across this handy migration guide written by Dims for migrating pods from using `bootstrap.py` (the older way of doing things) to using `podUtilities`.
I have started looking at & reading the source code of all these various components at the moment. Further steps would be ~
- picking a set of jobs that I can migrate (from using `bootstrap.py` to `podUtilities`) and doing the migration,
- and also converting the migration gist (linked above) into a document and dropping it into the sig-testing development docs, based on my experience doing the migration.
Something to note is that the above learning, gathered just from watching the sig-testing deep-dive videos, might be slightly (or more than slightly) dated.
But as of today, I can say, this is what I’ve learnt so far:
(credits: Deep Dive: Testing SIG - Cole Wagner & Katharine Berry, Google)
And this is what it looks like (as of today):
Hahahaha, yeah, that looks like quite a difference. And of course it is, but I’m familiar with lots of these new components too, so good for me. 🙂
learning OWASP Top Ten
And lastly, I spent some time earlier this week doing the first part of the Master the OWASP Top 10 LinkedIn Learning course track. Will block some more time to finish it later. :)
So far, I only have a surface-level, descriptive understanding of what the OWASP Top Ten are.
That’s all for this week. I had a wonderfully productive work week as well; I’ve contributed to the upstream kube-linter project, adding more validation checks. Hopefully, that will be a different blog post.
Till then, o/