Start Kubernetes pod memory depending on size of data job




Is there a way to dynamically scale the memory size of a Pod based on the size of a data job (my use case)?



Currently we have Jobs and Pods that are defined with fixed memory amounts, but we don't know how big the data will be for a given time-slice (sometimes 1,000 rows, sometimes 100,000 rows).
So the job will break if the data is bigger than the memory we allocated beforehand.



I have thought of slicing by data volume, i.e. cutting every 10,000 rows, since we would know the memory requirement for processing a fixed number of rows. But we are trying to aggregate by time, hence the need for time-slices.



Or is there any other solution, like Spark on Kubernetes?



Another way of looking at it:
how can we implement the equivalent of Cloud Dataflow in Kubernetes on AWS?




2 Answers



It's a best practice to always define resources in your container definition, in particular:

resources

limits

requests
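A container spec that sets these fields might look like the following (the image name and sizes are illustrative, not from the question):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-job
spec:
  containers:
  - name: worker
    image: myrepo/aggregator:latest   # illustrative image name
    resources:
      requests:
        memory: "512Mi"   # amount the scheduler reserves for this container
        cpu: "250m"
      limits:
        memory: "1Gi"     # hard cap; exceeding it gets the container OOM-killed
        cpu: "500m"
```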



This allows the scheduler to make a better decision, and it eases the assignment of Quality of Service (QoS) for each pod (https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/), which falls into three possible classes:

Guaranteed (every container has limits equal to requests)

Burstable (requests set, but lower than limits, or only some containers covered)

BestEffort (no requests or limits at all)



The QoS class provides a criterion for deciding which pods to kill first when the system is overcommitted.



I have found a partial solution to this.
Note there are two parts to this problem:
1. Make the Pod request the correct amount of memory depending on the size of the data job.
2. Ensure that this Pod can find a Node to run on.



The Kubernetes Cluster Autoscaler (CA) can solve part 2.
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler



According to the readme:
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when there are pods that failed to run in the cluster due to insufficient resources.



Thus, if a data job needs more memory than is available on the currently running nodes, the autoscaler will start a new node by increasing the size of a node group.
Details:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md



I am still unsure how to do point 1.
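One possible approach to point 1 (a sketch, not a confirmed solution): have a small launcher estimate memory from the row count of the time-slice before submitting the Job. The per-row cost (~2 KiB) and base overhead (256 Mi) below are assumptions you would need to calibrate for your own workload, and the image name is hypothetical.

```python
def estimate_memory_mi(rows, kib_per_row=2, base_mi=256):
    """Estimate a memory request in Mi for a job processing `rows` rows.

    Assumes ~`kib_per_row` KiB of working memory per row plus a fixed
    `base_mi` Mi overhead; both numbers are illustrative guesses.
    """
    return base_mi + (rows * kib_per_row) // 1024

def job_manifest(name, image, rows):
    """Build a Kubernetes Job manifest whose memory sizing comes from `rows`."""
    mem = f"{estimate_memory_mi(rows)}Mi"
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "resources": {
                            # request == limit gives the pod the Guaranteed QoS class
                            "requests": {"memory": mem},
                            "limits": {"memory": mem},
                        },
                    }],
                }
            }
        },
    }

manifest = job_manifest("agg-2019-01-01", "myrepo/aggregator:latest", rows=100_000)
print(manifest["spec"]["template"]["spec"]["containers"][0]
      ["resources"]["requests"]["memory"])  # → 451Mi
```

The resulting dict can be submitted with the official Kubernetes Python client (`kubernetes.client.BatchV1Api().create_namespaced_job`) or written out as YAML for `kubectl apply`. Combined with the Cluster Autoscaler above, an oversized request then triggers a scale-up instead of an OOM kill.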



An alternative for point 1 is to start the container without a specific memory request or limit:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#if-you-don-t-specify-a-memory-limit



If you don’t specify a memory limit for a Container, then one of these situations applies:

The Container has no upper bound on the amount of memory it uses, or

The Container could use all of the memory available on the Node where it is running.
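Applied to the question, that could mean submitting the Job with a modest memory request (so the scheduler can still place the pod) but no limit, letting a larger-than-expected time-slice borrow spare node memory. The value below is illustrative:

```yaml
resources:
  requests:
    memory: "512Mi"   # scheduling hint only; illustrative value
  # no limits.memory: the container may use all free memory on its node
```

Note the trade-off: with a request but no limit the pod lands in the Burstable QoS class, so it is killed before Guaranteed pods if the node runs out of memory.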





