Pods get OOM-killed when multiple heavy Spark applications start at the same time #666

@maxgruber19

Affected Stackable version

25.3.0

Current and expected behavior

When multiple Spark applications are started at the same time, each with 300 executors, a large number of PVCs are submitted that need to be satisfied by the secret-operator. In the default configuration the secret-operator container has a memory limit of 128 MB, which seems to be insufficient: in that case all the pods get OOM-killed.

Possible solution

Increase the memory limit of the secret-operator container from 128 MB to 1 GB.
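
To try this out before any default is changed, the limit could be raised with a strategic merge patch against the DaemonSet. The sketch below only builds the patch body and the matching `kubectl patch` command line. The DaemonSet name, namespace, and container name passed in at the bottom are assumptions and have to be checked against the actual installation, and `1Gi` is used as the Kubernetes spelling of the proposed 1 GB.

```python
import json
import shlex


def memory_limit_patch(container: str, limit: str) -> dict:
    """Build a strategic merge patch that sets the memory limit of one container.

    Containers in a pod template are merged by name, so only the named
    container is touched.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {"limits": {"memory": limit}},
                        }
                    ]
                }
            }
        }
    }


def kubectl_patch_command(daemonset: str, namespace: str, container: str, limit: str) -> str:
    """Return a shell-quoted `kubectl patch` command for the given DaemonSet."""
    patch = json.dumps(memory_limit_patch(container, limit))
    args = [
        "kubectl", "patch", "daemonset", daemonset,
        "-n", namespace,
        "--type", "strategic",
        "-p", patch,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # All three names below are placeholders (assumptions), not verified values.
    print(kubectl_patch_command(
        daemonset="secret-operator-daemonset",
        namespace="stackable-operators",
        container="secret-operator",
        limit="1Gi",
    ))
```

A manual patch like this is likely to be reverted the next time the operator is upgraded or reinstalled, so it is only meant as a way to verify that more memory makes the OOM kills go away.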

Additional context

@soenkeliebau as mentioned today, I'll try to get further details. I think this is reproducible by submitting an application with 3000 executors and keeping an eye on the secret-operator DaemonSet.
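
While reproducing, something like the following could help confirm that the restarts are really OOM kills. It is a minimal sketch that scans the JSON form of a pod list (the structure returned by `kubectl get pods -o json`) for containers whose current or last termination reason is `OOMKilled`. The sample data at the bottom is made up for illustration.

```python
import json


def oom_killed_containers(pod_list: dict) -> list:
    """Return (pod, container, restart_count) tuples for OOM-killed containers.

    A container counts as OOM-killed if its current state or its last state
    is `terminated` with reason `OOMKilled`.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            for key in ("state", "lastState"):
                terminated = (cs.get(key) or {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, cs.get("name"), cs.get("restartCount", 0)))
                    break  # report each container at most once
    return hits


if __name__ == "__main__":
    # Hypothetical sample; real input would come from `kubectl get pods -o json`.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "example-pod-1"},
       "status": {"containerStatuses": [
         {"name": "example-container", "restartCount": 4,
          "state": {"running": {}},
          "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}},
      {"metadata": {"name": "example-pod-2"},
       "status": {"containerStatuses": [
         {"name": "example-container", "restartCount": 0,
          "state": {"running": {}}, "lastState": {}}]}}
    ]}
    """)
    for pod, container, restarts in oom_killed_containers(sample):
        print(f"{pod}/{container}: OOMKilled, restarts={restarts}")
```

For a live cluster, the sample could be swapped for `json.load(sys.stdin)` and the script fed from `kubectl get pods -n <namespace> -o json`, with the namespace of the secret-operator filled in.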

Environment

No response

Would you like to work on fixing this bug?

None
