115. OOM (Out of Memory), High Memory Consumption: Basic Troubleshooting Steps

张开发
2026/4/7 16:20:09, 15-minute read

Situation

Memory consumption on the nodes is too high, or OOM kills are happening frequently.

At the Kubernetes level

Start with kubectl top, which shows what is consuming memory at that point in time:

```
# check which pods are consuming the most memory
kubectl top pods
# check which nodes are affected
kubectl top nodes
```

A few questions that can help:

- Which pods are consuming the most resources?
- Is it on a specific node, or across all nodes?
- When you describe the node (kubectl describe node), is it over-provisioned?

The answers might reveal opportunities for better capacity planning for your applications.

At the node level

Check the system messages log (or use dmesg -T) for the OOM kill message:

- If the OOM killer was invoked by a cgroup, the configured limits are being enforced. Adjust them as needed.
- If it was invoked by the kernel, the node itself is running out of memory and the OOM killer is reclaiming it.

Check the kubelet logs for OOM kills.

Resolution

Rancher Project Resource Quotas:

Rancher allows for resource management at the Project level. Please review the documentation on how to set limits at the Project and Namespace levels.

For non-Rancher components:

Adjust the requests and limits as per the Kubernetes documentation. This can be done at several levels: in the pod spec (spec.containers[].resources) or in a Helm chart's values.yaml. Here is an example from Rancher Monitoring:

```
resources:
  limits:
    memory: 500Mi
    cpu: 1000m
  requests:
    memory: 100Mi
    cpu: 100m
```

If you are experiencing issues with Rancher-shipped components, open a case with Rancher Support. Please collect all the data below when contacting SUSE Rancher support:

- Output of kubectl top pods
- Output of kubectl top nodes
- Grafana graphs of the affected services, or graphs from any monitoring in place
- The log bundle: https://www.suse.com/support/kb/doc/?id=000020191
- The resource count of Rancher: https://www.suse.com/support/kb/doc/?id=000021310
- Support might also ask you to collect profiles of Rancher or Fleet: https://www.suse.com/support/kb/doc/?id=000021615

Cause

OOM kills or high memory usage might be caused by lack of resources, configuration issues or application failures.
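The node-level check in this article distinguishes OOM kills triggered by a cgroup limit from kills triggered by the node itself running out of memory. As a rough sketch only, the helper below counts each kind in kernel log output. It assumes the commonly seen message wording ("Memory cgroup out of memory: Killed process ..." for cgroup limits and "Out of memory: Killed process ..." for node-wide exhaustion); the exact text varies by kernel version, so check it against your own logs. The function name classify_oom is hypothetical and is not part of kubectl, Rancher, or any other tool mentioned here.

```shell
#!/bin/sh
# classify_oom: hypothetical helper, not part of kubectl or Rancher.
# Reads kernel log lines on stdin and counts OOM kills by origin.
# Assumes the common kernel message wording; verify against your kernel.
classify_oom() {
  awk '
    # A cgroup memory limit was hit, so limits are being enforced.
    /Memory cgroup out of memory: Kill/ { cgroup++; next }
    # Node-wide OOM: the kernel ran out of memory on the node.
    /Out of memory: Kill/ { node++ }
    END { printf "cgroup=%d node=%d\n", cgroup, node }
  '
}
```

On a node, this could be used as dmesg -T | classify_oom. A nonzero cgroup count points toward reviewing the workload's limits, while a nonzero node count points toward overall node capacity.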
