How to diagnose a high load average

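1. Uptime - check the system load average

Load is the average number of processes that are running or waiting in the run queue over a given time window (on Linux, processes in uninterruptible I/O wait are counted as well). The windows are: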

load average: 1min, 5min, 15min
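
As a rough illustration of reading these three numbers, the sketch below parses a line in the format of Linux's /proc/loadavg and divides each value by the number of CPUs. The function names and the sample line are made up for this example, and the idea that a per-CPU load above 1.0 deserves a closer look is a common rule of thumb, not something this page states.

```python
def parse_loadavg(text):
    """Return the (1 min, 5 min, 15 min) load averages from a /proc/loadavg-style line."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def load_per_cpu(loads, cpu_count):
    """Divide each load average by the CPU count so machines of different sizes compare fairly."""
    return tuple(load / cpu_count for load in loads)


# Hypothetical sample line; the last two fields (task counts, last PID) are ignored.
SAMPLE_LOADAVG = "4.00 2.00 1.00 3/117 16010"

loads = parse_loadavg(SAMPLE_LOADAVG)
print(load_per_cpu(loads, cpu_count=2))
```

On a live Linux host the same function could be fed the contents of /proc/loadavg, with os.cpu_count() supplying the CPU count.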

Typically, a server with a high load average is slow and unresponsive, and you want to reduce the load to restore responsiveness. But how do you work out what is causing the high load?

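2. Causes of high load

A thread has to wait whenever it cannot get one of the following resources:

CPU
I/O: disk or network

When the CPU is not powerful enough, or I/O is blocked for some reason, the system load becomes too high.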

3. General steps for locating the problem

3.1. Top - check the CPU

Let's start with the simplest question: are we waiting for CPU? Run the Linux command top.

-bash-3.2$ top

top - 17:27:17 up 42 days,  6:09,  5 users,  load average: 0.00, 0.00, 0.00
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2%us,  0.2%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2049908k total,  1025852k used,  1024056k free,   140456k buffers
Swap:  4095992k total,  1104928k used,  2991064k free,   470908k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
16010 root      16   0 96952 3880 2472 S  4.0  0.2   0:09.05 vim       

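3.1.1. CPU usage

Cpu(s):  2.2%us,  0.2%sy

User (us) plus system (sy) time. A combined value above 99% is high.

3.1.2. CPU I/O wait

0.0%wa

I/O wait (wa). A value above 80% is high. It means the disk or network is misbehaving, or the application is accessing the disk too frequently.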
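
The two thresholds above can be applied mechanically. The following is a minimal sketch, assuming the older top summary format shown on this page (values written as `2.2%us`); newer top versions print the fields differently and would need a different pattern. The function names and the returned labels are invented for illustration.

```python
import re


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into a {field: percent} dictionary."""
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


def classify(cpu, busy_limit=99.0, iowait_limit=80.0):
    """Apply the page's thresholds: wa above 80% or us + sy above 99%."""
    if cpu.get("wa", 0.0) > iowait_limit:
        return "io-wait"
    if cpu.get("us", 0.0) + cpu.get("sy", 0.0) > busy_limit:
        return "cpu-bound"
    return "ok"


# The summary line from the top output shown earlier on this page.
TOP_LINE = "Cpu(s):  2.2%us,  0.2%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"

print(classify(parse_cpu_line(TOP_LINE)))
```

I/O wait is checked first because a machine stuck on I/O can show low us and sy values while the load average is still high.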

3.2. ps faux - locate the offending process

-bash-3.2$ ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348   364 ?        Ss   Oct28   0:03 init [5]      
root         2  0.0  0.0      0     0 ?        S<   Oct28   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   Oct28   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<   Oct28   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<   Oct28   2:46 [migration/1]
root         6  0.0  0.0      0     0 ?        SN   Oct28   0:30 [ksoftirqd/1]
root         7  0.0  0.0      0     0 ?        S<   Oct28   0:00 [watchdog/1]

Check the STAT column. The main state flags are:

R - Running or runnable
S - Sleeping
D - Uninterruptible sleep, usually waiting for I/O

Track down the processes marked D.
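
Picking the D entries out of a long process list by eye is error-prone, so a small filter can help. This is only a sketch: it assumes the column layout of the `ps faux` output shown above, and the function name and the two D-state rows in the sample are hypothetical, added purely so the example has something to find.

```python
def find_uninterruptible(ps_output):
    """Return (pid, command) for every process whose STAT value starts with 'D'."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = header.index("COMMAND")
    hits = []
    for line in lines[1:]:
        # Split only up to the COMMAND column so commands containing spaces stay whole.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue  # skip malformed or truncated rows
        if fields[stat_idx].startswith("D"):
            hits.append((int(fields[pid_idx]), fields[cmd_idx].rstrip()))
    return hits


# Hypothetical sample: the first row is from this page, the D-state rows are invented.
SAMPLE_PS = """USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348   364 ?        Ss   Oct28   0:03 init [5]
root      4321  0.0  0.1  20000  1200 ?        D    Oct28   0:10 dd if=/dev/sda of=/dev/null
root      4400  0.0  0.1  20000  1200 pts/0    D+   Oct28   0:10 cp big.iso /mnt
"""

for pid, command in find_uninterruptible(SAMPLE_PS):
    print(pid, command)
```

On a real system the input could come from `subprocess.run(["ps", "faux"], capture_output=True, text=True).stdout`. A process that shows D only briefly is normal; the ones worth following up are those that stay in D across repeated runs.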

4. Reference

MainWiki: Performance_Diagnose (last edited 2011-12-09 00:28:25 by twotwo)