>> High but not full CPU utilization (70~90%) 


>> CPU Time spent in user application (85+% in user)


>> Low Disk utilization (5~15% for each disk)


>> Low Network utilization (10~30% per network, less than 5% collisions)


 


2. General Tuning Model











  Agree on the level of performance to reach


->


Gather data using monitoring tools (vmstat, iostat, netstat, …)


->


Analyze Data


->


Work from the biggest bottleneck first


 Note: A system's performance is determined by how the system's resources are used.


 3. General Tuning Order






 


1st. Application Tuning


2nd. DataBase Tuning


3rd. OS Tuning(System Tuning)


 


4. Some Available Monitoring Tools






 


vmstat : command to view status of memory and CPU


ps : command to find which processes are hogs


swap : command for available swap space


iostat : command for terminal, disk, cpu utilization


netstat, nfsstat : command for network performance


sar : command to view system activity (need SUNWaccr, SUNWaccu packages)


mpstat : command to view status of multi-cpu


truss : command to trace the system calls a process is making


cachefs : mechanism to speed up read-mostly NFS


PrestoServe : application for synchronous writes (many small files, Mail server)


DiskPak (from Eagle Software) : tool for defragmenting disks


 


5. Hints: Database Management System Tuning






 



  • Configure disks for speed, not capacity

  • I/O load needs many random access disks

  • 3*1.05GB is over twice as fast as 1*2.9GB

  • Use raw disk for tablespaces to reduce CPU load

  • Save inode and indirect block updates

  • Use dd | compress into a filesystem for a snapshot backup, then ufsdump normally

  • Use UFS for tablespaces to reduce I/O load

  • Extra level of caching needs more RAM

  • With UFS, use PrestoServe/NVSIMM or Logging option

  • Use large shared memory area (up to 25% of RAM)

  • If the uo value is above 25%, shared memory must be expanded

 


 


6. Factors That Determine System Performance






 



  • CPU : number of CPUs

  • I/O Devices : disk, printer, terminal, transfer information

  • Memory : primary memory(RAM), secondary memory(on disk)

  • Kernel : kernel parameters (/etc/system)

  • Network

Note: When tuning, always consider the system and the network together.


 


 


7. Variables to Watch Carefully






 






































            


          


 Source Code : Algorithm, Language, Programming Model, Compiler

 Executable : Environment, Filesystem Type

 DataBase : Buffer Sizes, Indexing

 Kernel : Buffer Sizes, Paging, Tuning, Configuring

 Memory : Cache Type, Line Size and Miss Cost

 Disk : Driver Algorithms, Disk Type, Load Balance

 Windows & Graphics : Window System, Graphics Library, Bus Throughput

 CPU : Processor Implementation

 Multiprocessors : Load Balancing, Concurrency, Bus Throughput

 Network : Protocol, Hardware, Usage Pattern


 


 8. Ten Steps for First-Time System Tuning






 


1st. The system will usually have a disk bottleneck.


2nd. You will be told that the system is NOT I/O bound.


3rd. After first pass tuning the system will still have a disk bottleneck.


4th. Poor NFS response times are hard to pin down.


5th. Avoid the common memory usage misconceptions.


6th. Don’t panic when you see page-ins and page-outs in vmstat.


7th. Look for page scanner activity.


8th. Look for a long run queue (vmstat procs r).


9th. Look for processes blocked waiting for I/O (vmstat procs b).


10th. Look for CPU system time dominating user time.


 


9. First Tuning Pass






 


1st. Clear up any RAM shortage.


     => If at first the monitoring indicates paging, add more RAM.


2nd. Make sure processor speed and the number of processes are adequate.


     => Clear out unnecessary processes.


     => Make sure that the run queue is as small as possible.


3rd. Focus on the I/O subsystems (disks, networks).


4th. Use “iostat -x” to monitor the disk.


     => Check busy(%b) and service time(svc_t)


5th. Continue to cycle around the tuning path until all subsystems, and indeed the machine itself, are fast enough to reach the required performance metric.


Analysis Using Tools & How to Read Them


 


 


 


 


 


CPU Performance


 


ps command






 


How to use ps


 


#> ps


TIME : the total amount of CPU time used by the process since it began


#> ps -efl


SZ : shows the amount of virtual memory required by the process


 


Example of ps


 


#> ps
 PID TTY       TIME COMD
 346 pts/2     0:01 ksh
1029 pts/2     0:00 ps
1199 pts/2     0:01 ksh
#> ps -efl
 UID  PID PPID  C PRI NI     ADDR    SZ    WCHAN    STIME TTY TIME COMD
root    0    0 80   0 SY f01706f0     0          19:47:44 ?   0:01 sched
root    1    0 80  99 20 fc18f800   173 fc18f9c8 19:47:48 ?   0:36 /etc/init –


  


vmstat command






 How to use vmstat


 


 


#> vmstat 5


procs


r b w


r : In the run queue, waiting for processing time


b : Blocked, waiting for resources


cpu


cs us sy id


us : percentage of CPU time spent in USER mode


sy : percentage of CPU time spent in SYSTEM mode


id : percentage of CPU time spent idle


 


How to read vmstat data


 


If CPU spends most of its time in USER mode, one or more processes may be monopolizing the CPU.


=> Check “ps -ef”


A low “id” value indicates a MEMORY-starved or I/O-bound system.


=> RECOMMENDATION : idle time “id” should be greater than 20% most of the time


=> As a GENERAL guide:


   id < 15% : processes have to wait before being put into execution


   us > 70% : the application load may NEED some BALANCING


   sy = 30% : is a good high-water mark


Compute-intensive programs.


=> One Compute Intensive program can push utilization rate to 100%.


If the CPU is mainly in system mode, then it is probably I/O bound.
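As a rough sketch, the general guide above can be expressed as a small checker. The function name and the message strings are ours; only the thresholds (id < 15%, us > 70%, sy > 30%) come from the guide:

```python
def check_cpu(us, sy, idle):
    """Apply the general vmstat CPU guide; us/sy/idle are percentages."""
    findings = []
    if idle < 15:
        findings.append("low idle: processes have to wait before execution")
    if us > 70:
        findings.append("user-heavy: application load may need balancing")
    if sy > 30:
        findings.append("system-heavy: probably I/O bound")
    return findings or ["no problem"]
```

Fed the cpu columns of the vmstat sample below (us=1, sy=1, id=98), it reports no problem.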


 


Example of vmstat


 


#> vmstat
procs   memory          page              disk    faults    cpu
r b w  swap free re mf pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
0 0 0 72836 7688  0  1  5  1  4  0  2  0  0  0  1 16 37 30  1  1 98


Note: If the sr value stays high, suspect a memory shortage.


 CPU Solutions






 If your CPU is often busy, or


If it often deals with jobs that monopolize the system,


 


=> Lower the priority of other processes (nice command).


=> Check for runaway processes, or other processes monopolizing the CPU or MEMORY.


   check “ps -ef” (TIME field)


   check “ps -efl” (SZ field)


=> Evaluate your system’s memory usage.


 Note: Insufficient memory causes excessive swapping and paging.


 


MEMORY Performance


swap command






 


How to use swap


 


#> swap -l


blocks : size in 512-byte blocks


free   : free space in 512-byte blocks


#> swap -s


How much swap space is available


 


 


Example of swap


 


#> swap -l
swapfile          dev   swaplo blocks  free
/dev/dsk/c0t3d0s1 32,25      8 156232 95200
#> swap -s
total: 33092k bytes allocated + 9104k reserved = 42196k used, 53172k available
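The "swap -s" summary line above can also be picked apart programmatically. A minimal sketch (the function name and dictionary keys are ours):

```python
import re

def parse_swap_s(line):
    """Pull the k-byte figures out of a 'swap -s' summary line."""
    # The line lists: allocated + reserved = used, available
    allocated, reserved, used, available = (int(n) for n in re.findall(r"(\d+)k", line))
    return {"used_k": used, "available_k": available}
```

Applied to the sample line above it returns used_k=42196 and available_k=53172.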


 


 


 


vmstat -S command






 How to use vmstat -S


 


#> vmstat -S 5


po : kbytes paged out


pi : kbytes paged in


si : number of pages swapped in per second


so : number of whole processes swapped out per second


 


 


 


How to read vmstat -S output


  po = 0 : no paging occurring


Note 1. check the “procs r” field


r > 1     : indicative of a processor speed limit


r > 2 – 4 : consider adding more CPUs


Note 2. check the “procs b” field


If this column has values, run “iostat” to tune the disk I/O.


po > 0 : insufficient RAM for the application


Consult the application vendor regarding the memory requirements


pi rate is NOT important
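The reading guide above can be condensed into one function. The thresholds come from the po rule and Notes 1 and 2; the advice strings and function name are ours:

```python
def check_vmstat_s(r, b, po):
    """Apply the vmstat -S reading guide: run queue r, blocked b, page-outs po."""
    advice = []
    if po == 0:
        advice.append("no paging occurring")
    else:
        advice.append("po > 0: insufficient RAM for the application")
    if r > 4:
        advice.append("long run queue: consider adding CPUs")
    elif r > 1:
        advice.append("run queue > 1: processor speed may be the limit")
    if b > 0:
        advice.append("blocked processes: run iostat to tune disk I/O")
    return advice
```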


 Note: Distinguish between memory access times and disk access times.


 Example of vmstat -S


 


#> vmstat -S 5
procs   memory          page              disk    faults    cpu
r b w  swap free si so pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
0 0 0 53036 2848  0  0 15  8 19  0 11  0  0  0  0 52 37 92  4  3 93
0 0 0 53140 6328  0  1  5  0  0  0  0  0  0  0  0 10 21 30  1  1 98
0 0 0 53140 6328  0  0  0  0  0  0  0  0  0  0  0 31 26 62  0  1 99


 


 


 


 


sar -g command






 


How to use sar -g


 


#> sar -g 5 20


pgout/s : The average number of page-out requests per second


<= A good indication of memory performance


pgfree/s : The average number of pages per second that were added to the free list


pgscan/s : The average number of pages that needed to be scanned in order to find more memory


 


 


How to read sar data


 


consistently pgout = 0 : NO memory problem


pgout > 0 over several intervals : system performance is suffering


pgfree and pgscan should be small (less than 5)
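A sketch of the same rules as code (the limit of 5 for pgfree/pgscan and the pgout test are from the guide; the names are ours):

```python
def check_sar_g(pgout_s, pgfree_s, pgscan_s):
    """Apply the sar -g reading guide: pgout should be 0, pgfree/pgscan small (< 5)."""
    issues = []
    if pgout_s > 0:
        issues.append("page-outs occurring: system performance is suffering")
    if pgfree_s >= 5 or pgscan_s >= 5:
        issues.append("page daemon active: possible memory shortage")
    return issues or ["no memory problem"]
```

The last interval of the sar -g sample below (0.20, 0.20, 0.00) passes; the first two intervals trip both rules.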


 


 


Example of sar -g


 


#> sar -g 5 20

SunOS hostname 5.5.1 Generic_103640-12 sun4u 11/12/98

16:59:36 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
16:59:41    1.39     5.78    23.90    79.88     0.00
16:59:46    1.60   113.77    27.74   253.69     0.00
16:59:51    0.80     2.00    13.80    81.40     0.00
16:59:56    0.20     0.20     0.20     0.00     0.00


 


Memory Solutions






 


1st. Two memory problems:


=> The system spends a lot of time paging and/or swapping.


=> Run out of SWAP space


2nd. Check SZ field from “ps -efl”


3rd. For the 1st problem:


=> Add physical memory until there is no paging and swapping


 


 


 


 


 


DISK Performance


 


Factors for Disk Performance






 



  • Speed of disk


    • transfer rate

    • seek time

    • rotational latency

  • Load balance across multiple disks

  • Access type


    • single user vs multi user

    • sequential vs random

  • Memory


    • where disk buffers, used when transferring information to and from disk, are stored

df -k command






 How to use df


 


#> df


capacity : How much of the file system’s total capacity has been used


 How to read df output


 


%capacity = 100%


=> remove core files and any unneeded s/w packages


=> add more disk space or move files to another partition


 


Example of df


 


#> df -k
Filesystem         kbytes   used avail capacity Mounted on
/dev/dsk/c0t3d0s0  21615  14909  4546      77% /
/dev/dsk/c0t3d0s6 240463 211348  5075      98% /usr
/proc                  0      0     0       0% /proc
 


 


iostat command






 


How to use iostat


 


#> iostat 5


disk


serv : average service time, in milliseconds


cpu


us : time spent in user mode


sy : time spent in system mode


wt : time spent waiting for I/O


#> iostat -x 5


svc_t : service time


%w    : percentage of time the queue is not empty


%b    : percentage of time the disk is busy


 


How to read iostat -x output


 


%b


%b < 5%  : ignore

%b > 30% : be concerned

%b > 60% : needs fixing


svc_t


if %b < 5% : ignore svc_t

if %w has values, svc_t 10 – 50 ms   : OK

                  svc_t 100 – 150 ms : needs fixing
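A sketch of the %b / svc_t guide above as a classifier (the return strings mirror the guide; the ordering of the checks is our interpretation):

```python
def check_disk(pct_b, svc_t, pct_w):
    """Apply the iostat -x reading guide: %b, svc_t (ms), %w."""
    if pct_b < 5:
        return "ignore"            # disk too idle for svc_t to matter
    if pct_b > 60:
        return "needs fixing"
    if pct_w > 0 and svc_t >= 100:
        return "needs fixing"      # 100-150 ms service times with a queue
    if pct_b > 30:
        return "be concerned"
    return "OK"
```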


 


 


 


 


 


iostat -D command






 


How to use iostat -D


 


#> iostat -D 5


util : percentage of disk utilization


We can find the load balance between disks.


 


Example of iostat


 


#> iostat 5
     tty          fd0          sd0          sd1          sd2         cpu
tin tout Kps tps serv Kps tps serv Kps tps serv Kps tps serv us sy wt id
  0  560   0   0    0   0   0   99   1   0   49  10   2   54  4  4  4 88
  0   16   0   0    0   0   0    0   0   0    0   0   0    0  0  0  0 99

#> iostat -x 5
                    extended disk statistics
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
fd0  0.0 0.0  0.0  0.0  0.0  0.0   0.0  0  0
sd0  0.0 0.0  0.2  0.1  0.0  0.0  99.5  0  0


 


 



#> iostat -D
         fd0          sd0          sd1          sd2
rps wps util rps wps util rps wps util rps wps util
  0   0  0.0   0   0  0.2   0   0  0.4   1   0  3.1


 


sar -a -b -d command






 


How to use sar


 


#> sar -a 5 3


-a : Report use of file access system routines (report on file access)


iget/s


namei/s


dirbk/s


Average


#> sar -b 5 3


-b : Report buffer activity (report on disk buffers)


%rcache : Fraction of logical reads found in the system buffers


%wcache : Fraction of logical writes found in the system buffers


 


#> sar -d 5 3


-d : Report activity for each block device (report on disk transfers)


r+w/s : reads + writes per second


%busy : Percentage of time the device spent servicing a transfer


blk/s : Number of 512-byte blocks transferred to device, per second


 


How to read sar data


 


sar -a


The larger the values, the more time the kernel is spending to ACCESS user files


This report is USEFUL for understanding “HOW disk-dependent a system is”


 


sar -b


%rcache < 90% and %wcache < 65%


=> may be possible to improve performance by increasing the buffer space


 


sar -d


%busy > 85% : high utilization, load problem


r+w/s > 65 : overloaded


 


 


Example of sar output


 


#> sar -a 5 3
16:59:36 iget/s namei/s dirbk/s
16:59:41      0       0       0
16:59:46      8      26      15
16:59:51    271     297     288

Average      93     108     101
#> sar -b 5 3
16:59:36 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
16:59:41      19      57      67      22      26      17       0       0
16:59:46      23      73      69      18      23      20       0       0
16:59:51      25      51      51      15      27      46       0       0

Average       22      60      63      18      25      28       0       0
#> sar -d 5 3
13:21:07 device %busy avque r+w/s blks/s avwait avserv
13:21:12 sd1       52   0.8    30    216    4.8   21.1
         sd3        6   0.1     1     14    0.0   45.4
average
         sd1       32   0.5    17    187    3.5   23.9
         sd2       45   0.4    24    208    0.0   18.9
         sd3        4   0.0     1      9    0.0   51.1


 


DISK Solutions






 


1st. Check for file system overload (above 90 or 95% capacity)


=> Clean out unused files from /var/adm, /var/adm/sa and /var/lp/logs.


=> Clean out the core files.


=> Find files that have been unused for more than 60 days.


   #> find /home -type f -mtime +60 -print


   #> find /home -name core -exec rm {} \;


2nd. If you have more than one disk,


=> Distribute the file systems for a more balanced load between the disks.


=> Try reducing disk seeks through careful planning of data positioning on the disks.


   (I/O on the outer sectors can be far faster)


3rd. Consider adding Memory.


=> Additional memory reduces swapping and paging, and allows an expanded buffer pool.


4th. Consider buying faster disks.


5th. Make sure the disks are not overloading the SCSI controller.


=> Below 60% utilization of SCSI bus.


6th. Consider adding disks.


=> Disks should not be busy more than 40 – 60% of the time.


   (%b : iostat -xct 5 and %busy : sar -d 5 3)


7th. Consider using an in-memory file system for the /tmp directory.


=> It’s the default in Solaris 2.X.


 


NETWORK Performance


 


Overview of Network Performance






 


Congestion or collisions force resends


The network has finite bandwidth, and so can only transmit a certain amount of data


EtherNet


10Mbit/sec(14,400 packes/sec)


Minimum packet size : 64 bytes (about 14,400 packets/sec)


Maximum packet size : 1518 bytes


Inter-packet gap : 9.6 microseconds


Expect only 30-40% utilization because of collision contention


Latency


Not as important as disk latency


Must consider that the remote system has its own resources, including disks


 


NFS


UDP : the common protocol in use; part of TCP/IP, it allows fast network throughput with little overhead.


Logical packet size : 9Kbytes


On ethernet : 6 * 1518 bytes


After a collision, ALL the constituent ethernet packets have to be resent


Slower remote server


The remote server is CPU bound


Network Monitoring Tool


nfsstat


netstat


snoop


ping


spray


 


 


ping command






 


How to use ping


 


#> ping


Send a packet to a host on the network.


-s : send one packet per second


 


How to read ping -s output


 


Two single SPARCstations on a quiet EtherNet always respond in less than 1 millisecond


 


Example of ping


 


#> ping -s host
PING host: 56 data bytes
64 bytes from host (1.1.1.1): icmp_seq=0. time=7. ms
64 bytes from host (1.1.1.1): icmp_seq=1. time=7. ms

—-host PING Statistics—-
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms) min/avg/max = 1/2/7


 spray command






 


How to use spray


 


#> spray


Send a one-way stream of packets to a host


Reports how many were received and the transfer RATE.


-c : count(number) of packets


-d : specifies the delay, in microseconds


     Default: 9.6 microsecond


-l : specifies the Length(size) of the packet


 


How to read spray output


 


If you use the -d option and many packets are dropped,


=> Check Hardware such as loose cables or missing termination


=> Check for a possibly congested network


=> Use the netstat command to get more information


 


 


Example of spray


 


#> spray -d 20 -c 100 -l 2048 host

sending 100 packets of length 2048 to host …
        no packets dropped by host
        560 packets/sec, 1147576 bytes/sec


 


 


netstat -i command






 


How to use netstat -i


 


#> netstat -i 5


errs : the number of errors


packets : the number of packets


colls : the number of collisions


* Collision percentage rate = colls/output packets * 100
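The collision-percentage formula as a small helper (guarding against a zero packet count is our addition):

```python
def collision_rate(colls, output_packets):
    """Collision percentage rate = colls / output packets * 100."""
    if output_packets == 0:
        return 0.0
    return 100.0 * colls / output_packets
```

For the first sample line of the netstat -i example below (colls=4839, output packets=27270), the rate is about 17.7% — far above the 5% trouble thresholds described next.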


 


How to read netstat data


 


collision percentage > 5% (one system)


=> Check the network interface and cabling


collision percentage > 5% (all system)


=> The network is congested


errs field has data


=> Suspect BAD hardware generating illegally sized packets


=> Check network repeaters


 


Example of netstat


 


#> netstat -i 5
    input   le0      output         input  (Total)   output
packets errs packets errs colls packets errs packets errs colls
71853   1    27270   8    4839  72526   1    27943   8    4839
7       0    0       0    0     7       0    0       0    0   
14      0    0       0    0     14      0    0       0    0   


 


snoop command






 


How to use snoop


 


#> snoop


Capture packets from the network


Display their contents


 


Example of snoop


 #> snoop host1
#> snoop -o filename host1 host2
#> snoop -i filename -t r | more
#> snoop -i filename -p99,108
#> snoop -i filename -v -p101
#> snoop -i filename rpc nfs and host1 and host2


 nfsstat -c command






 


How to use nfsstat -c


 


#> nfsstat -c


Display a summary of server and client statistics


Can be used to IDENTIFY NFS problems


retrans : Number of remote procedure calls(RPCs) that were retransmitted


badxid : Number of times a duplicate acknowledgement was received for a single NFS request


timeout : Number of calls that timed out


readlink : Number of reads to symbolic links


 


How to read nfsstat data


 


retrans > 5% of calls : maybe a network problem


=> Looking for network congestion


=> Looking for overloaded servers


=> Check ethernet interface


high badxid as well as timeout : the remote server is slow


=> Increase the time-out period


   #> mount -o rw,soft,timeo=15 host:/home /home


readlink > 10% of calls : too many symbolic links
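As a sketch, the nfsstat -c guide above can be coded like this (the argument order and message wording are ours; the thresholds are from the guide):

```python
def nfs_client_checks(calls, retrans, readlink, badxid, timeout):
    """Apply the nfsstat -c reading guide to client RPC/NFS counters."""
    issues = []
    if calls and 100.0 * retrans / calls > 5:
        issues.append("retrans > 5%: suspect network problem")
    if badxid > 0 and timeout > 0:
        issues.append("badxid with timeouts: remote server is slow")
    if calls and 100.0 * readlink / calls > 10:
        issues.append("readlink > 10%: too many symbolic links")
    return issues
```

The sample output below (calls=13147, retrans=8, readlink=2720, badxid=0, timeout=8) trips only the readlink rule.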


 


Example of nfsstat


 


#> nfsstat -c

Client rpc:
calls   badcalls retrans badxid timeout wait newcred timers
13185   0        8       0      8       0    0       50

Client nfs:
calls   badcalls nclget  nclcreate
13147   0        13147   0
null    getattr  setattr root   lookup   readlink read
0  0%   794  6%  10   0% 0  0%  2141 16% 2720 21% 6283 48%
wrcache write    create  remove rename   link     symlink
0  0%   581  4%  33   0% 29  0% 4  0%    0  0%    0  0%
mkdir   rmdir    readdir statf
0  0%   0  0%    539  4% 13 0%


 Network Solutions






 


1st. Consider adding the Prestoserve NFS Write accelerator.


=> If the write % from nfsstat -s is > 15%, consider installing it.


2nd. Subnetting


=> If your network is congested, consider subnetting.


=> That is, if the collision rate is > 5%, subnet.


3rd. Install the bridge


=> If your network is congested and physical segmentation is NOT possible.


=> Isolate physical segments of a busy network.


4th. Install local disks in diskless machines.


 


>> Bottlenecks


 


 


I/O Bottleneck


Detection
























         


 symptom : sar field

 Uneven workload : r+w/s, avque

 Many threads blocked waiting on I/O : %wio

 High disk utilization rate : %busy

 Active disk with no free space : –


 


 


Solutions






Balance the disk load


Use mmap instead of read and write


Use shared libraries


Put busier filesystems on smaller disks


Organize I/O requests to be more contiguous


Add more/faster disks


Memory Bottleneck


 







Detection



























 symptom : sar field

 Steady page-out activity : ppgout/s

 Scan rate is non-zero (page daemon is active) : pgscan/s

 Swapper is active : swpot/s, swpq-sz

 Free memory is at or below lotsfree : freemem, pgfree/s

 Hardware cache misses : –


 


 


 


 


Solutions






 


Modify process load


Tune paging parameters


Add more memory


Use shared libraries


Use memcntl to use memory more efficiently within application


Analyze locality of reference in applications 


Set memory limits – setrlimit


 


CPU Bottleneck


 

















       


 symptom : sar field

 CPU idle time is low : %usr, %sys, %idle

 Threads waiting on run queue : runq-sz, %runocc

 Slower response/interactive performance : –


 


 Solutions






 


Use priocntl / nice to modify process/thread priorities


Modify dispatch parameter tables


Modify applications to use system calls more efficiently


System daemons


Device interrupts


Modify/limit process load


Custom device drivers


More, faster CPUs


>> Rules Table


 


Network Rules






 


Notation used in tables






Rules


To denote a measurement, the command name, a “.”, and the variable name are combined. For example, if disk service time is measured with the “iostat -x” command at 30-second intervals, it is written as “iostat-x30.svc_t”.


Variables are combined with the logical operators “&&”, “||”, and “==”; for brevity, ranges are written in the form “0 <= X < 100”.


Levels


The level in each table indicates how serious the condition is, as shown below.


 


























Level : Description

white : low usage

blue : under-utilization/imbalance of a resource

green : target utilization/no problem

amber : warning level

red : critical level that needs to be fixed

black : problems that can prevent your system from working


 


Actions


Each table’s rules list the action to be taken, with a short note on the problem and related items.


  


Rules based upon ethernet collisions






 

































Rule for each network interface => Level : Action

 (0 < netstat-i30.output.packets < 10) && (100*netstat-i30.output.colls/netstat-i30.output.packets < 0.5%) && (other nets white or green)
 => White : No Problem

 (0 < netstat-i30.output.packets < 10) && (100*netstat-i30.output.colls/netstat-i30.output.packets < 0.5%) && (other nets amber or red)
 => Blue : Inactive Net

 (10 <= netstat-i30.output.packets) && (0.5% <= 100*netstat-i30.output.colls/netstat-i30.output.packets < 2.0%)
 => Green : No Problem

 (10 <= netstat-i30.output.packets) && (2.0% <= 100*netstat-i30.output.colls/netstat-i30.output.packets < 5.0%)
 => Amber : Busy Net

 (10 <= netstat-i30.output.packets) && (5.0% <= 100*netstat-i30.output.colls/netstat-i30.output.packets)
 => Red : Busy Net

 network type is not “ie”, “le”, “ne”, or “qe” (it is “bf” or “nf”)
 => Green : Not Ether


 


Inactive Net


An inactive network is a waste of throughput when other networks are overloaded. Rebalance the load so that all networks are used more evenly.


Busy Net


A network with too many collisions reduces throughput and increases response time for users. Move some of the load to inactive networks if there are any. Add more ethernets or upgrade to a faster interface type like FDDI, 100Mbit ethernet or ATM.


Not Ether


If the last letter of the interface name is not “e” then this is not an ethernet, so the collision-based network performance rule should not be used.
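A sketch of the collision rules above as a classifier. It simplifies the white/blue distinction to a single flag for "other nets busy" and omits the non-ethernet case; the packet and percentage thresholds are taken from the table:

```python
def ethernet_rule(output_packets, colls, other_nets_busy=False):
    """Level/action from the ethernet collision rules (30-second netstat -i averages)."""
    if output_packets < 10:
        # Quiet interface: blue only if other nets are amber/red
        return ("blue", "Inactive Net") if other_nets_busy else ("white", "No Problem")
    pct = 100.0 * colls / output_packets
    if pct < 2.0:
        return ("green", "No Problem")
    if pct < 5.0:
        return ("amber", "Busy Net")
    return ("red", "Busy Net")
```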




 


CPU Rules






 




 


 


Rules for SunOS 4 and Solaris 2






 





































CPU Rule => Level : Action

 0 == vmstat30.r
 => White : CPU Idle

 0 < (vmstat30.r / ncpus) < 3.0
 => Green : No problem

 3.0 <= (vmstat30.r / ncpus) <= 5.0
 => Amber : CPU Busy

 5.0 <= (vmstat30.r / ncpus)
 => Red : CPU Busy

 mpstat30.smtx < 200
 => Green : No problem

 200 <= mpstat30.smtx < 400
 => Amber : Mutex Stall

 400 <= mpstat30.smtx
 => Red : Mutex Stall


 


CPU Idle


The CPU power of this system is underutilized. Fewer or less powerful CPUs could be used to do this job.


CPU Busy


There is insufficient CPU power and jobs are spending an increasing amount of time in the queue before being assigned to a CPU. This reduces throughput and increases interactive response times.


Mutex Stall


If the number of stalls per CPU per second exceeds the limit there is mutex contention happening in the kernel which wastes CPU time and degrades multiprocessor scaling.
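The CPU rules table can likewise be sketched as code. The boundaries come from the table; ncpus must be supplied by the caller, and the pairing of the two independent checks into one return value is our choice:

```python
def cpu_rule(runq, ncpus, smtx):
    """Levels from the CPU rules: vmstat r over 30s per CPU, and mpstat smtx."""
    per_cpu = runq / ncpus
    if runq == 0:
        load = ("white", "CPU Idle")
    elif per_cpu < 3.0:
        load = ("green", "No problem")
    elif per_cpu <= 5.0:
        load = ("amber", "CPU Busy")
    else:
        load = ("red", "CPU Busy")
    if smtx < 200:
        mutex = ("green", "No problem")
    elif smtx < 400:
        mutex = ("amber", "Mutex Stall")
    else:
        mutex = ("red", "Mutex Stall")
    return load, mutex
```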


>> Tunable Kernel Parameters


 


Kernel variables are highly dependent on the operating-system release, so something that works well on one release may not work properly on another.


The table below shows the recommended values for Solaris 2.3 and 2.4.


 





































Name : Default (Min / Max)

maxusers : MB available (physmem) (Min 8 / Max 2048)

pt_cnt : 48 (Min 48 / Max 3000)

ncsize : ~(maxusers*17) + 90 (Min 226 / Max 34906)

ufs_ninode : ~(maxusers*17) + 90 (Min 226 / Max 34906)

autoup : 30 sec

tune_t_fsflushr : 5 sec

If (fsflush > 5% of CPU time):

=> double autoup

=> tune_t_fsflushr += 5


 


Note: Do not set the “autoup” variable above 120 seconds.


 


 


maxusers : kernel-tunable variable






 


In most cases the maxusers variable will be set to the number of megabytes of system memory. It can be set as high as 2048 through the /etc/system file, but the system will never set maxusers to a value greater than 1024.


 The table below lists the kernel variables affected by the maxusers value on a SunOS 5.4 system.


 







 max_nprocs = ( 10 + 16 * maxusers )

 ufs_ninode = ( max_nprocs + 16 + maxusers ) + 64

 ndquot = ( ( maxusers * NMOUNT ) / 4 ) + max_nprocs

 maxuprc = ( max_nprocs - 5 )

 ncsize = ( max_nprocs + 16 + maxusers ) + 64
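The formulas above as a runnable sketch. NMOUNT is release-dependent; the value 40 used here is only an assumed placeholder, and the function name is ours:

```python
NMOUNT = 40  # assumed mount-table size; the real value is release-dependent

def derived_params(maxusers):
    """Kernel variables derived from maxusers (SunOS 5.4 formulas above)."""
    max_nprocs = 10 + 16 * maxusers
    return {
        "max_nprocs": max_nprocs,
        "ufs_ninode": max_nprocs + 16 + maxusers + 64,
        "ndquot": (maxusers * NMOUNT) // 4 + max_nprocs,
        "maxuprc": max_nprocs - 5,
        "ncsize": max_nprocs + 16 + maxusers + 64,
    }
```

For example, with maxusers = 64 (a 64 MB machine), max_nprocs comes out as 1034, maxuprc as 1029, and ufs_ninode/ncsize as 1178.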


 


>> Tuning Tips


 


▶ Check one thing at a time.


▶ Spend the most time on the factors that have the biggest impact.


▶ Consider all of the following together:



  • Disk Access

  • CPU Access

  • Main Memory Access

  • Network I/O devices

▶ Keep in mind that tuning means redistributing a lopsided load evenly.


Examine disk bottlenecks: if a disk is more than 30% busy and its service time is over 50 ms, spread the data elsewhere or stripe it with a tool like DiskSuite.


▶ Don’t take anyone’s word that the disks are fine. Watch them carefully with “iostat -x 30”.


▶ After tuning improves the system’s performance, re-check disk busy.


NFS clients wait for the server as idle time, not I/O wait. On a slow NFS client, use “nfsstat -m” to determine whether the problem is the network or the NFS server.


Don’t worry about the free RAM value shown by vmstat; only about 1/6 of the total stays free.


Don’t worry about high page-in and page-out values in vmstat; all filesystem I/O is done through page-ins and page-outs.


▶ If the run queue length or load average exceeds four times the number of CPUs, you probably need more CPUs.


If vmstat shows as many procs b as procs r, check whether the disks are slow.


 


 

By haisins

Oracle DBA Yongseok Park. Contact: haisins@gmail.com.
