System network configuration check

 

DB Server Network

/etc/hosts

kblotdb1@oracle10:/home2/oracle10/work>cat /etc/hosts

# @(#)B.11.11_LRhosts $Revision: 1.9.214.1 $ $Date: 96/10/08 13:20:01 $

#

# The form for each entry is:

# <internet address> <official hostname> <aliases>

#

# For example:

# 192.1.2.34 hpfcrm loghost

#

# See the hosts(4) manual page for more information.

# Note: The entries cannot be preceded by a space.

# The format described in this file is the correct format.

# The original Berkeley manual page contains an error in

# the format description.

#

 

127.0.0.1 localhost loopback

10.55.50.201 kblotdb1

10.55.50.202 kblotdb2

10.55.49.206 kblotdb1_int

10.55.49.207 kblotdb2_int

10.55.50.208 kblotdb1_vip

10.55.50.209 kblotdb2_vip

  • kblotdb1/kblotdb2 are the public IPs, kblotdb1_int/kblotdb2_int are the cluster interconnect addresses, and kblotdb1_vip/kblotdb2_vip are the Oracle VIPs.
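The naming convention above can be checked mechanically. A minimal sketch, assuming the `_int`/`_vip` suffixes always mean interconnect and VIP (true for this cluster, not a general Oracle rule); `classify_hosts` is a hypothetical helper, not an Oracle tool:

```python
# Sketch: classify /etc/hosts entries by the suffix convention used on this cluster.
# The suffix-to-role mapping is an assumption taken from the listing above.
HOSTS_TEXT = """\
127.0.0.1 localhost loopback
10.55.50.201 kblotdb1
10.55.50.202 kblotdb2
10.55.49.206 kblotdb1_int
10.55.49.207 kblotdb2_int
10.55.50.208 kblotdb1_vip
10.55.50.209 kblotdb2_vip
"""


def classify_hosts(text):
    """Return {hostname: (ip, role)} for non-comment, non-loopback entries."""
    roles = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, name, *_aliases = line.split()
        if ip.startswith("127."):
            continue
        if name.endswith("_int"):
            role = "interconnect"
        elif name.endswith("_vip"):
            role = "vip"
        else:
            role = "public"
        roles[name] = (ip, role)
    return roles


if __name__ == "__main__":
    for name, (ip, role) in classify_hosts(HOSTS_TEXT).items():
        print(f"{name:14} {ip:14} {role}")
```

The public and VIP names land in 10.55.50.x and the interconnect names in 10.55.49.x, which is what the netstat check relies on.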

 

netstat

kblotdb1@oracle10:/home2/oracle10>netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.206 11357335 0 11975506 0 0

lan1 1500 10.55.50.0 10.55.50.201 6916147 0 13666471 0 0

lan0 1500 192.1.1.0 192.1.1.1 1044011 0 1973741 0 0

lo0 4136 127.0.0.0 127.0.0.1 34678497 0 34678513 0 0

lan4* 1500 none none 0 0 0 0 0

lan1:1 1500 10.55.50.0 10.55.50.208 7737385 0 2486558 0 0


  • Both servers use lan1 (10.55.50.x) for the public IP and lan2 (10.55.49.x) for the cluster interconnect.
  • The Oracle VIP is configured on the public interface lan1 (as lan1:1).
  • lan3 and lan4 are configured as standby interfaces, so the network is redundant.
  • lan0 is configured as the HP MC/SG cluster heartbeat.
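The reading of the `netstat -in` output above can be sketched as a small parser. This assumes the column layout shown in the listing and follows the post in treating the `*`-marked rows with no address as the standby interfaces; `parse_netstat` and the other function names are hypothetical:

```python
# Sketch: group the interfaces from the node 1 `netstat -in` listing by network.
NETSTAT_TEXT = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan3* 1500 none none 0 0 0 0 0
lan2 1500 10.55.49.0 10.55.49.206 11357335 0 11975506 0 0
lan1 1500 10.55.50.0 10.55.50.201 6916147 0 13666471 0 0
lan0 1500 192.1.1.0 192.1.1.1 1044011 0 1973741 0 0
lo0 4136 127.0.0.0 127.0.0.1 34678497 0 34678513 0 0
lan4* 1500 none none 0 0 0 0 0
lan1:1 1500 10.55.50.0 10.55.50.208 7737385 0 2486558 0 0
"""


def parse_netstat(text):
    """Return one dict per interface row, skipping the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "Name":
            continue
        name, _mtu, network, address = fields[:4]
        rows.append({
            "name": name.rstrip("*"),
            "starred": name.endswith("*"),
            "network": network,
            "address": None if address == "none" else address,
        })
    return rows


def unaddressed(rows):
    """Interfaces carrying no IP (the standby candidates in this setup)."""
    return sorted(r["name"] for r in rows if r["address"] is None)


def by_network(rows, network):
    """Interface names configured on the given network."""
    return sorted(r["name"] for r in rows if r["network"] == network)


if __name__ == "__main__":
    rows = parse_netstat(NETSTAT_TEXT)
    print("no address:", unaddressed(rows))
    print("public net:", by_network(rows, "10.55.50.0"))
    print("interconnect net:", by_network(rows, "10.55.49.0"))
```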

     

modify nodeapps

If lan1 fails, lan3 takes over lan1's IP, so the Oracle VIP must also know about lan3. The following change is required. (It must be run as the root user.)

/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb1 -o /home2/oracle10 -A kblotdb1_vip/255.255.255.0/lan1\|lan3

/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb2 -o /home2/oracle10 -A kblotdb2_vip/255.255.255.0/lan1\|lan3


  • Both the DB and nodeapps must be shut down before making this change. (See Metalink Note 296874.1 for details.)

 

 

 

Cluster Interconnect

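The interconnect can be seen from the configuration above, but it can also be confirmed from the trace file produced by running the following commands in a SQL session connected as sysdba. The trace file is created in udump.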

SQL> oradebug setmypid

SQL> oradebug ipc

SSKGXPT 0x275efc flags SSKGXPT_READPENDING info for network 0

socket no 8 IP 10.55.49.206 UDP 54216

sflags SSKGXPT_UP

info for network 1

socket no 0 IP 0.0.0.0 UDP 0

sflags SSKGXPT_DOWN

context timestamp 0

no ports

sconno accono ertt state seq# sent async sync rtrans acks

ach accono sconno admno state seq# rcv rtrans acks

 

SSKGXPT 0x275fb4 flags SSKGXPT_READPENDING info for network 0

socket no 8 IP 10.55.49.207 UDP 51946

sflags SSKGXPT_UP

info for network 1

socket no 0 IP 0.0.0.0 UDP 0

sflags SSKGXPT_DOWN

context timestamp 0

no ports

sconno accono ertt state seq# sent async sync rtrans acks

ach accono sconno admno state seq# rcv rtrans acks

  • The IP shown next to each UDP port matches the cluster interconnect addresses seen earlier.
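Picking the active interconnect address out of the `oradebug ipc` trace can be sketched as follows. The trace layout is assumed from the excerpt above only (a `socket no ... IP ... UDP ...` line followed by an `sflags` line); other Oracle versions may print it differently, and `active_interconnects` is a hypothetical name:

```python
# Sketch: extract (ip, udp_port) pairs whose socket is flagged SSKGXPT_UP.
import re

TRACE_TEXT = """\
SSKGXPT 0x275efc flags SSKGXPT_READPENDING info for network 0
socket no 8 IP 10.55.49.206 UDP 54216
sflags SSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
"""

SOCKET_RE = re.compile(r"socket no \d+ IP (\S+) UDP (\d+)")


def active_interconnects(text):
    """Return [(ip, udp_port)] for sockets followed by an SSKGXPT_UP flag line."""
    result = []
    pending = None  # socket seen, waiting for its sflags line
    for line in text.splitlines():
        match = SOCKET_RE.search(line)
        if match:
            pending = (match.group(1), int(match.group(2)))
        elif "sflags" in line and pending is not None:
            if "SSKGXPT_UP" in line:
                result.append(pending)
            pending = None
    return result


if __name__ == "__main__":
    for ip, port in active_interconnects(TRACE_TEXT):
        print(f"interconnect in use: {ip} (UDP {port})")
```

The address it reports can then be compared with the `_int` entry in /etc/hosts for the same node.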

 

 

 

REMOTE_LISTENER

Before the test

Before the failure test, the server's listener information was as follows.

kblotdb1@oracle10:/home2/oracle10/admin/dslot/udump>lsnrctl ser LISTENER_KBLOTDB1

 

LSNRCTL for HPUX: Version 10.1.0.4.0 – Production on 21-SEP-2005 14:59:44

 

Copyright (c) 1991, 2004, Oracle. All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))

Services Summary…

Service “PLSExtProc” has 1 instance(s).

Instance “PLSExtProc”, status UNKNOWN, has 1 handler(s) for this service…

Handler(s):

“DEDICATED” established:0 refused:0

LOCAL SERVER

Service “dslot” has 2 instance(s).

Instance “dslot1”, status READY, has 2 handler(s) for this service…

Handler(s):

“DEDICATED” established:481 refused:0 state:ready

LOCAL SERVER

“DEDICATED” established:0 refused:0 state:ready

REMOTE SERVER

(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot1)))

Instance “dslot2”, status READY, has 1 handler(s) for this service…

Handler(s):

“DEDICATED” established:131 refused:0 state:ready

REMOTE SERVER

(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot2)))

The command completed successfully

 

The reason REMOTE SERVER handlers are registered becomes clear from the following init.ora and tnsnames.ora contents. (An spfile is not currently in use.)

remote_listener=LISTENERS_DSLOT

dslot1.local_listener ='LOCAL_DSLOT1'

dslot2.local_listener ='LOCAL_DSLOT2'

 

LOCAL_DSLOT2 =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dslot)

(INSTANCE_NAME = dslot2)

)

)

 

LOCAL_DSLOT1 =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dslot)

(INSTANCE_NAME = dslot1)

)

)

 

LISTENERS_DSLOT =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

)

 

 

REMOTE_LISTENER

When REMOTE_LISTENER is defined, connection load balancing takes place on the server side, so DB connections can be established in a way the client did not intend.

  • BEA WebLogic uses a connection pool, so there is no real need for REMOTE_LISTENER.
  • Also, the CONNECT_DATA section in the tnsnames.ora entries used for LOCAL_LISTENER is unnecessary.
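Whether a listener still holds cross-registered handlers can be checked by scanning the `lsnrctl services` output for `REMOTE SERVER` lines. A minimal sketch, assuming the output layout shown earlier (with plain ASCII quotes, as a terminal would print them); `handler_kinds` and `has_remote_handlers` are hypothetical helpers:

```python
# Sketch: list LOCAL/REMOTE handler types per instance from `lsnrctl services` output.
import re

SERVICES_TEXT = '''\
Service "dslot" has 2 instance(s).
Instance "dslot1", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:481 refused:0 state:ready
LOCAL SERVER
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
Instance "dslot2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:131 refused:0 state:ready
REMOTE SERVER
'''

INSTANCE_RE = re.compile(r'Instance "([^"]+)"')


def handler_kinds(text):
    """Return {instance: ["LOCAL" | "REMOTE", ...]} in the order listed."""
    kinds = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = INSTANCE_RE.match(line)
        if match:
            current = match.group(1)
            kinds.setdefault(current, [])
        elif current and line in ("LOCAL SERVER", "REMOTE SERVER"):
            kinds[current].append(line.split()[0])
    return kinds


def has_remote_handlers(text):
    """True if any instance is registered through a remote listener."""
    return any("REMOTE" in kinds for kinds in handler_kinds(text).values())


if __name__ == "__main__":
    print(handler_kinds(SERVICES_TEXT))
    print("remote handlers present:", has_remote_handlers(SERVICES_TEXT))
```

With REMOTE_LISTENER disabled, the same check on each node's listener should report only LOCAL handlers.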

 

 

Test configuration

Therefore, init.ora and tnsnames.ora were configured as follows so that REMOTE_LISTENER is not used.

#remote_listener=LISTENERS_DSLOT

dslot1.local_listener ='LISTENER_DSLOT1'

dslot2.local_listener ='LISTENER_DSLOT2'

 

LISTENER_DSLOT1 =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

)

 

LISTENER_DSLOT2 =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

)

 


In this configuration, each server's listener information is as follows.

kblotdb1@oracle10:/home2/oracle10/dbs>lsnrctl ser LISTENER_KBLOTDB1

 

LSNRCTL for HPUX: Version 10.1.0.4.0 – Production on 21-SEP-2005 15:42:03

 

Copyright (c) 1991, 2004, Oracle. All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))

Services Summary…

Service “PLSExtProc” has 1 instance(s).

Instance “PLSExtProc”, status UNKNOWN, has 1 handler(s) for this service…

Handler(s):

“DEDICATED” established:0 refused:0

LOCAL SERVER

Service “dslot” has 1 instance(s).

Instance “dslot1”, status READY, has 1 handler(s) for this service…

Handler(s):

“DEDICATED” established:22 refused:0 state:ready

LOCAL SERVER

The command completed successfully

 

kblotdb2|/home2/oracle10/dbs> lsnrctl ser LISTENER_KBLOTDB2

 

LSNRCTL for HPUX: Version 10.1.0.4.0 – Production on 21-SEP-2005 15:46:24

 

Copyright (c) 1991, 2004, Oracle. All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521)))

Services Summary…

Service “PLSExtProc” has 1 instance(s).

Instance “PLSExtProc”, status UNKNOWN, has 1 handler(s) for this service…

Handler(s):

“DEDICATED” established:0 refused:0

LOCAL SERVER

Service “dslot” has 1 instance(s).

Instance “dslot2”, status READY, has 1 handler(s) for this service…

Handler(s):

“DEDICATED” established:40 refused:0 state:ready

LOCAL SERVER

The command completed successfully

 

 

RAC 10g failover test

 

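Forced termination of an Oracle instance

When an arbitrary instance in the RAC is forcibly terminated, the client WebLogic service must fail over to a surviving RAC instance.

Immediately after the instance on node 1 was forcibly terminated, the alert.log on node 2 showed the following.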

Wed Sep 21 16:31:31 2005

Reconfiguration started (old inc 5, new inc 6)

List of nodes:

1

Global Resource Directory frozen

* dead instance detected – domain 0 invalid = TRUE

Update rdomain variables

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Wed Sep 21 16:31:31 2005

LMS 1: 0 GCS shadows cancelled, 0 closed

Wed Sep 21 16:31:31 2005

LMS 0: 0 GCS shadows cancelled, 0 closed

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Wed Sep 21 16:31:32 2005

LMS 1: 2988 GCS shadows traversed, 0 replayed

Wed Sep 21 16:31:32 2005

LMS 0: 2871 GCS shadows traversed, 0 replayed

Wed Sep 21 16:31:32 2005

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Wed Sep 21 16:31:32 2005

Instance recovery: looking for dead threads

Wed Sep 21 16:31:32 2005

Beginning instance recovery of 1 threads

Reconfiguration complete

Wed Sep 21 16:31:33 2005

Started redo scan

Wed Sep 21 16:31:33 2005

Completed redo scan

240 redo blocks read, 104 data blocks need recovery

Wed Sep 21 16:31:33 2005

Started redo application at

Thread 1: logseq 7, block 1392, scn 0.0

Wed Sep 21 16:31:33 2005

Recovery of Online Redo Log: Thread 1 Group 2 Seq 7 Reading mem 0

Mem# 0 errs 0: /dev/kblotdb_vgdb01/rredo112.dbf

Mem# 1 errs 0: /dev/kblotdb_vgdb02/rredo212.dbf

Wed Sep 21 16:31:33 2005

Completed redo application

Wed Sep 21 16:31:34 2005

Completed instance recovery at

Thread 1: logseq 7, block 1632, scn 0.3659612

84 data blocks read, 128 data blocks written, 240 redo blocks read

  • Instance recovery for the failed instance takes about 3 seconds to complete (16:31:31 to 16:31:34 in the log above).
  • The WebLogic 5.1 connection pool, which uses the 10g JDBC driver (THIN), failed over to the surviving RAC instance.
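The recovery time can be read off the alert.log timestamps directly. A minimal sketch, assuming the layout above where a timestamp line precedes the messages it applies to, and an English (C) locale for the day and month names; `recovery_seconds` is a hypothetical helper and the sample is a shortened copy of the log:

```python
# Sketch: measure time from "Reconfiguration started" to "Completed instance recovery".
from datetime import datetime

ALERT_LOG = """\
Wed Sep 21 16:31:31 2005
Reconfiguration started (old inc 5, new inc 6)
Wed Sep 21 16:31:32 2005
Beginning instance recovery of 1 threads
Reconfiguration complete
Wed Sep 21 16:31:34 2005
Completed instance recovery at
Thread 1: logseq 7, block 1632, scn 0.3659612
"""

TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def event_times(text):
    """Pair each message line with the most recent timestamp line above it."""
    current = None
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp line, so it is a message under the current timestamp.
            if current is not None:
                events.append((current, line))
    return events


def recovery_seconds(text):
    """Seconds between reconfiguration start and completed instance recovery."""
    events = event_times(text)
    start = next(t for t, msg in events if msg.startswith("Reconfiguration started"))
    end = next(t for t, msg in events if msg.startswith("Completed instance recovery"))
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(f"instance recovery took {recovery_seconds(ALERT_LOG):.0f} s")
```

The alert.log only has one-second resolution, so the result is accurate to about a second.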

     

     

 

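DB server shutdown

When a failure is induced on an arbitrary DB server in the RAC, the client WebLogic service must fail over to a surviving server (instance).

After the failure on the node 1 DB server, the logs on node 2 showed the following.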

$ORA_CRS_HOME/css/log/ocssd2.log

2005-09-22 02:08:52.076 [4] >WARNING: clssnmeventhndlr: Receive failure with node 1, rc=11

2005-09-22 02:08:52.441 [3] >TRACE: clssnm_skgxncheck: CSS daemon failed on node 1

2005-09-22 02:08:55.330 [8] >WARNING: clssnmPollingThread: node(1) missed(4) checkin(s)

2005-09-22 02:08:56.340 [8] >WARNING: clssnmPollingThread: node(1) missed(5) checkin(s)

2005-09-22 02:08:57.350 [8] >WARNING: clssnmPollingThread: Eviction started for node 1, flags 0x0001, state 3, wt4c 0

2005-09-22 02:09:02.402 [8] >TRACE: clssnmDoSyncUpdate: Initiating sync 15

2005-09-22 02:09:02.402 [4] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] seq[1] sync[15]

2005-09-22 02:09:02.871 [1] >USER: NMEVENT_SUSPEND [00][00][00][04]

2005-09-22 02:09:06.441 [8] >TRACE: clssnmEvict: Evicting node 1, birth 10, death 0, killme 1

2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: SYNC(15) from node(2) completed

2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: NODE(2) IS ACTIVE MEMBER OF CLUSTER

2005-09-22 02:09:06.911 [13] >USER: NMEVENT_RECONFIG [00][00][00][04]

2005-09-22 02:09:06.911 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DBDSLOT type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DGDSLOT type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock IGDSLOTALL type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock RES ora.dslot.dslot.dslot2.srv type 3

2005-09-22 02:09:06.912 [13] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 15

2005-09-22 02:09:06.912 [7] >TRACE: clssgmPeerListener: connects done (1/1)

CLSS-3000: reconfiguration successful, incarnation 15 with 1 nodes

 

CLSS-3001: local node number 2, master node number 2

 

2005-09-22 02:09:06.985 [13] >TRACE: clssnmpostev: leave event posted, node 1

  • The surviving node 2 detected the failure of node 1 and evicted node 1 from the cluster.

     

     

$ORA_CRS_HOME/crs/log/kblotdb2.log

2005-09-22 02:09:07.001: Processing MemberLeave

2005-09-22 02:09:07.001: [MEMBERLEAVE:717] Processing member leave for kblotdb1, incarnation: 15

2005-09-22 02:09:07.217: [RESOURCE:717] Not failing resource ora.dslot.dslot.dslot2.srv because it was locked.

2005-09-22 02:09:07.218: [RESOURCE:717] X_RES_Unavailable : Resource ora.dslot.dslot.dslot2.srv is locked

(File: rti.cpp, line: 812)

2005-09-22 02:09:07.351: Attempting to start ora.kblotdb1.vip on member kblotdb2

2005-09-22 02:09:35.059: Start of ora.kblotdb1.vip on member kblotdb2 succeeded.

2005-09-22 02:09:35.194: Attempting to start ora.dslot.dslot.cs on member kblotdb2

2005-09-22 02:09:35.755: Start of ora.dslot.dslot.cs on member kblotdb2 succeeded.

2005-09-22 02:09:35.865: Attempting to start ora.dslot.db on member kblotdb2

2005-09-22 02:09:36.319: Start of ora.dslot.db on member kblotdb2 succeeded.

2005-09-22 02:09:36.323: [MEMBERLEAVE:717] Do failover for: kblotdb1

2005-09-22 02:09:36.324: [MEMBERLEAVE:717] Post recovery done evmd event for: kblotdb1

  • Next, CRS failed over the Oracle VIP that had been on node 1 to the surviving node 2.
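How long each resource took to come up on node 2 can be worked out by pairing the "Attempting to start" and "Start of ... succeeded" lines in the CRS log. A minimal sketch, assuming the line format shown above; `start_durations` is a hypothetical helper:

```python
# Sketch: per-resource start time from the CRS log excerpt.
import re
from datetime import datetime

CRS_LOG = """\
2005-09-22 02:09:07.351: Attempting to start ora.kblotdb1.vip on member kblotdb2
2005-09-22 02:09:35.059: Start of ora.kblotdb1.vip on member kblotdb2 succeeded.
2005-09-22 02:09:35.194: Attempting to start ora.dslot.dslot.cs on member kblotdb2
2005-09-22 02:09:35.755: Start of ora.dslot.dslot.cs on member kblotdb2 succeeded.
2005-09-22 02:09:35.865: Attempting to start ora.dslot.db on member kblotdb2
2005-09-22 02:09:36.319: Start of ora.dslot.db on member kblotdb2 succeeded.
"""

LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(Attempting to start|Start of) (\S+) on member (\S+)"
)


def start_durations(text):
    """Return {resource: seconds between the attempt line and the success line}."""
    attempts = {}
    durations = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        kind, resource = match.group(2), match.group(3)
        if kind == "Attempting to start":
            attempts[resource] = stamp
        elif resource in attempts and line.endswith("succeeded."):
            durations[resource] = (stamp - attempts.pop(resource)).total_seconds()
    return durations


if __name__ == "__main__":
    for resource, seconds in start_durations(CRS_LOG).items():
        print(f"{resource:22} {seconds:7.3f} s")
```

On this excerpt the VIP start accounts for nearly all of the elapsed time; the other two resources start in well under a second.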

     

     

netstat

kblotdb2|/home2/oracle10/work> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.207 12518382 0 12044400 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 6161297 0 1696506 0 0

lan1 1500 10.55.50.0 10.55.50.202 12894636 0 22979733 0 0

lan0* 1500 192.1.1.0 192.1.1.2 2111713 0 1247138 0 0

lo0 4136 127.0.0.0 127.0.0.1 36147569 0 36147578 0 0

lan1:2 1500 10.55.50.0 10.55.50.208 1575 0 169 0 0

lan4* 1500 none none 0 0 0 0 0

  • The Oracle VIP of node 1 has indeed failed over to lan1:2 on node 2.
  • The WebLogic service was not affected.

     

     

 

DB server network failure

Public LAN failure

In the normal state, the network on node 2 looks like this.

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.207 12518881 0 12044830 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 6485707 0 1702063 0 0

lan1 1500 10.55.50.0 10.55.50.202 12911405 0 23327953 0 0

lan0 1500 192.1.1.0 192.1.1.2 2112194 0 1247528 0 0

lo0 4136 127.0.0.0 127.0.0.1 36181974 0 36181983 0 0

lan1:2 1500 10.55.50.0 10.55.50.208 2914 0 272 0 0

lan4* 1500 none none 0 0 0 0 0

 

When the public LAN (lan1) on node 2 is cut over, the state changes as follows.

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3 1500 10.55.50.0 10.55.50.202 1346 0 3331 0 0

lan2 1500 10.55.49.0 10.55.49.207 102765 0 102573 0 0

lan1* 1500 none none 13257 0 22091 0 0

lan0 1500 192.1.1.0 192.1.1.2 3766 0 6839 0 0

lo0 4136 127.0.0.0 127.0.0.1 140637 0 140637 0 0

lan3:1 1500 10.55.50.0 10.55.50.209 1 0 0 0 0

lan4* 1500 none none 2621 0 2677 0 0

  • The public IP moved to lan3, which had been the standby, and the Oracle VIP came up on lan3:1 accordingly.
  • The WebLogic service was not affected.
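The move can also be confirmed by diffing the two `netstat -in` snapshots above by address. A minimal sketch, assuming the same column layout; `address_map` and `moved_addresses` are hypothetical helpers and the snapshots are trimmed to the name, MTU, network and address columns:

```python
# Sketch: find addresses that changed interface between two netstat snapshots.
BEFORE = """\
lan3* 1500 none none
lan2 1500 10.55.49.0 10.55.49.207
lan1:1 1500 10.55.50.0 10.55.50.209
lan1 1500 10.55.50.0 10.55.50.202
lan0 1500 192.1.1.0 192.1.1.2
lo0 4136 127.0.0.0 127.0.0.1
lan1:2 1500 10.55.50.0 10.55.50.208
lan4* 1500 none none
"""

AFTER = """\
lan3 1500 10.55.50.0 10.55.50.202
lan2 1500 10.55.49.0 10.55.49.207
lan1* 1500 none none
lan0 1500 192.1.1.0 192.1.1.2
lo0 4136 127.0.0.0 127.0.0.1
lan3:1 1500 10.55.50.0 10.55.50.209
lan4* 1500 none none
"""


def address_map(text):
    """Return {address: interface} for rows that carry an address."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "Name" or fields[3] == "none":
            continue
        mapping[fields[3]] = fields[0].rstrip("*")
    return mapping


def moved_addresses(before_text, after_text):
    """Return {address: (old_interface, new_interface)} for addresses that moved."""
    before, after = address_map(before_text), address_map(after_text)
    return {
        addr: (before[addr], after[addr])
        for addr in before.keys() & after.keys()
        if before[addr] != after[addr]
    }


if __name__ == "__main__":
    for addr, (old, new) in sorted(moved_addresses(BEFORE, AFTER).items()):
        print(f"{addr}: {old} -> {new}")
```

Addresses present in only one snapshot (such as 10.55.50.208 here) are not reported as moved; they would need a separate check.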

     

Once lan1 on node 2 is restored, the configuration returns to its original state, as shown below.

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 3106 0 7692 0 0

lan2 1500 10.55.49.0 10.55.49.207 107982 0 107628 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 12 0 0 0 0

lan1 1500 10.55.50.0 10.55.50.202 13752 0 22982 0 0

lan0 1500 192.1.1.0 192.1.1.2 4122 0 7513 0 0

lo0 4136 127.0.0.0 127.0.0.1 152489 0 152489 0 0

lan4* 1500 none none 2621 0 2677 0 0

 

 

cluster_interconnect LAN failure

When the cluster_interconnect LAN (lan2) on node 2 is cut over, the state is as follows.

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3 1500 10.55.49.0 10.55.49.207 5519 0 10620 0 0

lan2* 1500 none none 108070 0 107778 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 487 0 13 0 0

lan1 1500 10.55.50.0 10.55.50.202 15595 0 24175 0 0

lan0 1500 192.1.1.0 192.1.1.2 4354 0 7953 0 0

lo0 4136 127.0.0.0 127.0.0.1 160463 0 160463 0 0

lan4* 1500 none none 2621 0 2677 0 0

  • The cluster_interconnect IP moved to the standby lan3.
  • Neither Oracle nor WebLogic service was affected.

By haisins

I am Park Yong-seok, an Oracle DBA. Please send questions to haisins@gmail.com.
