美创科技技术社区

注册

 

发新话题 回复该主题

Oracle RAC hang 10分钟左右 [复制链接]

1#
现象:
Oracle RAC hang10分钟左右,10分钟之后自动消失。
异常时候的AWR报告:
接点一:
Report SummaryCache Sizes
 BeginEnd  
Buffer Cache:

43,520M

43,520M

Std Block Size:

8K

Shared Pool Size:

10,944M

10,944M

Log Buffer:

63,476K

Load Profile
 Per SecondPer Transaction
Redo size:

1,014,619.37

848.78

Logical reads:

725,873.40

607.23

Block changes:

6,120.46

5.12

Physical reads:

16,977.11

14.20

Physical writes:

394.32

0.33

User calls:

24,595.68

20.58

Parses:

3,200.60

2.68

Hard parses:

51.61

0.04

Sorts:

1,409.76

1.18

Logons:

9.73

0.01

Executes:

8,150.30

6.82

Transactions:

1,195.38

 
% Blocks changed per Read:

0.84

Recursive Call %:

12.33

Rollback per transaction %:

90.68

Rows per Sort:

108.70

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %:

99.91

Redo NoWait %:

100.00

Buffer  Hit   %:

97.66

In-memory Sort %:

100.00

Library Hit   %:

99.13

Soft Parse %:

98.39

Execute to Parse %:

60.73

Latch Hit %:

96.02

Parse CPU to Parse Elapsd %:

3.02

% Non-Parse CPU:

99.27

Shared Pool Statistics
 BeginEnd
Memory Usage %:

87.50

87.29

% SQL with executions>1:

64.72

69.75

% Memory for SQL w/exec>1:

68.31

72.26

Top 5 Timed Events
EventWaitsTime(s)Avg Wait(ms)% Total Call TimeWait Class
gc cr grant 2-way

698,801

22,062

32

15.3

Cluster
gc cr grant congested

17,802

21,274

1,195

14.8

Cluster
CPU time

19,078

13.2

 
gc buffer busy

723,841

16,935

23

11.7

Cluster
enq: TX – index contention

23,018

10,623

461

7.4

Concurrency

RAC Statistics
 BeginEnd
Number of Instances:

2

2

Global Cache Load Profile
 Per SecondPer Transaction
Global Cache blocks received:

5,217.88

4.37

Global Cache blocks served:

974.04

0.81

GCS/GES messages received:

8,616.00

7.21

GCS/GES messages sent:

17,046.23

14.26

DBWR Fusion writes:

2.24

0.00

Estd Interconnect traffic (KB)

54,547.51

Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access –  local cache %:

96.94

Buffer access – remote cache %:

0.72

Buffer access –         disk %:

2.34

Global Cache and Enqueue Services – Workload Characteristics
Avg global enqueue get time (ms):

9.9

Avg global cache cr block receive time (ms):

52.0

Avg global cache current block receive time (ms):

2.2

Avg global cache cr block build time (ms):

0.0

Avg global cache cr block send time (ms):

0.0

Global cache log flushes for cr blocks served %:

2.7

Avg global cache cr block flush time (ms):

2.8

Avg global cache current block pin time (ms):

0.0

Avg global cache current block send time (ms):

0.0

Global cache log flushes for current blocks served %:

0.0

Avg global cache current block flush time (ms):

6.8

Global Cache and Enqueue Services – Messaging Statistics
Avg message sent queue time (ms):

31.2

Avg message sent queue time on ksxp (ms):

0.4

Avg message received queue time (ms):

0.1

Avg GCS message process time (ms):

0.0

Avg GES message process time (ms):

0.0

% of direct sent messages:

16.14

% of indirect sent messages:

81.89

% of flow controlled messages:

1.97

节点二:
Report SummaryCache Sizes
 BeginEnd  
Buffer Cache:

43,712M

43,712M

Std Block Size:

8K

Shared Pool Size:

10,752M

10,752M

Log Buffer:

63,476K

Load Profile
 Per SecondPer Transaction
Redo size:

55,779.70

8,409.66

Logical reads:

429,824.12

64,802.66

Block changes:

314.85

47.47

Physical reads:

6,214.68

936.96

Physical writes:

18.41

2.78

User calls:

1,488.04

224.34

Parses:

195.12

29.42

Hard parses:

2.61

0.39

Sorts:

136.97

20.65

Logons:

2.05

0.31

Executes:

792.92

119.54

Transactions:

6.63

 
% Blocks changed per Read:

0.07

Recursive Call %:

20.53

Rollback per transaction %:

30.02

Rows per Sort:

96.76

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %:

99.96

Redo NoWait %:

100.00

Buffer  Hit   %:

98.70

In-memory Sort %:

100.00

Library Hit   %:

99.18

Soft Parse %:

98.66

Execute to Parse %:

75.39

Latch Hit %:

99.73

Parse CPU to Parse Elapsd %:

1.38

% Non-Parse CPU:

99.94

Shared Pool Statistics
 BeginEnd
Memory Usage %:

70.64

70.93

% SQL with executions>1:

95.49

96.21

% Memory for SQL w/exec>1:

94.17

94.99

Top 5 Timed Events
EventWaitsTime(s)Avg Wait(ms)% Total Call TimeWait Class
row cache lock

18,242

11,480

629

21.6

Concurrency
gc current grant 2-way

10,566

7,511

711

14.1

Cluster
CPU time

6,463

12.2

 
gc current block 2-way

104,244

5,149

49

9.7

Cluster
gc cr grant 2-way

119,013

4,575

38

8.6

Cluster

RAC Statistics
 BeginEnd
Number of Instances:

2

2

Global Cache Load Profile
 Per SecondPer Transaction
Global Cache blocks received:

977.57

147.38

Global Cache blocks served:

5,264.35

793.68

GCS/GES messages received:

17,178.21

2,589.88

GCS/GES messages sent:

8,709.88

1,313.15

DBWR Fusion writes:

1.99

0.30

Estd Interconnect traffic (KB)

54,991.62

Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access –  local cache %:

98.48

Buffer access – remote cache %:

0.23

Buffer access –         disk %:

1.30

Global Cache and Enqueue Services – Workload Characteristics
Avg global enqueue get time (ms):

29.7

Avg global cache cr block receive time (ms):

36.8

Avg global cache current block receive time (ms):

6.9

Avg global cache cr block build time (ms):

0.0

Avg global cache cr block send time (ms):

0.0

Global cache log flushes for cr blocks served %:

42.5

Avg global cache cr block flush time (ms):

1.5

Avg global cache current block pin time (ms):

0.0

Avg global cache current block send time (ms):

0.0

Global cache log flushes for current blocks served %:

0.0

Avg global cache current block flush time (ms):

1.8

Global Cache and Enqueue Services – Messaging Statistics
Avg message sent queue time (ms):

39.6

Avg message sent queue time on ksxp (ms):

0.8

Avg message received queue time (ms):

0.0

Avg GCS message process time (ms):

0.0

Avg GES message process time (ms):

0.0

% of direct sent messages:

29.11

% of indirect sent messages:

69.98

% of flow controlled messages:

0.92

pmon信息:
*** 2009-10-10 16:32:40.360
[claim lock for dead process][lp 0x700000d5a82fb50][p 0x700000d6234b120.1229182][hist x9a514951]
PMON unable to acquire latch  700000d6f90b728 Child object queue header operation level=5 child#=74
Location from where latch is held: kclobj_2:
Context saved from call: 0
state=busy(shared) (val=0×4000000000000003)
waiters [orapid (seconds since: put on list, posted, alive check)]:
15 (0, 1255163618, 0)
22 (0, 1255163618, 0)
21 (0, 1255163618, 0)
20 (0, 1255163618, 0)
14 (0, 1255163618, 0)
13 (0, 1255163618, 0)
18 (0, 1255163618, 0)
10 (0, 1255163618, 0)
28 (0, 1255163618, 0)
31 (0, 1255163618, 0)
16 (0, 1255163618, 0)
11 (0, 1255163618, 0)
12 (0, 1255163618, 0)
17 (0, 1255163618, 0)
waiter count=14
gotten 1421109502 times wait, failed first 1130240 sleeps 540139
gotten 1591 times nowait, failed: 0
possible holder pid = 7 ospid=270800
—————————————-
SO: 700000d64f13208, type: 2, owner: 0, flag: INIT/-/-/0×00
(process) Oracle pid=7, calls cur/top: 700000d697fc530/700000d697f66d8, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 90
last post received-location: KJCS Post snd proxy to flush msg
last process to post me: 700000d58f06f30 1 6
last post sent: 0 0 131
last post sent-location: kclcget
last process posted by me: 700000d58feec50 246 0
(latch info) wait_event=0 bits=20
holding    (efd=4) 700000d6f90b728 Child object queue header operation level=5 child#=74
Location from where latch is held: kclobj_2:
Context saved from call: 0
state=busy(shared) (val=0×4000000000000003)
waiters [orapid (seconds since: put on list, posted, alive check)]:
15 (0, 1255163618, 0)
22 (0, 1255163618, 0)
21 (0, 1255163618, 0)
20 (0, 1255163618, 0)
14 (0, 1255163618, 0)
13 (0, 1255163618, 0)
18 (0, 1255163618, 0)
10 (0, 1255163618, 0)
28 (0, 1255163618, 0)
31 (0, 1255163618, 0)
16 (0, 1255163618, 0)
11 (0, 1255163618, 0)
12 (0, 1255163618, 0)
17 (0, 1255163618, 0)
waiter count=14
Process Group: DEFAULT, pseudo proc: 700000d6915a010
O/S info: user: oracle, term: UNKNOWN, ospid: 270800
OSD pid info: Unix process pid: 270800, image: oracle@crmdb01 (LMS0)
Short stack dump:
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044C0<-sskgxp_select+0100<-skgxpiwait+0614<-skgxpwait+02c8<-ksxpwait+0538<-ksliwat+0654<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kclobj+076c<-kclaoliter+00b8<-kjblpkeydrmqscchk+0340<-kjmdrmchk+01d8<-kjmsm+062c<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0098
Dump of memory from 0×0700000D8CE57840 to 0×0700000D8CE57A48
700000D8CE57840 00000008 00000000 0700000D 61EEBD80  [............a...]
700000D8CE57850 00000010 000313A7 0700000D 697F66D8  [............i.f.]
700000D8CE57860 00000003 000313A7 0700000D 61EED1A0  [............a...]
700000D8CE57870 00000013 0003129B 0700000D 61C1FDF8  [............a...]
700000D8CE57880 0000000B 000313A7 0700000D 5D7ABC00  [............]z..]
700000D8CE57890 00000004 0003129B 0700000C E2555EF8  [.............U^.]
700000D8CE578A0 00000014 000313A7 0700000C DA555EB0  [.............U^.]
700000D8CE578B0 00000015 000313A7 0700000B 99623A10  [.............b:.]
700000D8CE578C0 00000007 000313A7 00000000 00000000  [................]
700000D8CE578D0 00000000 00000000 00000000 00000000  [................]
Repeat 22 times
700000D8CE57A40 00000000 00000000                    [........]
*** 2009-10-10 16:33:38.612
PMON unable to acquire latch  700000d6f90b728 Child object queue header operation level=5 child#=74
Location from where latch is held: kclobj_2:
Context saved from call: 0
state=busy(shared) (val=0×4000000000000001)
waiters [orapid (seconds since: put on list, posted, alive check)]:
20 (0, 1255163678, 0)
19 (0, 1255163678, 0)
11 (0, 1255163678, 0)
13 (0, 1255163678, 0)
9 (0, 1255163678, 0)
15 (0, 1255163678, 0)
17 (0, 1255163678, 0)
7 (0, 1255163678, 0)
14 (0, 1255163678, 0)
10 (0, 1255163678, 0)
22 (0, 1255163678, 0)
30 (0, 1255163678, 0)
28 (0, 1255163678, 0)
31 (0, 1255163678, 0)
12 (0, 1255163678, 0)
18 (0, 1255163678, 0)
16 (0, 1255163678, 0)
8 (0, 1255163678, 0)
waiter count=18
gotten 1421117482 times wait, failed first 1136125 sleeps 547725
gotten 1591 times nowait, failed: 0
possible holder pid = 21 ospid=217516
—————————————-
SO: 700000d64f141c8, type: 2, owner: 0, flag: INIT/-/-/0×00
(process) Oracle pid=21, calls cur/top: 700000cb495bdc0/700000d697f8db8, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 90
last post received-location: KJCS Post snd proxy to flush msg
last process to post me: 700000d58f07710 1 6
last post sent: 0 0 136
last post sent-location: kclrcvt
last process posted by me: 700000d58f37ad0 217 0
(latch info) wait_event=0 bits=20
holding    (efd=4) 700000d6f90b728 Child object queue header operation level=5 child#=74
Location from where latch is held: kclobj_2:
Context saved from call: 0
state=busy(shared) (val=0×4000000000000001)
waiters [orapid (seconds since: put on list, posted, alive check)]:
20 (0, 1255163678, 0)
19 (0, 1255163678, 0)
11 (0, 1255163678, 0)
13 (0, 1255163678, 0)
9 (0, 1255163678, 0)
15 (0, 1255163678, 0)
17 (0, 1255163678, 0)
7 (0, 1255163678, 0)
14 (0, 1255163678, 0)
10 (0, 1255163678, 0)
22 (0, 1255163678, 0)
30 (0, 1255163678, 0)
28 (0, 1255163678, 0)
31 (0, 1255163678, 0)
12 (0, 1255163678, 0)
18 (0, 1255163678, 0)
16 (0, 1255163678, 0)
8 (0, 1255163678, 0)
waiter count=18
Process Group: DEFAULT, pseudo proc: 700000d6915a010
O/S info: user: oracle, term: UNKNOWN, ospid: 217516
OSD pid info: Unix process pid: 217516, image: oracle@crmdb01 (LMSe)
Short stack dump:
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044C0<-sskgxp_select+0100<-skgxpiwait+0614<-skgxpwait+02c8<-ksxpwait+0538<-ksliwat+0654<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kclobj+06fc<-kclaoliter+00b8<-kjblpkeydrmqscchk+0340<-kjmdrmchk+01d8<-kjmsm+062c<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0098
Dump of memory from 0×0700000D8CE57C50 to 0×0700000D8CE57E58
700000D8CE57C50 00000007 00000000 0700000C B99AE630  [...............0]
700000D8CE57C60 00000010 000313A7 0700000D 697F8DB8  [............i...]
700000D8CE57C70 00000003 000313A7 0700000D 61EF1730  [............a..0]
700000D8CE57C80 00000013 0003129B 0700000D 61C20D08  [............a...]
700000D8CE57C90 0000000B 000313A7 0700000D 5D7A9130  [............]z.0]
700000D8CE57CA0 00000004 0003129B 0700000C B99BB5B8  [................]
700000D8CE57CB0 00000014 000313A7 0700000C DA5562E8  [.............Ub.]
700000D8CE57CC0 00000015 000313A7 00000000 00000000  [................]
700000D8CE57CD0 00000000 00000000 00000000 00000000  [................]
Repeat 23 times
700000D8CE57E50 00000000 00000000                    [........]
*** 2009-10-10 16:34:38.670
PMON unable to acquire latch  700000d6f90b728 Child object queue header operation level=5 child#=74
Location from where latch is held: kclobj_2:
Context saved from call: 0
state=busy(shared) (val=0×4000000000000001)
waiters [orapid (seconds since: put on list, posted, alive check)]:
20 (1, 1255163738, 1)
21 (1, 1255163738, 1)
17 (1, 1255163738, 1)
11 (1, 1255163738, 1)
25 (1, 1255163738, 1)
31 (1, 1255163738, 1)
13 (0, 1255163738, 0)
22 (0, 1255163738, 0)
18 (0, 1255163738, 0)
16 (0, 1255163738, 0)
19 (0, 1255163738, 0)
15 (0, 1255163738, 0)
7 (0, 1255163738, 0)
12 (0, 1255163738, 0)
9 (0, 1255163738, 0)
8 (0, 1255163738, 0)
waiter count=16
gotten 1421125079 times wait, failed first 1141921 sleeps 554985
gotten 1591 times nowait, failed: 0
possible holder pid = 10 ospid=242648
—————————————-
SO: 700000d5cee7e10, type: 2, owner: 0, flag: INIT/-/-/0×00
(process) Oracle pid=10, calls cur/top: 700000d697fcd70/700000d697f6f30, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 90
last post received-location: KJCS Post snd proxy to flush msg
last process to post me: 700000d64fafa48 184 0
last post sent: 0 0 131
last post sent-location: kclcget
last process posted by me: 700000d64fdc6e8 196 0
(latch info) wait_event=0 bits=20
holding    (efd=4) 700000d6f90b728 Child object queue header operation level=5 child#=74
Location from where latch is held: kclobj_2:
Context saved from call: 0
state=busy(shared) (val=0×4000000000000001)
waiters [orapid (seconds since: put on list, posted, alive check)]:
20 (1, 1255163738, 1)
21 (1, 1255163738, 1)
17 (1, 1255163738, 1)
11 (1, 1255163738, 1)
25 (1, 1255163738, 1)
31 (1, 1255163738, 1)
13 (0, 1255163738, 0)
22 (0, 1255163738, 0)
18 (0, 1255163738, 0)
16 (0, 1255163738, 0)
19 (0, 1255163738, 0)
15 (0, 1255163738, 0)
7 (0, 1255163738, 0)
12 (0, 1255163738, 0)
9 (0, 1255163738, 0)
8 (0, 1255163738, 0)
waiter count=16
Process Group: DEFAULT, pseudo proc: 700000d6915a010
O/S info: user: oracle, term: UNKNOWN, ospid: 242648
OSD pid info: Unix process pid: 242648, image: oracle@crmdb01 (LMS3)
Short stack dump:
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044C0<-sskgxp_select+0100<-skgxpiwait+0614<-skgxpwait+02c8<-ksxpwait+0538<-ksliwat+0654<-kslwaitns_timed+0024<-kskthbwt+022c<-kslwait+00f4<-kclobj+06fc<-kclaoliter+00b8<-kjblpkeydrmqscchk+0340<-kjmdrmchk+01d8<-kjmsm+062c<-ksbrdp+04b4<-opirip+03fc<-opidrv+0458<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0098
Dump of memory from 0×0700000D58E747E0 to 0×0700000D58E749E8
700000D58E747E0 00000007 00000000 0700000D 61EEBF18  [............a...]
700000D58E747F0 00000010 000313A7 0700000D 697F6F30  [............i.o0]
700000D58E74800 00000003 000313A7 0700000D 61EEE088  [............a...]
700000D58E74810 00000013 0003129B 0700000D 61C20140  [............a..@]
700000D58E74820 0000000B 000313A7 0700000D 657E5608  [............e~V.]
700000D58E74830 00000004 0003129B 0700000C E25560A8  [.............U`.]
700000D58E74840 00000014 000313A7 0700000C DA555F88  [.............U_.]
700000D58E74850 00000015 000313A7 00000000 00000000  [................]
700000D58E74860 00000000 00000000 00000000 00000000  [................]
Repeat 23 times
700000D58E749E0 00000000 00000000                    [........]
*** 2009-10-10 16:35:38.603
[claim lock for dead process][lp 0x700000d6a156dc0][p 0x700000d62305d18.3063828][hist x9a514951]
[claim lock for dead process][lp 0x700000d63165988][p 0x700000d6217fc58.2757096][hist x49514951]
[claim lock for dead process][lp 0x700000d6eb56078][p 0x700000d61fe3fc8.422018][hist x49514951]
[claim lock for dead process][lp 0x700000d6e40aee0][p 0x700000d61fee3c0.1311026][hist x49514951]
[claim lock for dead process][lp 0x700000d5e3a2a40][p 0x700000d62203c10.3150576][hist x9a514951]
client错误信息:
*** 2009-10-10 16:35:40.488
*** ACTION NAME) 2009-10-10 16:35:40.476
*** MODULE NAMEchick@jiekou01 (TNS V1-V3)) 2009-10-10 16:35:40.476
*** SERVICE NAMEcrmdb) 2009-10-10 16:35:40.476
*** SESSION ID5671.59420) 2009-10-10 16:35:40.476
opitsk: network error occurred while two-task server trying tosend break; error code = 12152
lms进程信息:
*** 2009-10-10 16:35:40.161
KJM_HISTORY: RCVR STALL OP(10) context 30 elapsed 175594799 us
KJM HIST LMS0:
10:30:175594799 17:6 16:167 15:3 14:34:33 1:1 12:1:16 7:55 6:1 10:30:65489
17:4 16:41 15:1 14:34:35 1:1 12:1:10 7:33 6:1 10:30:21007 17:3
16:16 15:2 13:65521:154 20:36:37 20:36:45 20:36:68 1:1 12:1:12 7:47 6:2
10:30:10559 17:3 16:31 15:2 12:74 7:3 6:1 10:30:12277 2:1 19:1
18:6 17:3 16:2 15:2 12:2145 7:1 6:0 10:0 17:2 16:0
13:65518:52 20:34:28 20:34:22 1:0 12:1:5 7:0 6:0 10:1 17:1 16:1
12:41 7:12 6:1 10:1
—————————————-
SO: 700000d5d7abc00, type: 4, owner: 700000d64f13208, flag: INIT/-/-/0×00
(session) sid: 8800 trans: 0, creator: 700000d64f13208, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0007-00000003, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
last wait for ‘gc object scan’ blocking sess=0×0 seq=50027 wait_time=12 seconds since wait started=0
=0, =0, =0
Dumping Session Wait History
for ‘gc object scan’ count=1 wait_time=12
=0, =0, =0
for ‘latch: object queue header operation’ count=1 wait_time=716
address=700000d0f164be0, number=83, tries=0
for ‘gc object scan’ count=1 wait_time=18
=0, =0, =0
for ‘latch: object queue header operation’ count=1 wait_time=267
address=700000d0f164be0, number=83, tries=0
for ‘latch: object queue header operation’ count=1 wait_time=1323
address=700000d0f164be0, number=83, tries=0
for ‘latch: object queue header operation’ count=1 wait_time=1203
address=700000d0f164be0, number=83, tries=0
for ‘latch: object queue header operation’ count=1 wait_time=10838
address=700000d0f164be0, number=83, tries=0
for ‘latch: object queue header operation’ count=1 wait_time=2706
address=700000d5f3c22f0, number=83, tries=0
for ‘latch: object queue header operation’ count=1 wait_time=1480
address=700000d5f3c22f0, number=83, tries=0
for ‘latch: object queue header operation’ count=1 wait_time=1536
address=700000d5f3c22f0, number=83, tries=0
temporary object counter: 0
—————————————-
UOL used : 0 locks(used=0, free=0)
KGX Atomic Operation Log 700000d77ffb0a0
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 8800 efd 0 whr 0 slp 0
KGX Atomic Operation Log 700000d77ffb0e8
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 8800 efd 0 whr 0 slp 0
KGX Atomic Operation Log 700000d77ffb130
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 8800 efd 0 whr 0 slp 0
从现象综合分析表现为lms忙,无能力处理lms信息。
pmon进程等待lms进程,lms进程持续抓获:object queue header operator latch操作比较慢。
而gc cr/block grant busy
gc cr/block congous等待事件表明lms处理延迟。
检查OS watcher信息,系统资源基本正常,网络通信正常。
从row cache lock信息显示,初始认为是高峰期的DDL操作引起全局处理缓慢,检查期间DDL操作,没有发现相关DDL信息,基本排除是DDL操作引起。row cache lock可能是其他原因引起的结果。
从row cache lock移开之后再次把焦点锁定在lms慢,获得object queue header operator latch慢上来,这个latch的作用不明。但是从字面的意思来看,应该是发生在lms状态变更之上,这样就可以基本确定为Oracle bug。
搜索metalink,查找对应bug:
DRM FUNCTION ON 10.2.0.4 ” which marked as duplicated from  internal Bug  6919764
Bug 6960699 – AROLTP-C: INSTANCE CRASHED AFTER LMS1 ENCOUNTERED ORA-600  [KJBLDRMRPSTMASTER ”
>>> for AIX fixed in  patch 8625153
- This issue  also seems to be related to “Bug 7312086 – THE TIME-OUT WAS GENERATED BY THE  DRM FUNCTION ON 10.2.0.4 ” which marked as duplicated from Bug 6960699 – AROLTP-C: INSTANCE CRASHED AFTER LMS1 ENCOUNTERED ORA-600  [KJBLDRMRPSTMASTER ”
>>> for AIX fixed in  patch 8625153
- This issue  also seems to be related to “Bug 7312086 – THE TIME-OUT WAS GENERATED BY THE  DRM FUNCTION ON 10.2.0.4 ” which marked as duplicated from  internal Bug  6919764
bug6819764表现出比较相似形,lms waiting for “object queue header operator” latch。
后续处理:
由于其问题的偶发性,继续等待下次发现。如果频繁出现,应用10.2.0.4.1 psu,如果持续不再发生,等待10.2.0.5 patch发布。
分享 转发
TOP
发新话题 回复该主题