美创科技技术社区


Oracle RAC hangs (very poor performance) roughly once a week
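Symptom: roughly once a week, the Oracle RAC database hangs (performance becomes very poor) for about one hour. We have confirmed that no unusual workload runs at those times. The hang is not selective: almost any operation stalls while it lasts. In most cases the database recovers on its own after about an hour; simply restarting the database also clears the problem.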
AWR during normal operation:
Load Profile
                              Per Second   Per Transaction
Redo size:                     40,419.57          8,946.53
Logical reads:                 33,688.12          7,456.59
Block changes:                    247.72             54.83
Physical reads:                    14.22              3.15
Physical writes:                    8.94              1.98
User calls:                       176.51             39.07
Parses:                            27.13              6.01
Hard parses:                        2.51              0.56
Sorts:                             22.36              4.95
Logons:                             0.38              0.08
Executes:                       1,509.53            334.12
Transactions:                       4.52

% Blocks changed per Read: 0.74
Recursive Call %: 90.88
Rollback per transaction %: 23.94
Rows per Sort: 443.92

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00
Redo NoWait %: 100.00
Buffer Hit %: 99.96
In-memory Sort %: 100.00
Library Hit %: 99.52
Soft Parse %: 90.74
Execute to Parse %: 98.20
Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: 90.09
% Non-Parse CPU: 97.23

Shared Pool Statistics
                                 Begin       End
Memory Usage %:                  13.44     16.39
% SQL with executions>1:         42.01     50.18
% Memory for SQL w/exec>1:       44.02     52.24

Top 5 Timed Events
Event                                Waits   Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                       1,268                             88.8
db file sequential read              8,776        51             6                3.6  User I/O
db file scattered read               5,604        38             7                2.7  User I/O
gc cr block 2-way                   77,983        28             0                1.9  Cluster
log file sync                       11,680        24             2                1.7  Commit

RAC Statistics
                         Begin     End
Number of Instances:         2       2

Global Cache Load Profile
                                  Per Second   Per Transaction
Global Cache blocks received:          49.63             10.99
Global Cache blocks served:            52.64             11.65
GCS/GES messages received:            162.57             35.98
GCS/GES messages sent:                162.11             35.88
DBWR Fusion writes:                     1.96              0.43
Estd Interconnect traffic (KB)        881.61

Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access - local cache %: 99.81
Buffer access - remote cache %: 0.15
Buffer access - disk %: 0.04

Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): 0.2
Avg global cache cr block receive time (ms): 0.4
Avg global cache current block receive time (ms): 0.7
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.1
Global cache log flushes for cr blocks served %: 1.4
Avg global cache cr block flush time (ms): 8.0
Avg global cache current block pin time (ms): 0.0
Avg global cache current block send time (ms): 0.1
Global cache log flushes for current blocks served %: 0.0
Avg global cache current block flush time (ms): 162.4

Global Cache and Enqueue Services - Messaging Statistics
Avg message sent queue time (ms): 747.2
Avg message sent queue time on ksxp (ms): 0.3
Avg message received queue time (ms): 0.1
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 52.77
% of indirect sent messages: 39.44
% of flow controlled messages: 7.79

Report Summary

Cache Sizes
                         Begin         End
Buffer Cache:          21,232M     21,232M
Shared Pool Size:       9,312M      9,312M
Std Block Size: 8K
Log Buffer: 14,288K

Load Profile
                              Per Second   Per Transaction
Redo size:                     59,687.69         16,681.01
Logical reads:                 14,297.41          3,995.72
Block changes:                    444.21            124.14
Physical reads:                     1.59              0.44
Physical writes:                    9.80              2.74
User calls:                       132.42             37.01
Parses:                            18.90              5.28
Hard parses:                        0.69              0.19
Sorts:                             14.09              3.94
Logons:                             0.36              0.10
Executes:                         786.43            219.78
Transactions:                       3.58

% Blocks changed per Read: 3.11
Recursive Call %: 89.94
Rollback per transaction %: 31.10
Rows per Sort: 948.94

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00
Redo NoWait %: 100.00
Buffer Hit %: 99.99
In-memory Sort %: 100.00
Library Hit %: 99.75
Soft Parse %: 96.33
Execute to Parse %: 97.60
Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: 76.06
% Non-Parse CPU: 97.19

Shared Pool Statistics
                                 Begin       End
Memory Usage %:                  78.30     76.74
% SQL with executions>1:         99.87     98.12
% Memory for SQL w/exec>1:       99.55     97.41

Top 5 Timed Events
Event                                Waits   Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                         525                             81.1
gc cr block 2-way                   63,010        32             1                4.9  Cluster
gc cr multi block request           42,624        25             1                3.8  Cluster
gc cr block busy                       318        20            62                3.1  Cluster
gc current block 2-way              23,107        15             1                2.3  Cluster

RAC Statistics
                         Begin     End
Number of Instances:         2       2

Global Cache Load Profile
                                  Per Second   Per Transaction
Global Cache blocks received:          36.61             10.23
Global Cache blocks served:            29.45              8.23
GCS/GES messages received:             69.76             19.50
GCS/GES messages sent:                 78.68             21.99
DBWR Fusion writes:                     1.99              0.56
Estd Interconnect traffic (KB)        557.45

Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access - local cache %: 99.74
Buffer access - remote cache %: 0.26
Buffer access - disk %: 0.01

Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): 0.0
Avg global cache cr block receive time (ms): 0.9
Avg global cache current block receive time (ms): 0.8
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 0.4
Avg global cache cr block flush time (ms): 5.5
Avg global cache current block pin time (ms): 131.4
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 0.0
Avg global cache current block flush time (ms): 2.5

Global Cache and Enqueue Services - Messaging Statistics
Avg message sent queue time (ms): 167.8
Avg message sent queue time on ksxp (ms): 0.4
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 50.47
% of indirect sent messages: 41.53
% of flow controlled messages: 8.01


AWR during the abnormal period:
Cache Sizes
                         Begin         End
Buffer Cache:          23,568M     23,232M
Shared Pool Size:       7,008M      7,344M
Std Block Size: 8K
Log Buffer: 14,288K

Load Profile
                              Per Second   Per Transaction
Redo size:                     23,141.40          4,750.81
Logical reads:                 21,559.03          4,425.96
Block changes:                    149.19             30.63
Physical reads:                     2.12              0.44
Physical writes:                   11.86              2.44
User calls:                       188.89             38.78
Parses:                            32.21              6.61
Hard parses:                        2.79              0.57
Sorts:                             19.00              3.90
Logons:                             0.37              0.08
Executes:                         796.93            163.61
Transactions:                       4.87

% Blocks changed per Read: 0.69
Recursive Call %: 86.97
Rollback per transaction %: 21.31
Rows per Sort: 809.00

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00
Redo NoWait %: 100.00
Buffer Hit %: 99.99
In-memory Sort %: 100.00
Library Hit %: 98.90
Soft Parse %: 91.35
Execute to Parse %: 95.96
Latch Hit %: 99.88
Parse CPU to Parse Elapsd %: 9.13
% Non-Parse CPU: 90.37

Shared Pool Statistics
                                 Begin       End
Memory Usage %:                  76.16     21.27
% SQL with executions>1:         95.79     40.47
% Memory for SQL w/exec>1:       94.61     37.40

Top 5 Timed Events
Event                                Waits   Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
latch: library cache                 1,405     3,525         2,509               27.6  Concurrency
latch: shared pool                   4,278     3,437           803               26.9  Concurrency
CPU time                                       2,992                             23.4
latch free                           8,248     1,535           186               12.0  Other
enq: TX - row lock contention        2,631     1,285           488               10.1  Application
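
The Avg Wait(ms) column in the abnormal-period Top 5 Timed Events figures above is simply Time(s) divided by Waits. The following is a minimal Python sketch that recomputes it from the posted numbers, which makes the scale concrete: each library cache latch wait lasted about 2.5 seconds. The names `top5_abnormal` and `avg_wait_ms` are illustrative only, and results can differ slightly from AWR because the Time(s) column is already rounded to whole seconds.

```python
# (event, waits, total wait time in seconds), copied from the abnormal-period
# Top 5 Timed Events excerpt. "CPU time" is omitted because it has no wait count.
top5_abnormal = [
    ("latch: library cache", 1405, 3525),
    ("latch: shared pool", 4278, 3437),
    ("latch free", 8248, 1535),
    ("enq: TX - row lock contention", 2631, 1285),
]


def avg_wait_ms(waits, time_s):
    """Average duration of one wait in milliseconds."""
    return 1000.0 * time_s / waits


if __name__ == "__main__":
    for event, waits, time_s in top5_abnormal:
        print(f"{event:32s} {avg_wait_ms(waits, time_s):8.0f} ms per wait")
```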

RAC Statistics
                         Begin     End
Number of Instances:         2       2

Global Cache Load Profile
                                  Per Second   Per Transaction
Global Cache blocks received:          48.91             10.04
Global Cache blocks served:            43.02              8.83
GCS/GES messages received:            116.15             23.85
GCS/GES messages sent:                117.51             24.12
DBWR Fusion writes:                     3.33              0.68
Estd Interconnect traffic (KB)        781.03

Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access - local cache %: 99.76
Buffer access - remote cache %: 0.23
Buffer access - disk %: 0.01

Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): 11.1
Avg global cache cr block receive time (ms): 1.0
Avg global cache current block receive time (ms): 1.1
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 2.0
Avg global cache cr block flush time (ms): 20.2
Avg global cache current block pin time (ms): 139.8
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 0.0
Avg global cache current block flush time (ms): 161.0

Global Cache and Enqueue Services - Messaging Statistics
Avg message sent queue time (ms): 9.7
Avg message sent queue time on ksxp (ms): 0.7
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.4
Avg GES message process time (ms): 0.0
% of direct sent messages: 47.45
% of indirect sent messages: 43.92
% of flow controlled messages: 8.63

Report Summary

Cache Sizes
                         Begin         End
Buffer Cache:          21,232M     21,232M
Shared Pool Size:       9,312M      9,312M
Std Block Size: 8K
Log Buffer: 14,288K

Load Profile
                              Per Second   Per Transaction
Redo size:                    150,033.42         15,504.00
Logical reads:                 22,428.39          2,317.68
Block changes:                    997.24            103.05
Physical reads:                     1.88              0.19
Physical writes:                   30.15              3.12
User calls:                       168.17             17.38
Parses:                            45.39              4.69
Hard parses:                        4.03              0.42
Sorts:                             26.14              2.70
Logons:                             0.33              0.03
Executes:                         696.47             71.97
Transactions:                       9.68

% Blocks changed per Read: 4.45
Recursive Call %: 87.99
Rollback per transaction %: 12.22
Rows per Sort: 560.66

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00
Redo NoWait %: 100.00
Buffer Hit %: 99.99
In-memory Sort %: 100.00
Library Hit %: 98.09
Soft Parse %: 91.12
Execute to Parse %: 93.48
Latch Hit %: 99.85
Parse CPU to Parse Elapsd %: 49.82
% Non-Parse CPU: 94.36

Shared Pool Statistics
                                 Begin       End
Memory Usage %:                  76.74     16.62
% SQL with executions>1:         98.12     40.28
% Memory for SQL w/exec>1:       97.41     39.07

Top 5 Timed Events
Event                                Waits   Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                       1,037                             28.7
gc current grant busy               29,912       511            17               14.2  Cluster
gc cr block 2-way                   82,255       458             6               12.7  Cluster
gc buffer busy                       1,322       438           332               12.1  Cluster
gc current block 2-way              40,663       260             6                7.2  Cluster

RAC Statistics
                         Begin     End
Number of Instances:         2       2

Global Cache Load Profile
                                  Per Second   Per Transaction
Global Cache blocks received:          43.15              4.46
Global Cache blocks served:            48.93              5.06
GCS/GES messages received:            117.68             12.16
GCS/GES messages sent:                116.36             12.02
DBWR Fusion writes:                     2.71              0.28
Estd Interconnect traffic (KB)        782.31

Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access - local cache %: 99.80
Buffer access - remote cache %: 0.19
Buffer access - disk %: 0.01

Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): 1.5
Avg global cache cr block receive time (ms): 8.3
Avg global cache current block receive time (ms): 5.5
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 0.9
Avg global cache cr block flush time (ms): 7.9
Avg global cache current block pin time (ms): 0.0
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 0.0
Avg global cache current block flush time (ms): 1.4

Global Cache and Enqueue Services - Messaging Statistics
Avg message sent queue time (ms): 154.8
Avg message sent queue time on ksxp (ms): 3.0
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 49.50
% of indirect sent messages: 35.76
% of flow controlled messages: 14.73

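Judging from the symptoms, some system resource appears to accumulate over time until it reaches a limit, at which point performance degrades. Based on the Top 5 events together with the shared pool, library cache, and hard parse figures, our preliminary diagnosis is shared pool exhaustion caused by hard parsing. Shared pool consumption in particular shows that the Oracle RAC hang correlates with shared pool exhaustion, and the hang almost always occurs when dynamic SGA growth has reached its limit.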
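
To make the comparison behind this diagnosis concrete, here is a minimal Python sketch that lines up a few figures from the first normal excerpt and the first abnormal excerpt. All numbers are copied from the AWR excerpts in this post; the dictionary keys and function names are illustrative, not part of any Oracle tool. It shows that the hard parse rate itself is similar in both periods (2.51 versus 2.79 per second), while in the abnormal period shared pool usage falls by about 55 percentage points within the snapshot and the two shared pool related latches account for more than half of total call time.

```python
# Figures copied from the first "normal" and the first "abnormal" AWR excerpt.
normal = {
    "parses_per_s": 27.13,
    "hard_parses_per_s": 2.51,
    "parse_cpu_to_elapsed_pct": 90.09,
    "shared_pool_usage_begin_pct": 13.44,
    "shared_pool_usage_end_pct": 16.39,
}
abnormal = {
    "parses_per_s": 32.21,
    "hard_parses_per_s": 2.79,
    "parse_cpu_to_elapsed_pct": 9.13,
    "shared_pool_usage_begin_pct": 76.16,
    "shared_pool_usage_end_pct": 21.27,
}

# "% Total Call Time" per event from the abnormal Top 5 Timed Events.
abnormal_top5_pct = {
    "latch: library cache": 27.6,
    "latch: shared pool": 26.9,
    "CPU time": 23.4,
    "latch free": 12.0,
    "enq: TX - row lock contention": 10.1,
}


def soft_parse_pct(snap):
    """Share of parses that were not hard parses, in percent."""
    return 100.0 * (1.0 - snap["hard_parses_per_s"] / snap["parses_per_s"])


def shared_pool_usage_drop(snap):
    """Begin minus End of 'Memory Usage %'; a large positive value means memory was freed."""
    return snap["shared_pool_usage_begin_pct"] - snap["shared_pool_usage_end_pct"]


def shared_pool_latch_share(top5_pct):
    """Combined % of total call time spent on the two shared pool related latches."""
    latches = ("latch: library cache", "latch: shared pool")
    return sum(pct for event, pct in top5_pct.items() if event in latches)


if __name__ == "__main__":
    for name, snap in (("normal", normal), ("abnormal", abnormal)):
        print(name,
              f"soft parse {soft_parse_pct(snap):.1f}%,",
              f"parse CPU/elapsed {snap['parse_cpu_to_elapsed_pct']:.2f}%,",
              f"shared pool usage drop {shared_pool_usage_drop(snap):.2f} points")
    print(f"shared pool latch share: {shared_pool_latch_share(abnormal_top5_pct):.1f}%")
```

The soft parse percentage recomputed this way can differ from the AWR value in the second decimal place, because the per-second rates in the report are already rounded.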
Given this, we recommended the following changes:
(1) Switch SGA management to manual mode
(2) Reduce the shared pool to about 2 GB
(3) Change the cursor_sharing parameter
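
As a rough illustration of the three recommendations, the Python sketch below renders candidate ALTER SYSTEM statements. The post does not give the exact commands, so several things here are assumptions: that manual SGA management is achieved by setting sga_target to 0, that the intended cursor_sharing value is FORCE, and that the changes are written to the spfile for all instances. The helper names `alter_system` and `proposed_changes` are hypothetical.

```python
def alter_system(parameter, value, scope="SPFILE", sid="*"):
    """Render one ALTER SYSTEM SET statement (scope and sid are assumptions)."""
    return f"ALTER SYSTEM SET {parameter}={value} SCOPE={scope} SID='{sid}';"


def proposed_changes(shared_pool_size="2G", cursor_sharing="FORCE"):
    """Statements corresponding to recommendations (1) to (3)."""
    return [
        # (1) Assumed way to disable automatic SGA management.
        alter_system("sga_target", "0"),
        # (2) Fix the shared pool at roughly 2 GB.
        alter_system("shared_pool_size", shared_pool_size),
        # (3) The target value is not stated in the post; FORCE is an assumption.
        alter_system("cursor_sharing", cursor_sharing),
    ]


if __name__ == "__main__":
    for statement in proposed_changes():
        print(statement)
```

Calling `proposed_changes(shared_pool_size="4G")` would render the same statements with a 4 GB shared pool.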
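Verification:
Changing cursor_sharing caused application errors (we hit a cursor_sharing bug), so that change was abandoned.
With the shared pool reduced to 2 GB it proved too small, so it was increased to 4 GB.
After these adjustments, shared pool related latch contention no longer appeared, but contention on the cache buffers lru chain and cache buffers chains latches emerged.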
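Given Linux's problems with managing large amounts of memory, we reduced the SGA to 16 GB and raised the db_writer_processes parameter to 8 to make cache buffers lru chain handling more efficient. We also scheduled a flush of the buffer cache and the shared pool every morning to counter the way performance degrades over time.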
Since these changes were deployed, the system has run for 2 weeks without the Oracle RAC hang recurring; we continue to monitor it.
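
The daily flush described above could be scheduled along the following lines. This is only a sketch under stated assumptions: the post does not say how the flush was scheduled, so the use of cron with sqlplus, the 06:00 run time, and the script path are all hypothetical, as are the helper names. The two ALTER SYSTEM FLUSH statements are standard Oracle commands; whether they need to be run on each RAC instance separately should be verified for the version in use.

```python
def build_flush_script():
    """SQL*Plus script text that flushes the shared pool and the buffer cache."""
    lines = [
        "-- Daily cache flush (interim workaround)",
        "ALTER SYSTEM FLUSH SHARED_POOL;",
        "ALTER SYSTEM FLUSH BUFFER_CACHE;",
        "EXIT;",
    ]
    return "\n".join(lines) + "\n"


def build_cron_line(script_path, hour=6, minute=0):
    """Crontab entry that runs the script once a day (time and path are assumptions)."""
    return f'{minute} {hour} * * * sqlplus -S "/ as sysdba" @{script_path}'


if __name__ == "__main__":
    print(build_flush_script())
    # Hypothetical location of the script on the database server.
    print(build_cron_line("/home/oracle/flush_caches.sql"))
```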
Next steps:
(1) Stop the scheduled buffer cache flush
(2) Wait for the Oracle 10.2.0.5 patchset