美创科技技术社区

注册

 

发新话题 回复该主题

性能优化案例(cache buffer chain latch) [复制链接]

1#
XXXX数据库优化调整建议
运行现象和故障描述
10月22日中午之后,XXXX数据库系统性能急剧恶化,出现大量的并发性冲突,表现为等待大量的latch free,latch: cache buffer chain以及latch: library cache 。操作系统CPU消耗达到100%,系统性能响应极为缓慢。
下面是运行期间的AWR报告:

Per Second

Per Transaction

Redo size:

114,459.47

6,101.89

Logical reads:

488,049.46

26,018.15

Block changes:

444.92

23.72

Physical reads:

2,637.96

140.63

Physical writes:

507.44

27.05

User calls:

616.09

32.84

Parses:

388.48

20.71

Hard parses:

14.67

0.78

Sorts:

154.18

8.22

Logons:

5.88

0.31

Executes:

651.57

34.74

Transactions:

18.76



% Blocks changed per Read:

0.09

Recursive Call %:

69.88

Rollback per transaction %:

2.05

Rows per Sort:

126.60

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %:

100.00

Redo NoWait %:

100.00

Buffer Hit %:

99.55

In-memory Sort %:

100.00

Library Hit %:

98.80

Soft Parse %:

96.22

Execute to Parse %:

40.38

Latch Hit %:

88.19

Parse CPU to Parse Elapsd %:

7.39

% Non-Parse CPU:

97.93

Shared Pool Statistics


Begin

End

Memory Usage %:

85.74

88.10

% SQL with executions>1:

42.86

44.45

% Memory for SQL w/exec>1:

70.86

78.28

Top 5 Timed Events

Event

Waits

Time(s)

Avg Wait(ms)

% Total Call Time

Wait Class

latch: cache buffers chains

385,694

58,616

152

25.5

Concurrency

latch free

151,833

32,221

212

14.0

Other

CPU time


13,749


6.0


latch: library cache

45,698

3,226

71

1.4

Concurrency

direct path write temp

238,321

2,299

10

1.0

User I/O

从latch信息可以看到:
cache buffers chains   2,498,511,076 13.36 0.00 58671 21,510,689 0.40
cache buffers lru chain  378,882 0.55 0.08 9 22,568,763 1.07
library cache 34,822,373 0.59 0.22 3226 194,268 1.79
row cache objects 147,765,275 2.52 0.00 84 2,277 0.04
simulator lru latch 46,830,334 1.20 0.27 32216 2,168,754 0.16
这些相关的latch冲突比较高,尤其对于cache buffer chain和simulator lru latch冲突几乎频繁发生,并且获取时间很长,通过大量spin消耗CPU时间。
以下表格消耗大量CPU并且引发LATCH冲突:

Segments by Logical Reads


  • Total Logical      Reads: 1,758,696,980
  • Captured      Segments account for 94.8% of Total

Owner

Tablespace Name

Object Name

Subobject Name

Obj. Type

Logical Reads

%Total

NBAIC

GSNEW

PK_BM_GLWG


INDEX

996,334,272

56.65

NBAIC

PARTITION_TBS

HK_JD

HK_JD_P12

TABLE PARTITION

270,731,328

15.39

NBAIC

GSNEW

HK_SSJH_HK


TABLE

159,548,544

9.07

WLJJJG

GSNEW

WL_SSJG


TABLE

58,100,944

3.30

NBAIC

GSNEW

SYS_SPLC


TABLE

19,291,264

1.10

以下SQL语句消耗大量CPU并且引发LATCH FREE,消耗CPU资源。
另外从SQL信息可以看到引发大量cache buffer chain以及simulator lru latch冲突几乎都由相同的SQL语句引起,这些语句极度消耗CPU资源,从而导致系统响应缓慢。
Select * from (select row_.*, rownum rownum_ from (select a.xh, a.jhmc, a.kssj, a.jssj, a.fbr, a.fbsj, a.zrr, a.jhnr, a.gxdw, (select count(b.nbxh) from hk_ssjh_hk b, hk_jd m, bm_glwg n where a.xh=b.ssjhxh and b.wcbj=’0′ and b.nbxh=m.nbxh and m.glwg = n.code(+) and m.gxdw like ‘330281%’ ) jhhk, (select count(b.nbxh) from hk_ssjh_hk b, hk_jd m, bm_glwg n where a.xh=b.ssjhxh and b.wcbj=’1′ and b.nbxh=m.nbxh and m.glwg = n.code(+) and m.gxdw like ‘330281%’ ) ywchk, (select count(wzxh) from hk_ssjh_wz c, hk_wzjy m where a.xh=c.ssjhxh and c.wcbj=’0′ and c.wzxh = m.xh and m.gxdw like ‘330281%’ ) jhwz, (select count(wzxh) from hk_ssjh_wz c, hk_wzjy m where a.xh=c.ssjhxh and c.wcbj=’1′ and c.wzxh = m.xh and m.gxdw like ‘330281%’ ) ywcwz from hk_ssjh a where 1 = 1 and a.gxdw like ‘330281%’ and a.gxdw = ‘33028103′ order by a.fbsj desc )row_ where rownum<=20) where rownum_>0
类似的语句几乎消耗了系统80%以上的CPU资源,这些语句同样引起了大量buffer cache冲突。
该语句写法存在问题,存在着大量过多的cache访问,频繁进行latch锁定,从而导致系统故障。
故障诊断:
从以上现象和数据中,SQL语句是引起性能问题的最主要原因。高消耗语句的高并发导致系统资源极度紧张,从而引发一些Oracle在高负载下面的Oracle bug,使业务系统性能更加缓慢。
Oracle在10.2.0.3下面的相关bug:

Latch Contention

[url=]6455161[/url]+

Higher CPU / Higher “cache buffer chains”   latch gets / Higher “consistent gets” after truncate/Rebuild

[url=]4691237[/url]

High “library cache” latch gets from SQL   using objects in PLSQL

[url=]5749075[/url]

High Requests on dc_rollback_segments. latch / US enqueue   contention

[url=]5918642[/url]

Heavy latch contention with DB_CACHE_ADVICE on

[url=]6333663[/url]

Shared pool latch contention due to fragmentation of   large pool

[url=]6356566[/url]

Memory leak / high CPU selecting from V$SQL_PLAN   (affects statspack)

解决方案:
(一)、简单控制该类语句并发规模
该方法简单可靠,可以迅速实施。
(二)、系统优化方案和步骤
(1)、修改参数
关闭缓存建议:
Alter system set db_cache_advice=off scope=both
关闭内存自动调整:
Alter system set db_cache_size=3g scope=both
Alter system set shared_pool_size=2g
Alter system set sga_target=0 scope=both
调整dbwr参数
Alter system set db_writer_processes = 8 scope=both
修改一些内部参数:
Alter system set “_cursor_plan_unparse_enabled” = false scope=bothalter system set "_smu_debug_mode"=33554432 scope=both;以上参数都可以在线修改。
(2)、检查是否存在truncate或者rebuild的索引
select owner||'.'||Object_name||'.'||subobject_name    from dba_objects    where object_id<>data_object_id      and object_type like 'INDEX%'  order by owner,object_name,subobject_name;如果存在这些索引,删除索引并且重新创建这些索引。
(3)、修正SQL语句
Select * from (select row_.*, rownum rownum_ from (select a.xh, a.jhmc, a.kssj, a.jssj, a.fbr, a.fbsj, a.zrr, a.jhnr, a.gxdw, (select count(b.nbxh) from hk_ssjh_hk b, hk_jd m, bm_glwg n where a.xh=b.ssjhxh and b.wcbj=’0′ and b.nbxh=m.nbxh and m.glwg = n.code(+) and m.gxdw like ‘330281%’ ) jhhk, (select count(b.nbxh) from hk_ssjh_hk b, hk_jd m, bm_glwg n where a.xh=b.ssjhxh and b.wcbj=’1′ and b.nbxh=m.nbxh and m.glwg = n.code(+) and m.gxdw like ‘330281%’ ) ywchk, (select count(wzxh) from hk_ssjh_wz c, hk_wzjy m where a.xh=c.ssjhxh and c.wcbj=’0′ and c.wzxh = m.xh and m.gxdw like ‘330281%’ ) jhwz, (select count(wzxh) from hk_ssjh_wz c, hk_wzjy m where a.xh=c.ssjhxh and c.wcbj=’1′ and c.wzxh = m.xh and m.gxdw like ‘330281%’ ) ywcwz from hk_ssjh a where 1 = 1 and a.gxdw like ‘330281%’ and a.gxdw = ‘33028103′ order by a.fbsj desc )row_ where rownum<=20) where rownum_>0
主要该语句引起,需要选择另外一种写法。
(4)、调度时间应用Oracle patchset。
在合适的时间段给Oracle应用patchset。
总结:
虽然oracle 10.2.0.3中存在类似的bug不少,但是引发的根本在SQL语句效率低下引起。
分享 转发
TOP
发新话题 回复该主题