PostgreSQL的缩小数据集优化过程一例


小黄同学拿过来一句SQL问我,看看能不能优化,看了下,数据量不是很大,查询还不算很复杂。以下是记录优化过程:

DB:PostgreSQL 9.1
OS:CentOS 6
count(d_personal_report_view) ~ 9K条,涉及金额数量的字段类型为numeric(9,2)类型

原始SQL:

select distinct c.doctor_name,c.department_name,c.hospital_id,c.hospital_name,c.part_hospital_name,t.* from d_personal_report_view c,(
       select dc.part_hospital_id,dc.department_id,dc.doctor_id,
     coalesce(sum(coalesce(dc.surgery_amount, '0.0')), '0.0') as surgery_amount,sum(coalesce(dc.surgery_quantity, 0)) as surgery_quantity,
     coalesce(sum(coalesce(dc.repair_amount, '0.0')), '0.0') as repair_amount,sum(coalesce(dc.repair_quantity, 0)) as repair_quantity,
       coalesce(sum(coalesce(dc.orthod_amount, '0.0')), '0.0') as orthod_amount,sum(coalesce(dc.orthod_quantity, 0)) as orthod_quantity,
       coalesce(sum(coalesce(dc.radiation_amount, '0.0')), '0.0') as radiation_amount,sum(coalesce(dc.radiation_quantity, 0)) as radiation_quantity,
       coalesce(sum(coalesce(dc.teethcln_amount, '0.0')), '0.0') as teethcln_amount,sum(coalesce(dc.teethcln_quantity, 0)) as teethcln_quantity,
       coalesce(sum(coalesce(dc.crop_amount, '0.0')), '0.0') as crop_amount,sum(coalesce(dc.crop_quantity, 0)) as crop_quantity,
       coalesce(sum(coalesce(dc.assay_amount, '0.0')), '0.0') as assay_amount,sum(coalesce(dc.assay_quantity, 0)) as assay_quantity,
       coalesce(sum(coalesce(dc.drugs_amount, '0.0')), '0.0') as drugs_amount,sum(coalesce(dc.drugs_quantity, 0)) as drugs_quantity,
       coalesce(sum(coalesce(dc.regist_amount, '0.0')), '0.0') as regist_amount,sum(coalesce(dc.regist_quantity, 0)) as regist_quantity,
       coalesce(sum(coalesce(dc.others_amount, '0.0')), '0.0') as others_amount,sum(coalesce(dc.others_quantity, 0)) as others_quantity,
       coalesce(sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount), '0.0') as totalRowAmount,
       coalesce(sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity), 0) as totalRowQuantity 
from d_personal_report_view dc
where 1=1 
group by dc.part_hospital_id,dc.department_id,dc.doctor_id 
order by dc.part_hospital_id,dc.doctor_id
limit 10 offset 0) as t
   where c.part_hospital_id=t.part_hospital_id and c.department_id=t.department_id and c.doctor_id=t.doctor_id order by t.part_hospital_id,t.department_id,t.doctor_id;

一.分析
看其结果是要取10条不同的数据做报表展示,用了同一张表做了两次关联,没有涉及查询条件,索引效果不大。分析这个SQL好不好,先看一下执行计划


二.优化过程
1.取消两张表的关联,只取一次查询 优化后的SQL:

select  dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id,
       coalesce(sum(coalesce(dc.surgery_amount, '0.0')), '0.0') as surgery_amount,sum(coalesce(dc.surgery_quantity, 0)) as surgery_quantity,
       coalesce(sum(coalesce(dc.repair_amount, '0.0')), '0.0') as repair_amount,sum(coalesce(dc.repair_quantity, 0)) as repair_quantity,
       coalesce(sum(coalesce(dc.orthod_amount, '0.0')), '0.0') as orthod_amount,sum(coalesce(dc.orthod_quantity, 0)) as orthod_quantity,
       coalesce(sum(coalesce(dc.radiation_amount, '0.0')), '0.0') as radiation_amount,sum(coalesce(dc.radiation_quantity, 0)) as radiation_quantity,
       coalesce(sum(coalesce(dc.teethcln_amount, '0.0')), '0.0') as teethcln_amount,sum(coalesce(dc.teethcln_quantity, 0)) as teethcln_quantity,
       coalesce(sum(coalesce(dc.crop_amount, '0.0')), '0.0') as crop_amount,sum(coalesce(dc.crop_quantity, 0)) as crop_quantity,
       coalesce(sum(coalesce(dc.assay_amount, '0.0')), '0.0') as assay_amount,sum(coalesce(dc.assay_quantity, 0)) as assay_quantity,
       coalesce(sum(coalesce(dc.drugs_amount, '0.0')), '0.0') as drugs_amount,sum(coalesce(dc.drugs_quantity, 0)) as drugs_quantity,
       coalesce(sum(coalesce(dc.regist_amount, '0.0')), '0.0') as regist_amount,sum(coalesce(dc.regist_quantity, 0)) as regist_quantity,
       coalesce(sum(coalesce(dc.others_amount, '0.0')), '0.0') as others_amount,sum(coalesce(dc.others_quantity, 0)) as others_quantity,
       coalesce(sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount), '0.0') as totalRowAmount,
       coalesce(sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity), 0) as totalRowQuantity 
from d_personal_report_view dc
where 1=1 
group by dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id 
order by dc.part_hospital_id,dc.doctor_id
limit 10 offset 0

查看其执行计划:


2.其执行计划中的cost还是有点高,再次优化 postgresql中的sum会自动把null值替换成0,并考虑到不会存入''或者' '等空值,故可以去掉里面的coalesce转换函数。
优化后的SQL:

select  dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id,
sum(dc.surgery_amount) as surgery_amount,sum(dc.surgery_quantity) as surgery_quantity,
sum(dc.repair_amount) as repair_amount,sum(dc.repair_quantity) as repair_quantity,
sum(dc.orthod_amount) as orthod_amount,sum(dc.orthod_quantity) as orthod_quantity,
sum(dc.radiation_amount) as radiation_amount,sum(dc.radiation_quantity) as radiation_quantity,
sum(dc.teethcln_amount) as teethcln_amount,sum(dc.teethcln_quantity) as teethcln_quantity,
sum(dc.crop_amount) as crop_amount,sum(dc.crop_quantity) as crop_quantity,
sum(dc.assay_amount) as assay_amount,sum(dc.assay_quantity) as assay_quantity,
sum(dc.drugs_amount) as drugs_amount,sum(dc.drugs_quantity) as drugs_quantity,
sum(dc.regist_amount) as regist_amount,sum(dc.regist_quantity) as regist_quantity,
sum(dc.others_amount) as others_amount,sum(dc.others_quantity) as others_quantity,
sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount) as totalRowAmount,
sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity) as totalRowQuantity 
from d_personal_report_view dc
where 1=1 
group by dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id 
order by dc.part_hospital_id,dc.doctor_id
limit 10 offset 0

查看执行计划:

结果是一样的,从侧面也反映出PostgreSQL的这个内部转换函数消耗不是很大

3.缩小查询基数,提升查询性能 之前的查询语句都是先全表扫描基础表并按规则排序和分组,最后才取10条,其实从另一个角度可以先取一定数量的数据,不用全表,但是一般可满足数据结果的不同,然后再取10条出来。

优化后的SQL

select dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id,
sum(dc.surgery_amount) as surgery_amount,sum(dc.surgery_quantity) as surgery_quantity,
sum(dc.repair_amount) as repair_amount,sum(dc.repair_quantity) as repair_quantity,
sum(dc.orthod_amount) as orthod_amount,sum(dc.orthod_quantity) as orthod_quantity,
sum(dc.radiation_amount) as radiation_amount,sum(dc.radiation_quantity) as radiation_quantity,
sum(dc.teethcln_amount) as teethcln_amount,sum(dc.teethcln_quantity) as teethcln_quantity,
sum(dc.crop_amount) as crop_amount,sum(dc.crop_quantity) as crop_quantity,
sum(dc.assay_amount) as assay_amount,sum(dc.assay_quantity) as assay_quantity,
sum(dc.drugs_amount) as drugs_amount,sum(dc.drugs_quantity) as drugs_quantity,
sum(dc.regist_amount) as regist_amount,sum(dc.regist_quantity) as regist_quantity,
sum(dc.others_amount) as others_amount,sum(dc.others_quantity) as others_quantity,
sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount) as totalRowAmount,
sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity) as totalRowQuantity 
from (select * from d_personal_report_view limit 1000) as dc
group by dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id 
order by dc.part_hospital_id,dc.doctor_id
limit 10 offset 0

此时再来看执行计划: 效果有了显著的提升,特别是遇到基础表快速膨胀的时候,响应时间也快了很多。

此方法对其他DB也适用,但是要与业务结果相匹配。

相关内容