MySQL遷移OpenGauss性能優化實戰：一條SQL從570秒到5毫秒的蛻變

原創 szrsu 2025-12-30

619

一、問題背景

在從MySQL遷移至OpenGauss數據庫的過程中，遇到了一個嚴重的性能問題。一條在MySQL中運行正常的關聯查詢語句，在OpenGauss中執行時間達到了570秒，嚴重影響了系統性能。本文將詳細記錄優化過程，分享從問題診斷到最終解決的過程。

二、環境準備

2.1 表結構定義

-- 創建 t1 表（文件信息表）
CREATE TABLE t1 (
    object_id character varying(255) NOT NULL,
    file_name character varying(255),
    md5 character varying(255),
    size bigint,
    valid smallint DEFAULT 1,
    PRIMARY KEY (object_id)
);

-- 創建索引
CREATE INDEX t1_idx_md5 ON t1 USING btree (md5);
CREATE INDEX t1_idx_valid ON t1 USING btree (valid);

-- 創建 t2 表（版本關聯表）
CREATE TABLE t2 (
    id bigserial PRIMARY KEY,
    version_id bigint NOT NULL,
    valid smallint DEFAULT 1,
    asset_version_format_id integer DEFAULT 0,
    object_id varbinary(255)
);

2.2 測試數據生成

-- 插入 t1 數據（約10萬行有效數據）
INSERT INTO t1 (object_id, file_name, md5, size, valid)
SELECT 
    'file_' || i,
    'file_' || i || '.jpg',
    md5(random()::text),
    (random() * 100000)::bigint,
    1
FROM generate_series(1, 100000) i;

-- 插入100條無效數據
INSERT INTO t1 (object_id, file_name, md5, size, valid)
SELECT 
    'invalid_' || i,
    'invalid_' || i || '.jpg',
    md5(random()::text),
    (random() * 100000)::bigint,
    0
FROM generate_series(1, 100) i;

-- 插入 t2 數據（100條特定版本數據）
INSERT INTO t2 (version_id, valid, asset_version_format_id, object_id)
SELECT 
    48725,
    1,
    0,
    ('file_' || (i % 1000 + 1))
FROM generate_series(1, 100) i;

-- 插入其他版本數據（10萬條）
INSERT INTO t2 (version_id, valid, asset_version_format_id, object_id)
SELECT 
    (random() * 100000)::bigint,
    1,
    floor(random() * 5)::integer,
    ('file_' || (random() * 1000)::integer)::bytea
FROM generate_series(1, 100000) i;

三、性能問題初現

3.1 原始查詢語句

SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name
FROM t1 t1
JOIN t2 t2 ON t2.object_id = t1.object_id
WHERE t2.version_id = 48725
  AND t2.valid = 1
  AND t2.asset_version_format_id = 0
  AND t1.valid = 1;

執行結果：耗時約570秒

3.2 數據分布分析

-- 查看t2表符合條件的數據量
select count(*) from t2 
WHERE t2.version_id = 48725
  AND t2.valid = 1
  AND t2.asset_version_format_id = 0;

-- 結果：100行（僅占總數據量的0.1%）

-- t2表總數據量
select count(*) from t2;

-- 結果：100100行

四、執行計劃深度分析

4.1 初始執行計劃

EXPLAIN 
SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name
FROM t1 t1
JOIN t2 t2 ON t2.object_id = t1.object_id
WHERE t2.version_id = 48725
  AND t2.valid = 1
  AND t2.asset_version_format_id = 0
  AND t1.valid = 1;

執行計劃輸出：

                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2393.45 rows=1 width=1588)
   Join Filter: (t2.object_id = (t1.object_id)::text)
   ->  Index Scan using t1_idx_valid on t1  (cost=0.00..144.84 rows=34 width=1556)
         Index Cond: (valid = 1)
   ->  Materialize  (cost=0.00..2239.68 rows=1 width=32)
         ->  Seq Scan on t2  (cost=0.00..2239.68 rows=1 width=32)
               Filter: ((version_id = 48725) AND (valid = 1) AND (asset_version_format_id = 0))
(7 rows)

4.2 問題診斷

缺失關鍵索引：t2表缺少針對version_id的過濾索引
連接順序不當：執行計劃選擇t1作為驅動表，但實際t2過濾后數據量更小

五、優化步驟實施

5.1 第一步：創建缺失索引

-- 為t1表的連接字段創建索引
create index idx_t1_objectid on t1 (object_id);

-- 為t2表創建復合索引（覆蓋version_id和object_id）
create index idx_t2_version_objectid on t2 (version_id, object_id);

5.2 第二步：更新統計信息

-- 收集最新的統計信息
analyze table t1;
analyze table t2;

5.3 第三步：分析優化后執行計劃

添加索引后的執行計劃：

                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.18..659088.83 rows=40 width=73)
   Join Filter: (t2.object_id = (t1.object_id)::text)
   ->  Seq Scan on t1  (cost=0.00..2585.25 rows=99987 width=65)
         Filter: (valid = 1)
   ->  Materialize  (cost=5.18..338.96 rows=25 width=8)
         ->  Bitmap Heap Scan on t2  (cost=5.18..338.83 rows=25 width=8)
               Recheck Cond: (version_id = 48725)
               Filter: ((valid = 1) AND (asset_version_format_id = 0))
               ->  Bitmap Index Scan on idx_t2_version_objectid  (cost=0.00..5.17 rows=123 width=0)
                     Index Cond: (version_id = 48725)

t2過濾后數據較少，由它當做驅動表才是正確的。

使用hint提示優化器調整連接順序：

EXPLAIN 
SELECT /*+ leading((t2 t1)) */ t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name
FROM t1 
JOIN t2 ON t2.object_id = t1.object_id
WHERE t2.version_id = 48725
  AND t2.valid = 1
  AND t2.asset_version_format_id = 0
  AND t1.valid = 1;

調整后執行計劃如下：

                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..710741.89 rows=40 width=73)
   Join Filter: (t2.object_id = (t1.object_id)::text)
   ->  Index Scan using idx_t2_version_objectid on t2  (cost=0.00..466.99 rows=25 width=8)
         Index Cond: (version_id = 48725)
         Filter: ((valid = 1) AND (asset_version_format_id = 0))
   ->  Materialize  (cost=0.00..5136.18 rows=99987 width=65)
         ->  Seq Scan on t1  (cost=0.00..2585.25 rows=99987 width=65)
               Filter: (valid = 1)

執行計劃有所改善，但仍不是最優。

問題依舊存在：雖然t2使用了索引，但連接條件中的隱式類型轉換問題，t2.object_id為varbinary類型，t1.object_id為varchar類型，連接時存在隱式類型轉換，導致t1還是使用全表掃描。

六、終極優化方案

6.1 根本問題解決：消除類型轉換

消除類型轉換有多種方式，可以將數據類型都調整成一樣（t2.object_id可以由varbinary改成varchar類型)，也可以改寫sql，顯示處理類型轉換：

-- 改寫SQL，顯式處理類型轉換
EXPLAIN 
SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name
FROM t2 
JOIN t1 ON convert_from(t2.object_id, 'UTF8') = t1.object_id
WHERE t2.version_id = 48725
  AND t2.valid = 1
  AND t2.asset_version_format_id = 0
  AND t1.valid = 1;

優化后的執行計劃：

                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.18..546.02 rows=40 width=73)
   ->  Bitmap Heap Scan on t2  (cost=5.18..338.83 rows=25 width=8)
         Recheck Cond: (version_id = 48725)
         Filter: ((valid = 1) AND (asset_version_format_id = 0))
         ->  Bitmap Index Scan on idx_t2_version_objectid  (cost=0.00..5.17 rows=123 width=0)
               Index Cond: (version_id = 48725)
   ->  Index Scan using idx_t1_objectid on t1  (cost=0.00..8.28 rows=1 width=65)
         Index Cond: ((object_id)::text = convert_from((t2.object_id)::bytea, 'UTF8'::name))
         Filter: (valid = 1)

6.2 最終執行效果

mydb[15400]=#explain analyze SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name
mydb-# FROM t2 
mydb-# JOIN t1 ON convert_from(t2.object_id, 'UTF8') = t1.object_id
mydb-# WHERE t2.version_id = 48725
mydb-#   AND t2.valid = 1
mydb-#   AND t2.asset_version_format_id = 0
mydb-#   AND t1.valid = 1;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.18..546.02 rows=40 width=73) (actual time=0.350..4.377 rows=100 loops=1)
   ->  Bitmap Heap Scan on t2  (cost=5.18..338.83 rows=25 width=8) (actual time=0.183..0.367 rows=100 loops=1)
         Recheck Cond: (version_id = 48725)
         Filter: ((valid = 1) AND (asset_version_format_id = 0))
         Rows Removed by Filter: 2
         Heap Blocks: exact=3
         ->  Bitmap Index Scan on idx_t2_version_objectid  (cost=0.00..5.17 rows=123 width=0) (actual time=0.097..0.097 rows=102 loops=1)
               Index Cond: (version_id = 48725)
   ->  Index Scan using idx_t1_objectid on t1  (cost=0.00..8.28 rows=1 width=65) (actual time=3.175..3.326 rows=100 loops=100)
         Index Cond: ((object_id)::text = convert_from((t2.object_id)::bytea, 'UTF8'::name))
         Filter: (valid = 1)
 Total runtime: 4.823 ms
(12 rows)

最終執行時間：5 ms左右

七、總結

本次優化案例展示了數據庫遷移過程中可能遇到的性能陷阱，通過深入分析執行計劃，識別出隱式類型轉換這一根本問題，并采用針對性的優化策略，最終實現了從570秒到5毫秒的性能飛躍。

優化階段	執行時間	性能提升	關鍵改進點
原始SQL	570,000 ms	基準	-
僅添加索引	未顯著改善	不明顯	添加缺失索引
調整連接順序	有所改善	一般	使用hint調整連接順序
最終優化SQL	5 ms	114000倍	消除隱式類型轉換

核心優化要點：

數據類型一致性：確保連接條件兩邊的數據類型完全一致
顯式類型轉換：使用convert_from()等函數顯式處理數據
復合索引設計：針對高頻查詢條件設計覆蓋索引
統計信息更新：數據變化后及時更新統計信息
執行計劃分析：定期分析關鍵SQL的執行計劃

墨力計劃數據庫實操性能優化 mysql opengauss

最后修改時間：2025-12-30 10:00:51

「喜歡這篇文章，您的關注和贊賞是給作者最好的鼓勵」

關注作者

文章被以下合輯收錄

墨天輪數據庫性能優化征文-優質文章合集（共37篇）

本合輯匯總了「實操看我的」數據庫性能優化篇征文活動中“最佳實操獎” 文章及部分高閱讀量合格文章，歡迎參考~

特色一级强游戏,海奥华预言免费阅读,51漫画兑换码,美女裸体无遮挡永久免费观看网站,lubuntu线路检测入口