Jan-Blog-CN | Jan-Blog-CN

First of all

first of all, in psysical period, preparePossibleProperties will get all possible order properties(column fields), which is only used for join and aggregation. Like group by a,b,c, into one slice from different level childs.

How to calculate physical plan

DoOptimize is the core connection from logicalOptimize to physicalOptimize, which optimizes a logical plan to a physical plan.
Function findBestTask ,in physicalOptimize, will find out the best tasks from logical plan, and converts the logical plan to the physical plan, which's a new interface.
Until 2022-11-30 TiDB master branch, TiDB optizmizer gets plan cost in the order of DoOptimize -> getPlanCost -> getPlanCostVer1/getPlanCostVer2.Actually, the method getPlanCostVer1/getPlanCostVer2 of interface PhysicalPlan is use get psysical every operand cost. I mean, by their different implentation, you could figure out how it's calculated.
In here, you'll get that It figures out child cost one by one.

Why go PlanCostVer1 to PlanCostVer2

This PR indeciates sometimes optimizer cannot discern which is the best datasource, so a new cost formular for Selection/TableScan/IndexScan has been merged. which are here below:

Selection : rowsnum-filterscpu-factor, which Tiflash has a specific cpu-factor
TableScan : rows*log(row-size)*scan-factor, which Tiflash has a specific cpu-factor
indexScan : rows*log(row-size)*scan-factor

Formula map of operand cost

In the map below, you'll find a lot of factor and can get them on [Formula map of different factors](## Formula map of different factors ).

TaskType	version	cost calculated expression	code addr
PhysicalSelection	2	cost = child-cost + filter-cost	here
PhysicalProjection	2	cost = (child-cost + (proj-cost = rows * len(expressions) * cpu-factor)) / projection-concurrency	here
PhysicalIndexScan	2	cost = rows * log2(row-size) * scan-factor	here
PhysicalTableScan	2	cost = rows * log2(row-size) * scan-factor	here
PhysicalIndexReader	2	cost = (child-cost + net-cost + seek-cost) / concurrency	here
PhysicalTableReader	2	cost = (child-cost + (rows * row-size * net-factor) + (num-tasks * seek-factor)) / concurrency	here
PhysicalIndexLookUpReader	2	cost = index-side-cost + (table-side-cost + double-read-cost) / double-read-concurrency	here
PhysicalIndexMergeReader	2	cost = cost = table-side-cost + sum(index-side-cost)	here
PhysicalSort	2	cost = child-cost + sort-cpu-cost + sort-mem-cost + sort-disk-cost	here
PhysicalTopN	2	cost = child-cost + topn-cpu-cost + topn-mem-cost	here
PhysicalStreamAgg	2	cost = child-cost + agg-cost + group-cost	here
PhysicalHashAgg	2	child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost) / concurrency	here
PhysicalMergeJoin	2	cost = left-child-cost + right-child-cost + filter-cost + group-cost
PhysicalHashJoin	2	cost = build-child-cost + probe-child-cost + build-hash-cost + build-filter-cost + (probe-filter-cost + probe-hash-cost) / concurrency	here
PhysicalIndexJoin	2	cost = build-child-cost + build-filter-cost + (probe-cost + probe-filter-cost) / concurrency probe-cost = probe-child-cost * build-rows / batchRatio	here
PhysicalIndexHashJoin	2	until 2022-11-30, It's the same as PhysicalIndexJoin	here
PhysicalIndexMergeJoin	2	until 2022-11-30, It's the same as IndexJoin	here
PhysicalApply	2	cost = uild-child-cost + build-filter-cost + probe-cost + probe-filter-cost	here
PhysicalUnionAll	2	cost = sum(child-cost) / concurrency	here
PhysicalExchangeReceiver	2	cost = child-cost + net-cost	here
PointGetPlan	2	cost = child-cost + net-cost	here
PointGetPlan	2	cost = child-cost + net-cost	here
BatchPointGetPlan	2	cost = seek-cost + net-cost	here

Formula map of different factors

function name	name	factor name
TiDBTemp	tidb_temp_table_factor	0
TiKVScan:	tikv_scan_factor	100
TiKVDescScan:	tikv_desc_scan_factor	150
TiFlashScan:	tiflash_scan_factor	5
TiDBCPU:	tidb_cpu_factor	30
TiKVCPU:	tikv_cpu_factor	30
TiFlashCPU:	tiflash_cpu_factor	5
TiDB2KVNet:	tidb_kv_net_factor	8
TiDB2FlashNet:	tidb_flash_net_factor	4
TiFlashMPPNet:	tiflash_mpp_net_factor	4
TiDBMem:	tidb_mem_factor	1
TiKVMem:	tikv_mem_factor	1
TiFlashMem:	tiflash_mem_factor	1
TiDBDisk:	tidb_disk_factor	1000
TiDBRequest:	tidb_request_factor	9500000

PhysicalSelection

If row-size belongs to the type of PointGet, the row-size is 1. And row-size of BatchPointGet will come from stats.
And filter-cost indicates cost of this step which uses to calculate and filter the result, which is row-size * float64(len(filters)) * (tidb_cpu_factor or tikv_cpu_factor or TiFlashCPU).

plan-cost =  child-cost + filter-cost(row-size * float64(len(filters)) * (tidb_cpu_factor or tikv_cpu_factor or TiFlashCPU))

PhysicalProjection

projection-concurrency can be modified by tidb_executor_concurrency
proj-cost is equal to row-size * len(expressions) * cpu-factor), which expressions is the number of column this step.

                 child-cost + row-size * len(expressions) * cpu-factor
plan-cost =   ------------------------------------------------------------
                          projection-concurrency

PhysicalIndexScan

rowSize(row-size) = AvgRowSize + [tablePrefix(1) + tableID(8) + indexPrefix(2) + indexID(8)] = AvgRowSize + 19.
BTW, the calculation of AvgRowSize is really complicated. if you wanna figure out how the result value is computed, please explore the code logic. And,here are the code addr.

cost = rows * log2(row-size) * scan-factor

PhysicalTableScan

rows is the Cardinality generated by stats;
The expression is equal to PhysicalIndexScan.

cost = rows * log2(row-size) * scan-factor

PhysicalIndexReader

Actually, the concurrency of this step is distSQLScanConcurrency(default 15) in system varliable, and can be changed by tidb_distsql_scan_concurrency.

           child-cost + rows * row-size * net-factor + num-tasks * seek-factor
cost =  -------------------------------------------------------------------------
                          tidb_distsql_scan_concurrency

PhysicalTableReader

equal to PhysicalIndexReader

           child-cost + rows * row-size * net-factor + num-tasks * seek-factor
cost =  -------------------------------------------------------------------------
                          tidb_distsql_scan_concurrency

PhysicalIndexLookUpReader

How to change the concurrency is tidb_index_lookup_concurrency and now which named tidb_executor_concurrency.
index-side-cost = (index-child-cost + index-net-cost + index-seek-cost) / dist-concurrency, which is same with IndexReader
table-side-cost = (table-child-cost + table-net-cost + table-seek-cost) / dist-concurrency, which is same with TableReader
double-read-cost = double-read-seek-cost + double-read-cpu-cost
double-read-seek-cost = double-read-tasks * seek-factor
double-read-cpu-cost = index-rows * cpu-factor
double-read-tasks = index-rows / batch-size * task-per-batch ,and task-per-batch is a magic number 40 now

                   (index-child-cost + index-net-cost + index-seek-cost) +                          -> index-side-cost
                    (table-child-cost + table-net-cost + table-seek-cost +                          -> table-side-cost
          ((index-rows / batch-size * task-per-batch(40)) * seek-factor + index-rows * cpu-factor)) -> double-read-cost
          -----------------------------------------------------------------------------------------
cost =                               dist-concurrency
      -------------------------------------------------------------------------------------------------
                                  double-read-concurrency

PhysicalIndexMergeReader

The cost is made up of index-side-cost and table-side-cost, and the formar's equal to IndexReader, and the latter's the same as TableReader.

          (table-child-cost + table-net-cost + table-seek-cost) + (index-child-cost + index-net-cost + index-seek-cost)
cost =  ----------------------------------------------------------------------------------------------------------------
                                                dist-concurrency

PhysicalSort

cost = child-cost + sort-cpu-cost + sort-mem-cost + sort-disk-cost
The Spill means that you're just going to have to use oomUseTmpStorage to save enough TiDB memor, When sort action happens.
a. if no spill, sort-mem-cost will be rows * row-size * mem-factor and, sort-disk-cost is 0;
b. if spill, sort-mem-cost is mem-quota * mem-factor and, sort-disk-cost is rows * row-size * disk-factor;
So, It's clear that when spill happend, sort-mem-cost and sort-disk-cost will change in an appropriate way.

cost = child-cost + (rows * log2(rows) * len(sort-items) * cpu-factor) + sort-mem-cost + sort-disk-cost

PhysicalTopN

child-cost + topn-cpu-cost + topn-mem-cost

cost = child-cost + (rows * log2(N) * len(sort-items) * cpu-factor) + (N * row-size * mem-factor)

PhysicalStreamAgg

cost = child-cost + agg-cost + group-cost

PhysicalHashAgg

Default value of tidb_opt_concurrency_factor is 5.

           child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost)
cost =    --------------------------------------------------------------------------------
                                tidb_opt_concurrency_factor

PhysicalMergeJoin

cost = left-child-cost + right-child-cost + filter-cost + group-cost

PhysicalHashJoin

            build-child-cost + probe-child-cost + build-hash-cost + 
cost =  build-filter-cost + (probe-filter-cost + probe-hash-cost) / concurrency

PhysicalIndexJoin & PhysicalIndexHashJoin & PhysicalIndexMergeJoin

They are equal to IndexJoin.

cost =  build-child-cost + build-filter-cost +                                     
       (probe-child-cost * build-rows / batchRatio + probe-filter-cost) / concurrency

PhysicalApply

cost = build-child-cost + build-filter-cost + probe-child-cost * build-rows + probe-filter-cost

PhysicalUnionAll

tidb_opt_concurrency_factor is the denominator below.

cost = sum(child-cost) / concurrency

PhysicalExchangeReceiver & PointGetPlan & PointGetPlan & BatchPointGetPlan

cost = child-cost + net-cost

First of all ​

How to calculate physical plan ​

Why go PlanCostVer1 to PlanCostVer2 ​

Formula map of operand cost ​

Formula map of different factors ​

PhysicalSelection ​

PhysicalProjection ​

PhysicalIndexScan ​

PhysicalTableScan ​

PhysicalIndexReader ​

PhysicalTableReader ​

PhysicalIndexLookUpReader ​

PhysicalIndexMergeReader ​

PhysicalSort ​

PhysicalTopN ​

PhysicalStreamAgg ​

PhysicalHashAgg ​

PhysicalMergeJoin ​

PhysicalHashJoin ​

PhysicalIndexJoin & PhysicalIndexHashJoin & PhysicalIndexMergeJoin ​

PhysicalApply ​

PhysicalUnionAll ​

PhysicalExchangeReceiver & PointGetPlan & PointGetPlan & BatchPointGetPlan ​