Skip to content
On this page

First of all

first of all, in psysical period, preparePossibleProperties will get all possible order properties(column fields), which is only used for join and aggregation. Like group by a,b,c, into one slice from different level childs.

How to calculate physical plan

  1. DoOptimize is the core connection from logicalOptimize to physicalOptimize, which optimizes a logical plan to a physical plan.
  2. Function findBestTask ,in physicalOptimize, will find out the best tasks from logical plan, and converts the logical plan to the physical plan, which's a new interface.
  3. Until 2022-11-30 TiDB master branch, TiDB optizmizer gets plan cost in the order of DoOptimize -> getPlanCost -> getPlanCostVer1/getPlanCostVer2.Actually, the method getPlanCostVer1/getPlanCostVer2 of interface PhysicalPlan is use get psysical every operand cost. I mean, by their different implentation, you could figure out how it's calculated.
  4. In here, you'll get that It figures out child cost one by one.

Why go PlanCostVer1 to PlanCostVer2

This PR indeciates sometimes optimizer cannot discern which is the best datasource, so a new cost formular for Selection/TableScan/IndexScan has been merged. which are here below:

  1. Selection : rowsnum-filterscpu-factor, which Tiflash has a specific cpu-factor
  2. TableScan : rows*log(row-size)*scan-factor, which Tiflash has a specific cpu-factor
  3. indexScan : rows*log(row-size)*scan-factor

Formula map of operand cost

  1. In the map below, you'll find a lot of factor and can get them on [Formula map of different factors](## Formula map of different factors ).
TaskTypeversioncost calculated expressioncode addr
PhysicalSelection2cost = child-cost + filter-costhere
PhysicalProjection2cost = (child-cost + (proj-cost = rows * len(expressions) * cpu-factor)) / projection-concurrencyhere
PhysicalIndexScan2cost = rows * log2(row-size) * scan-factorhere
PhysicalTableScan2cost = rows * log2(row-size) * scan-factorhere
PhysicalIndexReader2cost = (child-cost + net-cost + seek-cost) / concurrencyhere
PhysicalTableReader2cost = (child-cost + (rows * row-size * net-factor) + (num-tasks * seek-factor)) / concurrencyhere
PhysicalIndexLookUpReader2cost = index-side-cost + (table-side-cost + double-read-cost) / double-read-concurrencyhere
PhysicalIndexMergeReader2cost = cost = table-side-cost + sum(index-side-cost)here
PhysicalSort2cost = child-cost + sort-cpu-cost + sort-mem-cost + sort-disk-costhere
PhysicalTopN2cost = child-cost + topn-cpu-cost + topn-mem-costhere
PhysicalStreamAgg2cost = child-cost + agg-cost + group-costhere
PhysicalHashAgg2child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost) / concurrencyhere
PhysicalMergeJoin2cost = left-child-cost + right-child-cost + filter-cost + group-cost
PhysicalHashJoin2cost = build-child-cost + probe-child-cost + build-hash-cost + build-filter-cost + (probe-filter-cost + probe-hash-cost) / concurrencyhere
PhysicalIndexJoin2cost = build-child-cost + build-filter-cost + (probe-cost + probe-filter-cost) / concurrency probe-cost = probe-child-cost * build-rows / batchRatiohere
PhysicalIndexHashJoin2until 2022-11-30, It's the same as PhysicalIndexJoinhere
PhysicalIndexMergeJoin2until 2022-11-30, It's the same as IndexJoinhere
PhysicalApply2cost = uild-child-cost + build-filter-cost + probe-cost + probe-filter-costhere
PhysicalUnionAll2cost = sum(child-cost) / concurrencyhere
PhysicalExchangeReceiver2cost = child-cost + net-costhere
PointGetPlan2cost = child-cost + net-costhere
PointGetPlan2cost = child-cost + net-costhere
BatchPointGetPlan2cost = seek-cost + net-costhere

Formula map of different factors

function namenamefactor name
TiDBTemptidb_temp_table_factor0
TiKVScan:tikv_scan_factor100
TiKVDescScan:tikv_desc_scan_factor150
TiFlashScan:tiflash_scan_factor5
TiDBCPU:tidb_cpu_factor30
TiKVCPU:tikv_cpu_factor30
TiFlashCPU:tiflash_cpu_factor5
TiDB2KVNet:tidb_kv_net_factor8
TiDB2FlashNet:tidb_flash_net_factor4
TiFlashMPPNet:tiflash_mpp_net_factor4
TiDBMem:tidb_mem_factor1
TiKVMem:tikv_mem_factor1
TiFlashMem:tiflash_mem_factor1
TiDBDisk:tidb_disk_factor1000
TiDBRequest:tidb_request_factor9500000

PhysicalSelection

  1. If row-size belongs to the type of PointGet, the row-size is 1. And row-size of BatchPointGet will come from stats.
  2. And filter-cost indicates cost of this step which uses to calculate and filter the result, which is row-size * float64(len(filters)) * (tidb_cpu_factor or tikv_cpu_factor or TiFlashCPU).
plan-cost =  child-cost + filter-cost(row-size * float64(len(filters)) * (tidb_cpu_factor or tikv_cpu_factor or TiFlashCPU)) 

PhysicalProjection

  1. projection-concurrency can be modified by tidb_executor_concurrency
  2. proj-cost is equal to row-size * len(expressions) * cpu-factor), which expressions is the number of column this step.
                 child-cost + row-size * len(expressions) * cpu-factor
plan-cost =   ------------------------------------------------------------
                          projection-concurrency

PhysicalIndexScan

  1. rowSize(row-size) = AvgRowSize + [tablePrefix(1) + tableID(8) + indexPrefix(2) + indexID(8)] = AvgRowSize + 19.
  2. BTW, the calculation of AvgRowSize is really complicated. if you wanna figure out how the result value is computed, please explore the code logic. And,here are the code addr.
cost = rows * log2(row-size) * scan-factor

PhysicalTableScan

  1. rows is the Cardinality generated by stats;
  2. The expression is equal to PhysicalIndexScan.
cost = rows * log2(row-size) * scan-factor

PhysicalIndexReader

  1. Actually, the concurrency of this step is distSQLScanConcurrency(default 15) in system varliable, and can be changed by tidb_distsql_scan_concurrency.
           child-cost + rows * row-size * net-factor + num-tasks * seek-factor
cost =  -------------------------------------------------------------------------
                          tidb_distsql_scan_concurrency

PhysicalTableReader

  1. equal to PhysicalIndexReader
           child-cost + rows * row-size * net-factor + num-tasks * seek-factor
cost =  -------------------------------------------------------------------------
                          tidb_distsql_scan_concurrency

PhysicalIndexLookUpReader

  1. How to change the concurrency is tidb_index_lookup_concurrency and now which named tidb_executor_concurrency.
  2. index-side-cost = (index-child-cost + index-net-cost + index-seek-cost) / dist-concurrency, which is same with IndexReader
  3. table-side-cost = (table-child-cost + table-net-cost + table-seek-cost) / dist-concurrency, which is same with TableReader
  4. double-read-cost = double-read-seek-cost + double-read-cpu-cost
  5. double-read-seek-cost = double-read-tasks * seek-factor
  6. double-read-cpu-cost = index-rows * cpu-factor
  7. double-read-tasks = index-rows / batch-size * task-per-batch ,and task-per-batch is a magic number 40 now
                   (index-child-cost + index-net-cost + index-seek-cost) +                          -> index-side-cost
                    (table-child-cost + table-net-cost + table-seek-cost +                          -> table-side-cost
          ((index-rows / batch-size * task-per-batch(40)) * seek-factor + index-rows * cpu-factor)) -> double-read-cost
          -----------------------------------------------------------------------------------------
cost =                               dist-concurrency
      -------------------------------------------------------------------------------------------------
                                  double-read-concurrency

PhysicalIndexMergeReader

  1. The cost is made up of index-side-cost and table-side-cost, and the formar's equal to IndexReader, and the latter's the same as TableReader.
          (table-child-cost + table-net-cost + table-seek-cost) + (index-child-cost + index-net-cost + index-seek-cost)
cost =  ----------------------------------------------------------------------------------------------------------------
                                                dist-concurrency 

PhysicalSort

  1. cost = child-cost + sort-cpu-cost + sort-mem-cost + sort-disk-cost
  2. The Spill means that you're just going to have to use oomUseTmpStorage to save enough TiDB memor, When sort action happens.
    a. if no spill, sort-mem-cost will be rows * row-size * mem-factor and, sort-disk-cost is 0;
    b. if spill, sort-mem-cost is mem-quota * mem-factor and, sort-disk-cost is rows * row-size * disk-factor;
  3. So, It's clear that when spill happend, sort-mem-cost and sort-disk-cost will change in an appropriate way.
cost = child-cost + (rows * log2(rows) * len(sort-items) * cpu-factor) + sort-mem-cost + sort-disk-cost

PhysicalTopN

  1. child-cost + topn-cpu-cost + topn-mem-cost
cost = child-cost + (rows * log2(N) * len(sort-items) * cpu-factor) + (N * row-size * mem-factor)

PhysicalStreamAgg

cost = child-cost + agg-cost + group-cost

PhysicalHashAgg

  1. Default value of tidb_opt_concurrency_factor is 5.
           child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost)
cost =    --------------------------------------------------------------------------------
                                tidb_opt_concurrency_factor

PhysicalMergeJoin

cost = left-child-cost + right-child-cost + filter-cost + group-cost

PhysicalHashJoin

            build-child-cost + probe-child-cost + build-hash-cost + 
cost =  build-filter-cost + (probe-filter-cost + probe-hash-cost) / concurrency

PhysicalIndexJoin & PhysicalIndexHashJoin & PhysicalIndexMergeJoin

  1. They are equal to IndexJoin.
cost =  build-child-cost + build-filter-cost +                                     
       (probe-child-cost * build-rows / batchRatio + probe-filter-cost) / concurrency

PhysicalApply

cost = build-child-cost + build-filter-cost + probe-child-cost * build-rows + probe-filter-cost

PhysicalUnionAll

  1. tidb_opt_concurrency_factor is the denominator below.
cost = sum(child-cost) / concurrency

PhysicalExchangeReceiver & PointGetPlan & PointGetPlan & BatchPointGetPlan

cost = child-cost + net-cost