Metrics
Overview
operate error
HA
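- number of times dm-master starts the leader component per minute
a. Meaning : The number of times per minute that dm-master attempts to start the leader component
b. Calculation :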
task
load dump files
binlog replication
remaining time to sync
a. Meaning :
b. Calculation : It is calculated by the expression below:
remainingSeconds = remainingSize / bytesPerSec
bytesPerSec = (totalBinlogSize - lastBinlogSize) / seconds
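As a rough illustration of the remaining-time expression, the sketch below plugs sample numbers into it. The function name, the integer types and the guards against a zero or negative rate are assumptions made for this example, not DM's actual code.

```go
package main

import "fmt"

// remainingSeconds is a hypothetical helper that mirrors the symbols used in
// the expression above. It returns false when no estimate can be made.
func remainingSeconds(remainingSize, totalBinlogSize, lastBinlogSize, seconds int64) (int64, bool) {
	if seconds <= 0 {
		return 0, false // no elapsed time, so no rate can be derived
	}
	// Bytes of binlog processed per second over the sampling window.
	bytesPerSec := (totalBinlogSize - lastBinlogSize) / seconds
	if bytesPerSec <= 0 {
		return 0, false // no progress in the window, estimate is undefined
	}
	return remainingSize / bytesPerSec, true
}

func main() {
	// 600 MiB processed in a 60 s window gives 10 MiB/s; 1 GiB is still to sync.
	secs, ok := remainingSeconds(1<<30, 1600<<20, 1000<<20, 60)
	fmt.Println(secs, ok) // prints "102 true"
}
```

The boolean result is one way to represent "no estimate available" when the rate is zero; a real implementation may choose a different convention.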
replicate lag
a. Meaning : The time it takes to replicate the binlog from the master to Syncer (in seconds)
b. Calculation : In the func updateReplicationLagMetric, the metric is updated on a ticker every 100ms. The lag is calculated by the expression time.Now().Unix() - s.tsOffset.Load() - headerTS, where tsOffset represents the time offset between upstream and Syncer (DM's timestamp - MySQL's timestamp), and headerTS is the minimum timestamp, which is the binlogEventHeader Timestamp parsed by go-mysql-org/go-mysql/replication, of every DM worker MySQL.
process exits with error
a. Meaning : The binlog replication unit process encounters an error within the DM-worker and exits
b. Calculation : 1
binlog file gap between master and syncer
a. Meaning : The number of binlog files in binlog replication unit that are behind the master.
b. Calculation : For master,
binlog file gap between relay and syncer
binlog event QPS
skipped binlog event QPS
read binlog event duration
transform binlog event duration
a. Meaning : The time it takes binlog replication unit to parse and transform the binlog into SQL statements (in seconds)
b. Calculation : In the func successFunc, the duration of every job is calculated when DDLWorker and DMLWorker execute the func executeBatchJobs, measured from when the raw binlog was transformed in Syncer's handling loop function. It has 90%, 95% and 99% quantile curves.
dispatch binlog event duration
transaction execution latency
binlog event size
a. Meaning : The size of a single binlog event that the binlog replication unit reads from relay log or upstream master.
b. Calculation : Here, Syncer records the event_size from the event header of every binlog event in the binlog stream. In Grafana, DM uses a histogram to draw quantile curves at 90%, 95% and 99%.
DML queue remain length
a. Meaning : The remaining length of the DML job queues, which are causality_input, compactor_input, dml_worker_input and q_number, where number is calculated as queueID%defaultBucketCount.
b. Calculation :
For causality_input: Causality provides a simple mechanism to improve the concurrency of SQL execution while still ensuring correctness. It groups SQL statements that may have causal relationships, and Syncer executes them serially (more details). So this records the number of rows that the causality component is holding.
For dml_worker_input: how many jobs were sent to DMLWorker.
For compactor_input: it serves the same function as causality_input if you configure this feature.
For q_number: all DMLs are distributed into different queues (default 8), and when they are executed in DMLWorker's executeJobs, they are recorded by queue name.
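
The q_number naming can be sketched as follows. This is a minimal illustration, assuming defaultBucketCount is 8 (taken from the "default 8" queues mentioned for this metric) and using a hypothetical helper name, queueBucketName; it is not presented as DM's actual implementation.

```go
package main

import "fmt"

// defaultBucketCount is assumed to be 8 here, matching the default number of
// queues described for this metric.
const defaultBucketCount = 8

// queueBucketName is a hypothetical helper that maps a queue ID to the label
// under which its remaining length is recorded.
func queueBucketName(queueID int) string {
	return fmt.Sprintf("q_%d", queueID%defaultBucketCount)
}

func main() {
	// Queue IDs beyond the bucket count wrap around to an existing label.
	for _, id := range []int{0, 7, 8, 10} {
		fmt.Println(id, queueBucketName(id))
	}
}
```

With these assumptions, queue IDs 8 and 10 map to q_0 and q_2, so the number of distinct labels stays bounded however many queues are configured.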