Evaluation Methodology

The technical and biologically inspired measures, which are defined in the following paragraphs, take values in the [0,1] interval, with higher values corresponding to better performance.
 

The detection accuracy measure (DET) assesses how accurately each given object has been identified by comparing the set of objects computed by an evaluated algorithm with the reference set of objects given by the gold tracking truth. Numerically, DET is defined as a normalized Acyclic Oriented Graph Matching measure for detection (AOGM-D):

DET = 1 - min(AOGM-D, AOGM-D0) / AOGM-D0

where AOGM-D is the cost of transforming the computed set of nodes into the reference one, and AOGM-D0 is the cost of creating the reference set of nodes from scratch (i.e., the AOGM-D value for empty detection results). The minimum operator in the numerator prevents DET from becoming negative when it is cheaper to create the reference set of nodes from scratch than to transform the computed set of nodes into the reference one.
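DET, TRA and LNK all apply the same normalization to an AOGM-style cost. The following minimal sketch shows that shared formula. The function name normalized_aogm_score is hypothetical, and the costs in the usage lines are made-up numbers for illustration. It also shows how the minimum operator clips the score at zero.

```python
def normalized_aogm_score(cost: float, cost_from_scratch: float) -> float:
    """Map an AOGM-style cost to [0, 1], where 1 means a perfect result.

    cost: cost of transforming the computed result into the reference.
    cost_from_scratch: cost of building the reference from empty results.
    """
    if cost_from_scratch <= 0:
        raise ValueError("cost_from_scratch must be positive")
    if cost < 0:
        raise ValueError("cost must be non-negative")
    # min() keeps the score non-negative when building the reference from
    # scratch would be cheaper than fixing the computed result.
    return 1.0 - min(cost, cost_from_scratch) / cost_from_scratch


# Hypothetical costs, for illustration only.
print(normalized_aogm_score(25.0, 100.0))   # 0.75
print(normalized_aogm_score(150.0, 100.0))  # 0.0 (clipped by min)
```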
 

The segmentation accuracy measure (SEG) assesses how well the segmented regions match the actual cell or nucleus boundaries by comparing the cell instance segmentation masks computed by an evaluated algorithm with the reference masks given by the gold segmentation truth. Numerically, SEG is based on the Jaccard similarity index as detailed in SEG.pdf.
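A Jaccard-based SEG measure can be sketched as a mean of per-object Jaccard indices over the reference objects. This is only a sketch under stated assumptions. SEG.pdf remains the authoritative definition, and the matching rule below is assumed rather than taken from this text. Under that rule, a reference object R is paired with the computed object S only if S covers more than half of R's pixels, and an unmatched reference object contributes 0. Label images are plain nested lists here, with 0 as background. The name seg_score is hypothetical.

```python
from collections import Counter
from typing import Sequence

LabelImage = Sequence[Sequence[int]]  # 2D label image, 0 = background


def seg_score(reference: LabelImage, computed: LabelImage) -> float:
    """Mean Jaccard index over reference objects (hypothetical sketch)."""
    if len(reference) != len(computed) or any(
            len(a) != len(b) for a, b in zip(reference, computed)):
        raise ValueError("images must have the same shape")

    ref_sizes: Counter = Counter()   # reference label -> pixel count
    comp_sizes: Counter = Counter()  # computed label -> pixel count
    overlaps: Counter = Counter()    # (ref label, computed label) -> shared pixels
    for ref_row, comp_row in zip(reference, computed):
        for r, c in zip(ref_row, comp_row):
            if r:
                ref_sizes[r] += 1
            if c:
                comp_sizes[c] += 1
            if r and c:
                overlaps[(r, c)] += 1

    if not ref_sizes:
        raise ValueError("reference contains no objects")

    total = 0.0
    for r, r_size in ref_sizes.items():
        jaccard = 0.0  # stays 0 if no computed object covers > half of R
        for (rr, c), inter in overlaps.items():
            # Assumed matching rule: S must cover more than half of R.
            if rr == r and inter > 0.5 * r_size:
                union = r_size + comp_sizes[c] - inter
                jaccard = inter / union
        total += jaccard
    return total / len(ref_sizes)


ref = [[1, 1, 0], [1, 1, 0], [0, 0, 2]]
comp = [[5, 5, 0], [5, 0, 0], [0, 0, 0]]
print(seg_score(ref, comp))  # 0.375: object 1 has J = 3/4, object 2 is missed
```

Only one computed object can cover more than half of a given reference object, so each reference object is assigned at most one Jaccard value.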
 

The tracking accuracy measure (TRA) assesses how accurately each given object has been identified and followed in successive frames by comparing the acyclic oriented graph computed by an evaluated algorithm with the reference graph given by the gold tracking truth. Numerically, TRA is defined as a normalized Acyclic Oriented Graph Matching (AOGM) measure:

TRA = 1 - min(AOGM, AOGM0) / AOGM0

where AOGM is the cost of transforming the computed graph into the reference graph, and AOGM0 is the AOGM value required for creating the reference graph from scratch (i.e., the AOGM value for empty tracking results). The minimum operator in the numerator prevents TRA from becoming negative when it is cheaper to create the reference graph from scratch than to transform the computed graph into the reference graph.
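AOGM can be pictured as a weighted sum of the graph operations needed to fix the computed graph. Those operations are splitting nodes, adding missing nodes, deleting spurious nodes, and deleting, adding or relabeling edges. The sketch below computes TRA from such operation counts. The GraphErrors fields, the function names and the WEIGHTS values are illustrative assumptions, so check them against the official evaluation software before relying on them. For empty tracking results, every reference node and edge must be added, which gives AOGM0.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class GraphErrors:
    """Counts of operations needed to turn the computed graph into the reference."""
    split_nodes: int = 0       # NS: computed nodes covering several reference nodes
    false_negatives: int = 0   # FN: reference nodes that must be added
    false_positives: int = 0   # FP: computed nodes that must be deleted
    edges_to_delete: int = 0   # ED: redundant edges
    edges_to_add: int = 0      # EA: missing edges
    edges_to_relabel: int = 0  # EC: edges with wrong semantics


# Assumed, illustrative weights; not taken from this document.
WEIGHTS = {"ns": 5.0, "fn": 10.0, "fp": 1.0, "ed": 1.0, "ea": 1.5, "ec": 1.0}


def aogm(e: GraphErrors, w: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of the graph-editing operations."""
    return (w["ns"] * e.split_nodes + w["fn"] * e.false_negatives
            + w["fp"] * e.false_positives + w["ed"] * e.edges_to_delete
            + w["ea"] * e.edges_to_add + w["ec"] * e.edges_to_relabel)


def tra_score(e: GraphErrors, n_ref_nodes: int, n_ref_edges: int,
              w: Mapping[str, float] = WEIGHTS) -> float:
    # Empty result: all reference nodes and edges must be added from scratch.
    aogm0 = w["fn"] * n_ref_nodes + w["ea"] * n_ref_edges
    if aogm0 <= 0:
        raise ValueError("reference graph must not be empty")
    return 1.0 - min(aogm(e, w), aogm0) / aogm0


# Hypothetical case: 10 reference nodes, 8 reference edges,
# one missed node and two missing edges.
errors = GraphErrors(false_negatives=1, edges_to_add=2)
print(tra_score(errors, n_ref_nodes=10, n_ref_edges=8))  # 1 - 13/112
```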
 

The linking accuracy measure (LNK) assesses how accurately each given object has been followed in successive frames by comparing the acyclic oriented graph computed by an evaluated algorithm with the reference graph given by the gold tracking truth. To this end, LNK derives two intermediate acyclic oriented graphs, an intermediate computed graph and an intermediate reference graph, whose vertex sets are synchronized at zero penalty, as detailed in Cell Linking Benchmark.pdf. Numerically, LNK is defined as a normalized Acyclic Oriented Graph Matching measure for association (AOGM-A):

LNK = 1 - min(AOGM-A, AOGM-A0) / AOGM-A0

where AOGM-A is the cost of transforming the intermediate computed graph into the intermediate reference graph, and AOGM-A0 is the AOGM-A value required for creating the intermediate reference graph from its vertices only. The minimum operator in the numerator prevents LNK from becoming negative when it is cheaper to create the intermediate reference graph from its vertices only than to transform the intermediate computed graph into the intermediate reference graph.
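Once the two intermediate graphs share the same vertices, only edge operations remain to be counted. The sketch below makes several simplifying assumptions. Edges are compared as (parent, child) pairs, edge semantics are ignored, and the weights w_ed and w_ea default to 1. The function lnk_score is hypothetical, and Cell Linking Benchmark.pdf gives the actual definition. Creating the reference graph from its vertices only means adding every reference edge, which gives AOGM-A0.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent vertex, child vertex)


def lnk_score(computed_edges: Set[Edge], reference_edges: Set[Edge],
              w_ed: float = 1.0, w_ea: float = 1.0) -> float:
    """Hypothetical LNK sketch for graphs with already synchronized vertices.

    Vertex synchronization costs nothing, so only edge deletions and
    additions are penalized; the weights are assumptions.
    """
    if not reference_edges:
        raise ValueError("reference graph has no edges")
    to_delete = len(computed_edges - reference_edges)  # spurious links
    to_add = len(reference_edges - computed_edges)     # missing links
    aogm_a = w_ed * to_delete + w_ea * to_add
    aogm_a0 = w_ea * len(reference_edges)  # add every edge to bare vertices
    return 1.0 - min(aogm_a, aogm_a0) / aogm_a0


reference = {(1, 2), (2, 3), (3, 4), (3, 5)}
computed = {(1, 2), (2, 3), (3, 4), (4, 5)}
print(lnk_score(computed, reference))  # 0.5: one spurious and one missing link
```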
 

The biological accuracy measure (BIO) averages biologically inspired measures applicable to a given dataset:

  • Complete Tracks (CT) measures the fraction of reference cell tracks that a given algorithm can reconstruct entirely from the frame in which they appear to the frame in which they disappear. It is especially relevant when a perfect reconstruction of the cell lineages is required.
  • Track Fractions (TF) averages, over all detected tracks, the length of the longest continuously matching algorithm-generated tracklet as a fraction of the length of the corresponding reference track. Intuitively, this can be interpreted as the fraction of an average cell’s trajectory that an algorithm reconstructs correctly once the cell has been detected.
  • Branching correctness (BC(i)) measures how accurately a given algorithm detects division events, with a tolerance of i frames.
  • Cell cycle accuracy (CCA) measures how accurately an algorithm reconstructs the length of cell cycles (i.e., the time between two consecutive divisions).
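BIO is the average of whichever of the measures listed above apply to a given dataset. The sketch below adopts None to mark a measure as not applicable. That convention, the function name bio_score, and the example values are assumptions. For instance, division-related measures may not be computable for a dataset without divisions. The parameter bc stands for BC(i) at whatever tolerance i has been chosen.

```python
from typing import Optional


def bio_score(ct: Optional[float] = None, tf: Optional[float] = None,
              bc: Optional[float] = None, cca: Optional[float] = None) -> float:
    """Average the applicable biologically inspired measures (None = not applicable)."""
    applicable = [m for m in (ct, tf, bc, cca) if m is not None]
    if not applicable:
        raise ValueError("no applicable biologically inspired measure")
    return sum(applicable) / len(applicable)


# Hypothetical dataset where only CT and TF apply.
print(bio_score(ct=0.5, tf=0.75))  # 0.625
```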

Command-line software packages that implement the DET, SEG and TRA measures are publicly available, together with instructions for running them. The challenge organizers use these packages for the official evaluation of the algorithms, and participants can also use them to evaluate and tune their own algorithms.
 

To allow a direct comparison of the algorithms included in the Cell Segmentation Benchmark, the overall performance measure (OPCSB) is computed by averaging the corresponding DET and SEG values:

OPCSB = 0.5⋅(DET + SEG).

To allow a direct comparison of the algorithms included in the Cell Tracking Benchmark, the overall performance measure (OPCTB) is computed by averaging the corresponding SEG and TRA values:

OPCTB = 0.5⋅(SEG + TRA).

To allow a direct comparison of the algorithms included in the Cell Linking Benchmark, the overall performance measure (OPCLB) is computed by averaging the corresponding LNK and BIO values:

OPCLB = 0.5⋅(LNK + BIO).
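Each overall measure is an unweighted mean of two scores that already lie in [0, 1]. The helper names below are hypothetical, and the range check is an addition of this sketch.

```python
def _mean_of_two(a: float, b: float) -> float:
    # Both inputs are expected to lie in [0, 1], like every measure above.
    for value in (a, b):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"measure out of range: {value}")
    return 0.5 * (a + b)


def op_csb(det: float, seg: float) -> float:
    """Cell Segmentation Benchmark: OPCSB = 0.5 * (DET + SEG)."""
    return _mean_of_two(det, seg)


def op_ctb(seg: float, tra: float) -> float:
    """Cell Tracking Benchmark: OPCTB = 0.5 * (SEG + TRA)."""
    return _mean_of_two(seg, tra)


def op_clb(lnk: float, bio: float) -> float:
    """Cell Linking Benchmark: OPCLB = 0.5 * (LNK + BIO)."""
    return _mean_of_two(lnk, bio)


# Hypothetical scores, for illustration only.
print(op_csb(0.75, 0.25))  # 0.5
```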