Submission of Results

The registered participants compete over 2D+t and 3D+t datasets of their choice. An expected submission consists of one segmentation-and-tracking result per dataset, with the possibility of using different algorithms for different datasets and no limitations on the training data configurations used. Please note that all regular submissions to the Cell Tracking Benchmark are automatically treated as regular submissions to the Cell Segmentation Benchmark too. As only a subset of cells is evaluated for the Fluo-N3DL-DRO, Fluo-N3DL-TRIC, and Fluo-N3DL-TRIF datasets, the participants are encouraged to additionally submit complete segmentation-only results for these datasets, which will automatically be filtered out by our evaluation software and used for the Cell Segmentation Benchmark only.

The performance of a particular algorithm for a given dataset is primarily evaluated and ranked using the DET, SEG, TRA, OP_CSB, and OP_CTB measures. Furthermore, the biological performance of the algorithm, evaluated and ranked using the CT, TF, BC(i), and CCA measures, is provided as complementary information.

Apart from submitting segmentation-and-tracking results for the chosen datasets, the participants must provide command-line versions of their algorithms used to produce the submitted results, thus allowing the challenge organizers to validate all submitted results by rerunning the algorithms on the test datasets on their own. Please note that the provided pieces of software will be used for validation purposes only. For more detailed submission instructions, please check this document.

To make the evaluated submissions visible on the challenge website, including their complete scorings, the participants must provide descriptions of the algorithms used, including the details on parameter configurations chosen, the training data used, and the training protocols followed, and prepare their algorithms in a reusable form. Both requirements can temporarily be omitted due to the adherence to anonymous submission policies of some workshops, conferences, and journals, and met after the reason for the anonymity declines.

Cell Tracking Benchmark (generalizable submissions)

The registered participants compete over the set of 13 real datasets (eight 2D+t and five 3D+t ones) with complete gold tracking truth, and gold and silver segmentation truths available. An expected submission consists of a set of six segmentation-and-tracking results per dataset, created using the same approach with parameters/models optimized/trained using each of the six following training data configurations: gold segmentation truth per dataset, silver segmentation truth per dataset, a mixture of gold and silver segmentation truths per dataset, gold segmentation truths across all the 13 datasets, silver segmentation truths across all the 13 datasets, and a mixture of gold and silver segmentation truths across all the 13 datasets. Other than these training data configurations cannot be exploited. Please note that all generalizable submissions to the Cell Tracking Benchmark are automatically treated as generalizable submissions to the Cell Segmentation Benchmark too. Furthermore, the best generalizability results, in terms of the OP_CTB measure, among the six specified training data configuations are cherry-picked for the 13 included datasets, automatically transferred as a regular submission to the Cell Tracking Benchmark and the Cell Segmentation Benchmark, and conditionally ranked there by preferring results with higher OP_CTB scores in case of the existing pairs of regular and generalizability results of the same algorithm for one dataset.

The performance of a particular algorithm for a given dataset and training data configuration is primarily evaluated using the DET, SEG, TRA, OP_CSB, and OP_CTB measures. Furthermore, the biological performance of the algorithm, evaluated using the CT, TF, BC(i), and CCA measures, is provided as complementary information. The overall, measure-specific performance of the algorithm used for its ranking is then obtained by averaging its measure-specific performance scores over all the included datasets and training data configurations.

Apart from submitting a set of 78 segmentation-and-tracking results for the 13 included datasets, the participants must provide command-line versions of their algorithms used to produce the submitted results, thus allowing the challenge organizers to validate all submitted results by rerunning the algorithms on the test datasets on their own. Please note that the provided pieces of software will be used for validation purposes only. The submission instructions are the same as for regular submissions to the Cell Tracking Benchmark, with the exception of output subfolder and entry file names that reflect the training data configuration used. For more details, please check the last section of this document.

To make the evaluated submissions visible on the challenge website, including their complete scorings, the participants must provide descriptions of the algorithms used, including the details on parameter configurations chosen and the training protocols followed, and prepare their algorithms in a reusable form. Both requirements can temporarily be omitted due to the adherence to anonymous submission policies of some workshops, conferences, and journals, and met after the reason for the anonymity declines.

Cell Segmentation Benchmark (regular submissions)

The registered participants compete over 2D+t and 3D+t datasets of their choice. An expected submission consists of one segmentation-only result per dataset, with the possibility of using different algorithms for different datasets and no limitations on the training data configurations used. As only a subset of cells is evaluated for the Fluo-N3DL-DRO, Fluo-N3DL-TRIC, and Fluo-N3DL-TRIF datasets, the participants are encouraged to submit complete segmentation-only results for these datasets, which will automatically be filtered out by our evaluation software.

The performance of a particular algorithm for a given dataset is evaluated and ranked using the DET, SEG, and OP_CSB measures.

Apart from submitting segmentation-only results for the chosen datasets, the participants must provide command-line versions of their algorithms used to produce the submitted results, thus allowing the challenge organizers to validate all submitted results by rerunning the algorithms on the test datasets on their own. Please note that the provided pieces of software will be used for validation purposes only. For more detailed submission instructions, please check this document.

Cell Segmentation Benchmark (generalizable submissions)

The registered participants compete over the set of 13 real datasets (eight 2D+t and five 3D+t ones) with complete gold tracking truth, and gold and silver segmentation truths available. An expected submission consists of a set of six segmentation-only results per dataset, created using the same approach with parameters/models optimized/trained using each of the six following training data configurations: gold segmentation truth per dataset, silver segmentation truth per dataset, a mixture of gold and silver segmentation truths per dataset, gold segmentation truths across all the 13 datasets, silver segmentation truths across all the 13 datasets, and a mixture of gold and silver segmentation truths across all the 13 datasets. Other than these training data configurations cannot be exploited. Please note that the best generalizability results, in terms of the OP_CSB measure, among the six specified training data configuations are cherry-picked for the 13 included datasets, automatically transferred as a regular submission to the Cell Segmentation Benchmark, and conditionally ranked there by preferring results with higher OP_CSB scores in case of the existing pairs of regular and generalizability results of the same algorithm for one dataset.

The performance of a particular algorithm for a given dataset and training data configuration is evaluated using the DET, SEG, and OP_CSB measures. The overall, measure-specific performance of the algorithm used for its ranking is then obtained by averaging its measure-specific performance scores over all the included datasets and training data configurations.

Apart from submitting a set of 78 segmentation-only results for the 13 included datasets, the participants must provide command-line versions of their algorithms used to produce the submitted results, thus allowing the challenge organizers to validate all submitted results by rerunning the algorithms on the test datasets on their own. Please note that the provided pieces of software will be used for validation purposes only. The submission instructions are the same as for regular submissions to the Cell Segmentation Benchmark, with the exception of output subfolder and entry file names that reflect the training data configuration used. For more details, please check the last section of this document.