Data Challenge SFP 2022-23

The VisioMel project : search for a digital signature evaluating the risk of metastatic evolution of primary melanoma within 5 years following the initial diagnosis

Introduction

This project, supervised by the French Society of Pathology together with the French Society of Dermatology, the Cutaneous Cancer Group (GCC) and the National Professional Council of Pathologists (CNPath), aims to organize an international data challenge on melanoma relapse in May 2023. The event, organized in collaboration with the Health Data Hub (HDH) and with the support of the Public Investment Bank (BPI), is a worldwide competition in which participants solve a specific problem within an allotted time using strongly anonymized data. The challenge is therefore open to data scientists (researchers, industry professionals, students, etc.) from all around the world. Challengers will have to build an artificial intelligence (AI) algorithm able to predict melanoma relapse within 5 years of the initial diagnosis. As a final step, making the data and algorithms resulting from the challenge accessible is encouraged, so that research can benefit everyone.

Context

Melanoma is a cancer of the skin or, more rarely, of the mucous membranes, which develops from melanocytes (the cells responsible for skin pigmentation).
 
The causes of the disease are multifactorial but mainly depend on the interaction between UV exposure (period and intensity), host factors (presence of atypical nevi*, high number of nevi, skin phototype) and genetic factors.
 
The National Cancer Institute estimated that 15,500 new cases of cutaneous melanoma were detected in France in 2018 (7,900 men and 7,600 women). With 1,800 deaths that same year (1,040 men and 840 women), this cancer represents 1.2% of cancer deaths in France for all sexes combined. It is one of the cancers whose incidence* and mortality have increased significantly over the past decades.
 
These tumors represent around 10% of skin cancers but are the most serious because of their high metastatic potential. The development of metastases is a factor of poor prognosis*. Metastasis means that cancerous cells from the primary tumor colonize neighboring healthy tissues, leading to the formation of secondary tumors in the lymph nodes (loco-regional melanoma) or in other organs (distant melanoma). Metastases are rarely observed at the time of initial diagnosis; they generally appear during follow-up of the disease.
The diagnosis of melanoma is made by a pathologist through microscopic analysis of the tumor tissue. From a stained histological slide, the pathologist establishes the final diagnosis and determines the severity of the lesions according to prognostic factors* (size of the tumor, presence of ulceration, mitotic rate, etc.). These prognostic factors* are then summarized as a cancer stage according to the AJCC classification.
This analysis, combined with clinical prognostic factors (age, sex, medical history of the patient, etc.), allows the dermatologist to adapt the treatment to the severity of the disease.
 
Patient survival* depends essentially on the stage of the cancer at the time of diagnosis. For primary cutaneous melanoma without metastasis, the prognosis is mainly related to the thickness of the melanoma. At an early stage (thin melanoma less than 1 mm thick), the 5-year survival is estimated at more than 95%. Thicker melanomas (over 4 mm) have a 50% risk of relapse within 5 years. If the melanoma is metastatic at the time of diagnosis or if it has relapsed, additional surgical treatments (lymph node dissection, excision of metastases) or medical treatments (immunotherapy, targeted therapy) can then be proposed.

Questions explored through the VisioMel project

1st Question : Although thin melanomas (less than 1 mm thick) are associated with a good prognosis, they are responsible for a significant and still poorly understood proportion of relapses and deaths. Similarly, for melanomas of intermediate thickness (between 1 and 4 mm), which carry a higher risk of relapse, there are no predictive factors for this possible metastatic evolution. Adjuvant* treatments now exist to limit this risk for some operable melanomas assessed as being at high risk of relapse. However, beyond their high cost, these treatments also expose patients to significant drug toxicities. It is therefore becoming urgent to identify the patients who will not relapse even without adjuvant treatment, so that these therapies can be targeted at the patients who stand to benefit clinically. These treatments could also, in the future, be considered in a neoadjuvant setting*.
 
The search by artificial intelligence for new predictive markers of relapse in primary non-metastatic melanomas would make it possible to complement the pathologist's analysis and thus adapt the care of the patient. In addition, identifying such markers for thin melanomas, whose recurrence is particularly difficult to predict, would constitute a major step forward in melanoma care.
 
2nd Question : In parallel, determining the mutational status of the tumor (especially the V600E mutation of the BRAF gene) is essential for the prescription of a targeted therapy. The presence or absence of such a mutation also makes it possible to distinguish between different types of melanoma, which may have distinct clinical evolutions. The search for such a mutation currently requires complementary techniques that can be costly. As a result, these techniques are currently requested only for advanced-stage lesions.
 
Predicting this status through an algorithmic analysis of the microscopic image would make it easier to determine whenever necessary.
 
AI approaches are particularly relevant for creating a tool that supports doctors in the rapid and precise detection of potential cases of relapse. It is essential to identify new prognostic factors that the clinical or histological examination might not perceive. AI approaches have already been applied to this problem, but they rely on fairly small sample sizes and on highly supervised methods. The size of the cohort considered here will allow the use of unsupervised methods. In addition, coupling clinical, histological and molecular variables would make it possible to go beyond the current separation between these disciplines and to increase the chances of identifying prognostic "patterns".

Material and methods

General view of the project

Patient selection : As the problem is to predict the metastatic evolution of melanomas, only patients with localized disease at the time of diagnosis (stage 0 to IIB) are included in the study. Patients are selected from the RIC-Mel database. Thanks to the efforts of a network of physicians from 49 French inclusion centers, this national database, created in 2012, now holds data from around 40,000 patients with melanoma.
 
3,000 patients will be selected according to the following criteria:

  • Cancer stage between 0 and IIB,
  • Initial diagnosis between 2012 and 2016 (because the relapse is studied at 5 years following initial diagnosis) 
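
As an illustration only, the two selection criteria above could be expressed as a simple filter. This is a minimal sketch, not the project's actual extraction code: the `Patient` record and its field names are hypothetical, and the list of eligible stages assumes the usual AJCC ordering for melanoma (0, IA, IB, IIA, IIB, then IIC and above).

```python
from dataclasses import dataclass

# Stages from 0 up to IIB, assuming the usual AJCC ordering for melanoma
# (IIC and above are excluded).
ELIGIBLE_STAGES = {"0", "IA", "IB", "IIA", "IIB"}


@dataclass
class Patient:
    patient_id: str      # hypothetical pseudonymized identifier
    ajcc_stage: str      # stage at initial diagnosis, e.g. "IB"
    diagnosis_year: int  # year of the initial diagnosis


def is_eligible(p: Patient) -> bool:
    """Return True when the patient meets both selection criteria."""
    stage_ok = p.ajcc_stage.upper() in ELIGIBLE_STAGES
    # Diagnosis between 2012 and 2016, so that 5 years of follow-up exist.
    year_ok = 2012 <= p.diagnosis_year <= 2016
    return stage_ok and year_ok


def select_patients(patients):
    """Keep only the patients who satisfy the inclusion criteria."""
    return [p for p in patients if is_eligible(p)]
```

Calling `select_patients` on a list of `Patient` records returns the eligible subset in its original order.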

Selected variables for the challenge : The AI algorithm will be trained to predict the 5-year relapse of non-metastatic primary melanomas on the basis of clinical and histological data. The resulting algorithm will then have to predict the recurrence of the tumor solely from the analysis of images, that is, from histological data. Prediction of the BRAF mutational status is a secondary goal that may require a second data set, since only a subgroup of tumors was characterized molecularly.

In a first step and thanks to the effort of inclusion centers, the following clinical variables are updated and retrieved from the RIC-Mel database or from the patient medical reports:

  • age,
  • sex,
  • patient's medical history,
  • site of the melanoma (leg, arm, bust, face etc.),
  • primary tumor stage (AJCC),
  • family history,
  • molecular research of the BRAF mutation,
  • cancer progression/recurrence within 5 years (dates of events). 

In a second phase, the corresponding histological slides are de-archived and digitized after pseudonymization in order to be included in the challenge database. This task is carried out by the pathological anatomy and cytology (ACP) laboratories that analyzed the excision of the primary tumor.
Great care is taken to ensure the quality of the data by involving the inclusion centers and verifying the completeness of the data.

Anonymization : The data will then be strongly anonymized, with no possibility of tracing them back to the patient's identity, and stored on the Health Data Hub servers. A re-identification risk analysis will be carried out in collaboration with DrData.
 
Course of the challenge : The database will then be uploaded to the platform hosting the data challenge and will remain available for a 7-week period.
The data from the 3,000 patients will be split into three different sets as follows:

  • 1,200 samples will be available for the training set (slides + clinical data) (inclusion of molecular data still under discussion)
  • 600 samples will be available for the test set (only histological slides will be visible to the competitors)
  • 1,200 samples will be available for the validation set (only histological slides will be visible to the competitors)

Each set will be built in such a way as to overcome potential biases due to exogenous factors: ACP laboratory (preparation and staining of slides), type of scanner, etc. Similarly, the stage of the cancer and the sex of the patient will be distributed in a balanced way in the different sets.
The competition itself will take place on the 1,200 histological samples of the validation set. The goal will be to quantify how well each algorithm predicts melanoma relapse (and mutation status, still under discussion).
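
The balanced three-way split described above could be sketched as follows. This is an assumption-laden illustration, not the organizers' procedure: the record fields (`id`, `stage`, `sex`, `lab`, `scanner`) and the function name are hypothetical. The idea is to group records by stratum and then deal them out in a repeating 2:2:1 pattern, so that each stratum is spread across the three sets in roughly the same proportions.

```python
import random

# Repeating 2:2:1 pattern, matching 1200 training / 1200 validation / 600 test.
PATTERN = ["train", "validation", "test", "train", "validation"]


def stratified_split(records, keys, seed=0):
    """Assign each record to a set while keeping the strata balanced.

    records: list of dicts, each with an "id" field (hypothetical layout).
    keys: names of the fields to balance on, e.g. stage, sex, lab, scanner.
    Returns a dict mapping record id to "train", "validation" or "test".
    """
    rng = random.Random(seed)
    ordered = list(records)
    rng.shuffle(ordered)
    # Stable sort by stratum: records of one stratum become contiguous,
    # while their order inside the stratum stays random.
    ordered.sort(key=lambda r: tuple(str(r[k]) for k in keys))
    assignment = {}
    for i, rec in enumerate(ordered):
        assignment[rec["id"]] = PATTERN[i % len(PATTERN)]
    return assignment
```

With 3,000 records the pattern yields exactly 1,200, 1,200 and 600 assignments. Very small strata can only be balanced approximately, so a real split would need to check the resulting distributions.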
 
The performance of the algorithms proposed by the challengers will be evaluated on a simple, binary criterion (absence or presence of metastatic evolution at 5 years). A metric will weight each discrepancy between the prediction and the “ground truth” according to the seriousness of the error. This mathematical weighting will have clinical meaning and will be communicated at the time of the competition.
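
To make the idea of an error weighted by its seriousness concrete, here is a minimal sketch of a cost-weighted error rate for binary predictions. It is not the challenge metric, which is not specified here: the function name, the weights `fn_weight` and `fp_weight`, and the assumption that a missed relapse should cost more than a false alarm are all hypothetical placeholders.

```python
def weighted_error(y_true, y_pred, fn_weight=5.0, fp_weight=1.0):
    """Mean error cost over all patients.

    y_true, y_pred: sequences of 0 (no relapse) or 1 (relapse).
    fn_weight: hypothetical cost of predicting 0 when the truth is 1.
    fp_weight: hypothetical cost of predicting 1 when the truth is 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            total += fn_weight  # missed relapse
        elif truth == 0 and pred == 1:
            total += fp_weight  # false alarm
    return total / len(y_true)
```

With these placeholder weights, one missed relapse and one false alarm among four patients give a score of (5 + 1) / 4 = 1.5; a perfect prediction scores 0.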
 

Regulatory framework of the project

All stages of the project have been supervised and validated by DrData. This consulting firm specializes in the protection of personal data in the health field. Its teams of experts support hospitals, healthcare professionals and digital companies (artificial intelligence, telemedicine, etc.) in their GDPR compliance and in the privacy by design of all their processes and projects.

Funding

The project is financially supported by Bpifrance (BPI) as well as by donations from Bristol Myers Squibb (BMS) and Pierre Fabre.

Glossary

Source: Cancer Foundation

Excision : Surgical procedure consisting in removing from the body, and if possible in its entirety, an element that is harmful or useless to it.
 
Prognostic factor : Situation, state or characteristic of a person that is considered when establishing a prognosis. There are many different prognostic factors, including the type and stage of the cancer as well as the age and overall health of the person affected.
 
Incidence : Total number of new cases of a disease diagnosed in a given population during a specified period of time.
 
Nevus : Beauty spot/mole. It is a flat or raised spot that corresponds to a cluster of skin cells: melanocytes.
 
Prognosis : Expected outcome or course of a disease or chance of recovery or risk of recurrence.
 
Relapse : Cancer that comes back (recurs) after a period of time when the patient has had no signs or symptoms (remission). We speak of local recurrence when the cancer comes back in the same region of the body as the initial location (primary site) of the tumor. We speak of a distant recurrence when the cancer appears again in a region of the body other than the initial site (primary site) of the tumor.

Survival : The percentage of people with a disease who are still alive at some point after being diagnosed. Statistical data on cancer survival are often provided for a 5-year survival period. This data indicates the percentage of people with a particular disease who are still alive 5 years after being diagnosed. These may be people who do not have a recurrence, who are in remission or who are still receiving treatment.
 
Adjuvant therapy : Treatment given in addition to first-line treatment (first treatment or standard treatment) to help reduce the risk of the disease coming back (recurring).

Neoadjuvant therapy : Administration of therapeutic agents before a main treatment.
 
