Development of automated segmentation and clustering methods for spICP-ToF-MS time-series in Nanogeochemistry.

When:
12/01/2024 all-day
2024-01-12T01:00:00+01:00

Offer linked to the Action/Network: not specified

Laboratory/Company: Institut de physique du globe de Paris, Université
Duration: 6 months
Contact: tharaud@ipgp.fr
Publication deadline: 2024-01-12

Context:
Nanoparticles (NPs) are pervasive in natural systems and play a crucial role in nanogeochemistry. The emergence of single-particle time-of-flight inductively coupled plasma mass spectrometry (spICP-ToF-MS) has revolutionized NP characterization, but it also presents new challenges in data analysis.

Subject:
This research project seeks to bridge advanced nano-instrumentation with data-driven insights, focusing on the development of standardized methodologies for integrating spICP-ToF-MS with state-of-the-art machine learning algorithms. The IPGP hosts a world-leading geochemistry platform (PARI) equipped with an operational spICP-ToF-MS instrument and holds an extensive dataset. Using these resources, the project will (1) develop a novel methodology for the automated segmentation and clustering of NP time series generated by spICP-ToF-MS, (2) address challenges including instrumental noise, unknown NP compositions, and large data volumes that require sophisticated statistical methods, and (3) explore interdisciplinary collaboration between geochemists, data scientists, and analytical chemists.

Preliminary tests have shown encouraging results using a 4-step methodology described and illustrated below:
– Detection: Establish a conservative threshold for detecting significant NP signals within time-series data, using the intensity distribution across channels (b).
– Clustering: Identify families of NP signals through unsupervised clustering algorithms, considering the unknown number of NP families in natural environments.
– Classification: Train a classifier to differentiate various NPs within continuous time series, including an additional noise class, using realistic data.
– Segmentation: Divide time series into segments based on the classifier, addressing the challenge of determining optimal segmentation window size (c).
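
To make the first two steps more concrete, here is a minimal sketch in Python (standard library only). It is not the methodology used in the preliminary tests. It assumes an iterative mean + k·sigma rule for the detection threshold, and it uses a simple greedy "leader" clustering as a stand-in for an unsupervised algorithm that does not need the number of NP families in advance. The function names (`channel_threshold`, `detect_events`, `composition`, `leader_clustering`), the parameters `k` and `radius`, and the two-channel synthetic data are all hypothetical choices made for illustration. The classification and segmentation steps are not covered.

```python
import math
import statistics

def channel_threshold(signal, k=5.0, max_iter=20):
    """Assumed rule: iterative mean + k*sigma on one channel. Points above the
    current threshold are dropped and the background is re-estimated until
    no further point is removed."""
    background = list(signal)
    for _ in range(max_iter):
        threshold = statistics.fmean(background) + k * statistics.pstdev(background)
        kept = [x for x in background if x <= threshold]
        if len(kept) == len(background):
            break
        background = kept
    return threshold

def detect_events(channels, k=5.0):
    """Return (start, end, summed_counts_per_channel) for each run of consecutive
    time points where at least one channel exceeds its own threshold.
    `end` is exclusive; sums are raw counts, not background-corrected."""
    thresholds = [channel_threshold(ch, k) for ch in channels]
    n = len(channels[0])
    flagged = [any(ch[t] > th for ch, th in zip(channels, thresholds))
               for t in range(n)]
    events, start = [], None
    for t, hit in enumerate(flagged + [False]):  # sentinel closes a trailing run
        if hit and start is None:
            start = t
        elif not hit and start is not None:
            events.append((start, t, [sum(ch[start:t]) for ch in channels]))
            start = None
    return events

def composition(sums):
    """Normalise per-channel sums to fractions, so that clustering compares
    relative composition rather than particle size."""
    total = sum(sums)
    return [s / total for s in sums]

def leader_clustering(points, radius=0.2):
    """Greedy stand-in for an unsupervised method: a point joins the first
    leader within `radius` (Euclidean), otherwise it becomes a new leader."""
    leaders, labels = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                labels.append(i)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels

# Synthetic two-channel series (hypothetical data): low alternating background
# plus a few injected spikes standing in for NP events.
n = 200
chan_a = [1 + t % 2 for t in range(n)]  # background alternates 1, 2
chan_b = [t % 2 for t in range(n)]      # background alternates 0, 1
for t, counts in {10: 50, 11: 60, 45: 55}.items():
    chan_a[t] = counts
for t, counts in {30: 80, 120: 70}.items():
    chan_b[t] = counts

events = detect_events([chan_a, chan_b])
labels = leader_clustering([composition(sums) for _, _, sums in events])
print(events)
print(labels)
```

On real spICP-ToF-MS data, the threshold rule would likely need to reflect the actual noise statistics of each channel, and the clustering radius would have to be tuned or replaced by a density-based or model-based method. Both choices here are placeholders.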

Candidate profile:
We seek a candidate with a strong background in data science, including machine learning or deep learning techniques.

Required education and skills:
Master 2 level or engineering school.

Work address:
IPGP, 1, Rue Jussieu, 75005 Paris

Attached document: 202311231031_M2 Internship IPGP-LIPADE.pdf