Ensemble constrained clustering for time series analysis, with application to industry data

When:

11/10/2024 – 12/10/2024 all-day

Jobs

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: ICube laboratory
Duration: 2 years
Contact: lafabregue@unistra.fr
Publication deadline: 2024-10-11

Context:
Automated data acquisition systems and increasing storage capacities have made time series data available across a wide range of domains, from earth observation to industry. However, this data often comes with insufficient labels or none at all, which prevents the use of supervised methods. In this context, unsupervised methods can help users extract information, such as identifying different behaviors on a production line. Nevertheless, when applied to time series analysis, these methods face several drawbacks.

First, the diverse nature of the sensors and sources used to generate temporal data results in significant heterogeneity in terms of format, volume, quality, and richness of information. For example, a single production line can include a large set of different sensors, each constrained by its manufacturer’s API. This diversity has led to a wide range of categorization methods for analyzing time series, e.g., based on elastic metrics, frequency decomposition, or pattern extraction, each with its own advantages and limitations, and these methods can complement one another.

Second, clustering approaches often yield results that do not align with the experts’ expectations or intuitions. This is especially true given the aforementioned heterogeneity of time series data. Therefore, incorporating some expert knowledge, even if it does not cover the full spectrum of actual classes, can significantly enhance the quality of the clustering results. This knowledge is often expressed in the form of constraints. However, constrained clustering methods often suffer from a negative impact of constraints: the quality of the result can decrease when constraints are added.

Finally, asking experts to define all classes at the outset of the project is unreasonable. It is indeed often the case that not all classes can be semantically defined before a data analysis has been carried out. It is more practical to engage experts throughout the entire process as they progressively unfold the data processing and analysis within an iterative cycle of interactions between the expert and the learning system. The goal of this interaction is to bridge the gap between the results generated by the algorithms and the expert’s thematic insights. This process is designed to make the results more comprehensible to the expert.

Subject:
The main task of this post-doc is to develop an ensemble clustering method that relies on a diversity of viewpoints (i.e., representations or metrics). It will use constraints given iteratively by the user to select and combine the proper viewpoints. This should result in a better clustering that is a consensus of the most suitable viewpoints, consistent with the expert’s knowledge, while mitigating the potential negative effects of constraints. To achieve this goal, we need to fulfill four objectives:
● Select a subset of sufficiently independent/diverse existing metrics/representations (required to have complementary viewpoints) that are relevant for clustering time series;
● Define a generic ensemble method to obtain a consensus clustering result from the previously selected viewpoints that maximizes agreement with the expert’s knowledge;
● Propose a generic method to iteratively update the clustering by integrating new expert knowledge in interaction with the expert;
● Validate the operability of the method by focusing on industry data, mainly relying on a demonstration production line of one of our industry partners.
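
To make the second objective more concrete, here is a minimal sketch of one possible consensus scheme. It is not the method of this project (which is yet to be developed). It assumes that expert knowledge is given as must-link and cannot-link pairs, that each viewpoint has already produced a partition, and that viewpoints are weighted by the fraction of constraints they satisfy before being combined through a co-association vote. The names `satisfaction`, `consensus`, and `threshold` are hypothetical.

```python
from itertools import combinations


def satisfaction(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that a partition respects."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    return ok / total


def consensus(partitions, must_link, cannot_link, threshold=0.5):
    """Weighted co-association consensus over several viewpoint partitions.

    Assumption: each partition is a list of cluster labels, one per series.
    Two series end up together if the weighted share of viewpoints that
    co-cluster them exceeds `threshold` (merged via connected components).
    """
    n = len(partitions[0])
    weights = [satisfaction(p, must_link, cannot_link) for p in partitions]
    total = sum(weights)
    if total == 0:  # no viewpoint satisfies anything: fall back to equal weights
        weights = [1.0] * len(partitions)
        total = float(len(partitions))

    parent = list(range(n))  # union-find forest over the series

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        co = sum(w for w, p in zip(weights, partitions) if p[i] == p[j]) / total
        if co > threshold:
            parent[find(i)] = find(j)

    roots = {}  # relabel components as 0, 1, 2, ... in order of appearance
    labels = [roots.setdefault(find(i), len(roots)) for i in range(n)]
    return labels, weights


# Toy example: three viewpoints over six series (made-up labels).
view_a = [0, 0, 0, 1, 1, 1]
view_b = [0, 0, 0, 1, 1, 2]
view_c = [0, 1, 0, 1, 0, 1]
labels, weights = consensus(
    [view_a, view_b, view_c],
    must_link=[(0, 1), (3, 4)],
    cannot_link=[(0, 5)],
)
print(labels, weights)
```

In this toy run, the third viewpoint violates both must-link constraints and therefore contributes little to the vote. An iterative version would recompute the weights each time the expert adds constraints.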

The recruited person will be co-supervised by Nicolas Lachiche (50%), a specialist in complex data mining, and Baptiste Lafabrègue (50%), a specialist in time series analysis. He or she will actively collaborate with the SDC team at ICube in Strasbourg, and more particularly with Nassime Mountasir, a third-year PhD student working on predictive maintenance.

Candidate profile:
● PhD in Computer Science, specializing in machine learning/explainability.
● Solid knowledge of machine learning methods. Experience in time series analysis and/or predictive maintenance would also be valuable.
● Good verbal (English or French) and written (English) communication skills.
● Interpersonal skills and the ability to work individually or as part of a project team.

Required education and skills:

Work address:
Illkirch, south of Strasbourg (Pôle API, 300 Bd Sébastien Brant, 67400 Illkirch-Graffenstaden)

Attached document: 202407291327_EnsembleTimeSeries_offer.docx.pdf

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science