Announcement related to the Action/Network: none
Laboratory/Company: IRISA
Duration: 3 years
Contact: zoltan.miklos@irisa.fr
Publication deadline: 2017-12-31
Context:
Understanding how scientific fields evolve is important for our society. Obtaining a general picture of the major developments across entire scientific fields is challenging, given the proliferation of scientific publishing and the presence of overspecialized journals. Recent papers [1,2] propose text analysis techniques that reconstruct important aspects of this evolution from large corpora of scientific publications (such as Web of Science and PubMed).
The Epique project proposes to develop automated tools that can assist (social) scientists in studying empirically particular aspects of the social dynamics of science. Existing methods for phylomemetic structure reconstruction follow this scheme: 1) extraction of key terms from the articles, 2) construction of a term co-occurrence graph over the scientific publications, 3) identification of densely connected subgraphs in this term co-occurrence graph, and 4) inter-temporal analysis of the dense subgraphs. The result of the analysis is represented in the form of phylomemetic lattices (analogous to the phylogenetic trees used in biology to represent the evolution of natural species). While automatic phylomemetic structure reconstruction gives promising results, scientists studying the evolution of science would like to interact with the tools and influence the construction algorithms.
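To make the four-step scheme concrete, here is a minimal Python sketch on a toy corpus. It is an illustration under stated assumptions, not the method of [1,2]: key terms are assumed to be already extracted (step 1), dense subgraphs are approximated by connected components of the thresholded co-occurrence graph, and groups from consecutive periods are linked by Jaccard similarity. All function names, thresholds, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Step 2: count, for each unordered term pair, the documents containing both."""
    counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def dense_groups(counts, min_weight=2):
    """Step 3 (stand-in): connected components of the graph restricted to
    edges with weight >= min_weight. Real systems use stronger notions of
    density; this threshold is an assumption for illustration."""
    adjacency = {}
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(frozenset(component))
    return groups


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def link_periods(groups_by_period, min_similarity=0.3):
    """Step 4: link each group to sufficiently similar groups of the next period."""
    periods = sorted(groups_by_period)
    links = []
    for earlier, later in zip(periods, periods[1:]):
        for g in groups_by_period[earlier]:
            for h in groups_by_period[later]:
                similarity = jaccard(g, h)
                if similarity >= min_similarity:
                    links.append((earlier, g, later, h, similarity))
    return links


# Toy corpus: one list of per-document term sets for each period (hypothetical data).
corpus = {
    2000: [{"gene", "protein"}, {"gene", "protein", "cell"}, {"cell", "tissue"}],
    2001: [{"gene", "protein", "genome"}, {"gene", "genome"}, {"gene", "protein"}],
}
groups_by_period = {
    period: dense_groups(cooccurrence_graph(docs)) for period, docs in corpus.items()
}
links = link_periods(groups_by_period)
```

In this toy run, the group {gene, protein} from 2000 is linked to {gene, genome, protein} in 2001. The two thresholds are exactly the kind of parameters a scientist might want to adjust interactively.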
Subject:
The thesis should develop techniques that enable the interactive construction of phylomemetic structures. Through this interaction, scientists can add or refine pieces of information in order to reduce the uncertainties present at the various stages of the reconstruction procedure.
The thesis will focus on some of the following aspects.
• Developing a model of phylomemetic structure reconstruction as a (structured) knowledge extraction task
• Enriching the extraction model with quality metrics
• Developing algorithms that support scientists in exploring the graph (lattice). This requires data exploration techniques [8,9], as the phylomemetic structure is rather large in practice.
• Handling provenance information [10] in the model, as provenance questions can be important in the reconstruction process.
• Developing a workflow model of phylomemetic structure maintenance that can update parts of the network, in particular in the case of quality problems.
Candidate profile:
The PhD candidate should have the following skills.
• Fluent in English (written, spoken)
• Good knowledge of data mining and knowledge extraction techniques
• Algorithmic and programming skills
• Ideally, experience with large-scale data management techniques
Foreign applications are welcome. French language skills are useful, but not mandatory.
Required education and skills:
The PhD candidate should hold a master's degree or equivalent in computer science.
Work address:
IRISA / INRIA
263 Avenue Général Leclerc, 35000 Rennes
France
Attached document: sujetEpique_v2.pdf