Offer related to the Action/Network: – — –/– — –
Laboratory/Company: Laboratoire Interdisciplinaire des Sciences du Numérique
Duration: 5 or 6 months
Contact: guinaudeau@limsi.fr
Publication deadline: 2025-02-28
Context:
Most human interactions take place through spoken conversation. Although this mode of interaction seems natural and effortless for humans, it remains a challenge for spoken language processing models, because conversational speech raises several critical issues. First, non-verbal information can be essential to understanding a message. For example, a smiling face and a joyful voice can help detect irony or humor. Second, visual grounding between participants is often needed during a conversation to integrate posture and body gestures, as well as references to the surrounding world. For example, a speaker can talk about an object on a table and refer to it as "this object" while pointing at it with her hand. Finally, semantic grounding, through which the participants in a conversation establish mutual knowledge, is essential for them to communicate with each other.
Subject:
In this context, the MINERAL project aims to train a multimodal conversation representation model for communicative acts and to study the communicative structures of audiovisual conversations.
As part of this project, we are offering a 5- to 6-month internship focused on the semi-automatic annotation of conversations in audiovisual documents. The intern's first task will be to extend the existing dialog act annotation ontology, currently available for audio documents (for example, through the Switchboard corpus), to incorporate the visual modality. In a second step, the intern will develop an automatic process for transferring annotations to new audiovisual datasets (such as meeting videos, TV series, or movies) using transfer learning or few-shot learning approaches.
Practicalities:
The internship comes with a stipend of approximately 500 euros per month for a duration of 5 or 6 months and will take place at LISN, within the LIPS team. It may be followed by a funded PhD, depending on performance and on the intern's interest in continuing research in this area.
To apply, please send your CV, a cover letter, and your M1 and M2 transcripts (if available) by email to Camille Guinaudeau (camille.guinaudeau@universite-paris-saclay.fr) and Sahar Ghannay (sahar.ghannay@universite-paris-saclay.fr).
Candidate profile:
Required education and skills:
● Master's degree (M2) in Computer Science or a related field.
● Experience with deep learning frameworks such as Keras or PyTorch.
● Knowledge of image processing would be an advantage.
Work address:
LISN – LIPS Team
Campus Universitaire, Building 507
Rue du Belvédère
91400 Orsay
Attached document: 202411111659_Stage_MINERAL.pdf