Optimization of Frequent Pattern Mining for Tourist Behavior Analysis

When:
28/02/2025 all-day

Offer related to the Action/Network: DOING

Laboratory/Company: DVRC@ESILV
Duration: 6 months
Contact : nicolas.travers@devinci.fr
Publication deadline: 2025-02-28

Context:
Understanding how tourists appreciate their visits is a major issue in the tourism sector, both to anticipate how trends evolve and to understand how tourists move across the territory. One approach to estimating this appreciation is to extract frequent patterns from a circulation graph, using techniques such as graphlet extraction [1], k-decomposition [2], or cohesive structures like k-plexes [6]. Tourism trends are thus extracted topologically, based on their frequency of occurrence.
However, tourism data from experience-recommendation platforms such as TripAdvisor or Google Maps yields large graphs that are challenging to process with traditional data mining techniques. With millions of places visited and billions of user comments, a new approach is needed to scale graph-based algorithms.
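
To make the idea of frequency-based extraction concrete, here is a minimal sketch in Java. It handles only the simplest possible pattern, a single directed move between two places, and is not the graphlet, k-decomposition, or k-plex techniques cited above, nor the approach developed at DVRC. The class name `CirculationPatterns`, its methods, the `minSupport` threshold, and the sample trajectories are all hypothetical and chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: names and data are assumptions, not part of the offer.
final class CirculationPatterns {

    /** Counts, for each directed move "from->to", how many trajectories contain it. */
    static Map<String, Integer> edgeSupport(List<List<String>> trajectories) {
        Map<String, Integer> support = new LinkedHashMap<>();
        for (List<String> trajectory : trajectories) {
            // Collect distinct moves so that one tourist counts once per edge.
            Set<String> seen = new LinkedHashSet<>();
            for (int i = 0; i + 1 < trajectory.size(); i++) {
                seen.add(trajectory.get(i) + "->" + trajectory.get(i + 1));
            }
            for (String edge : seen) {
                support.merge(edge, 1, Integer::sum);
            }
        }
        return support;
    }

    /** Keeps only the moves whose support reaches the (assumed) minSupport threshold. */
    static Map<String, Integer> frequentEdges(List<List<String>> trajectories, int minSupport) {
        Map<String, Integer> frequent = new LinkedHashMap<>();
        edgeSupport(trajectories).forEach((edge, count) -> {
            if (count >= minSupport) {
                frequent.put(edge, count);
            }
        });
        return frequent;
    }

    public static void main(String[] args) {
        // Made-up trajectories: each list is the ordered sequence of places one tourist reviewed.
        List<List<String>> trajectories = List.of(
            List.of("Louvre", "Eiffel Tower", "Versailles"),
            List.of("Louvre", "Eiffel Tower"),
            List.of("Eiffel Tower", "Versailles"),
            List.of("Louvre", "Versailles"));
        System.out.println(frequentEdges(trajectories, 2));
    }
}
```

With a threshold of 2, the two moves shared by two trajectories are kept and the single Louvre to Versailles move is dropped. Larger patterns follow the same counting principle, but their enumeration grows combinatorially with graph size, which is the scaling difficulty described above.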

Subject:
To this end, within the STARCS axis of DVRC, we have developed an exhaustive and scalable approach to pattern extraction on graphs using Pregel [3]. This approach extracts both the pattern topology and the node properties, including geodesic information [4, 5, 7]. The extraction has since been extended to complex patterns, which opens promising avenues for improvement. We now wish to take this approach a step further by optimizing the mining process.
The internship has two main goals:
• Use a topological signature technique to mine patterns in a Neo4j database (in Pregel/Java).
• Improve the method to provide a heuristic adapted to the geodesic context.
Example of aggregated tourist propagation graph across the French territory:
• How can we identify significant propagation patterns?
• What are the characteristics of a pattern?
• Can we extract seasonality from different groups of patterns?

Candidate profile:

M2-level students (Master's programs or engineering schools).

Required training and skills:
Databases, data mining, graph databases (Neo4j, Cypher), Java, parallelism.

Work location:
De Vinci Research Center at ESILV (École Supérieure d’Ingénieurs Léonard de Vinci), Paris La Défense.

Attached document: 202411221055_2425_TRAVERS_GraphMining.pdf