Tourism Recommender Systems (TRS) are crucial in personalizing travel experiences by tailoring recommendations to users’ preferences, constraints, and contextual factors. However, publicly available travel datasets often lack sufficient breadth and depth, limiting their ability to support advanced personalization strategies, particularly for sustainable travel and off-peak tourism. In this work, we explore using Large Language Models (LLMs) to generate synthetic travel queries that emulate diverse user personas and incorporate structured filters such as budget constraints and sustainability preferences.
This paper introduces a novel framework, SynthTRIPS, for generating synthetic travel queries using LLMs
grounded in a curated knowledge base (KB). Our approach combines persona-based preferences (e.g.,
budget, travel style) with explicit sustainability filters (e.g., walkability, air quality) to produce
realistic and diverse queries.
We mitigate hallucination and ensure factual correctness by grounding the LLM responses in the KB.
We formalize the query generation process and introduce evaluation metrics for assessing realism and
alignment. Both human expert evaluations and automatic LLM-based assessments demonstrate the
effectiveness of our synthetic dataset in capturing complex personalization aspects underrepresented in
existing datasets.
While our framework was developed and tested for personalized city trip recommendations, the methodology
applies to other recommender system domains.
Code and dataset are made public at: https://bit.ly/synthTRIPS
Figure 1 in the paper illustrates the proposed SynthTRIPS framework for generating synthetic travel queries using LLMs. The diagram highlights three key components:
The figure visually represents how these components interact, showing a flow from persona selection and filter application to structured prompting and final query generation. This framework enables the automated creation of diverse, sustainability-aware travel queries, which can be used to benchmark personalized tourism recommender systems.
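To make the flow described above more concrete, here is a minimal sketch of persona selection, filter application, and structured prompt construction. It is not the SynthTRIPS implementation: the class names, field names, score scales, prompt wording, and the toy knowledge base entries ("City A", "City B", "City C") are all assumptions made for illustration, and the final LLM call is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical persona fields, loosely following the "budget, travel style" example.
    description: str
    budget: str
    travel_style: str


@dataclass(frozen=True)
class CityRecord:
    # Hypothetical KB entry; the 0-1 scores are placeholders, not real measurements.
    name: str
    budget: str
    walkability: float
    air_quality: float


def select_cities(kb, persona, min_walkability, min_air_quality):
    """Keep only KB entries that match the persona budget and pass both filters."""
    return [
        city for city in kb
        if city.budget == persona.budget
        and city.walkability >= min_walkability
        and city.air_quality >= min_air_quality
    ]


def build_prompt(persona, cities, min_walkability, min_air_quality):
    """Assemble a structured prompt that restricts the LLM to the selected KB facts."""
    facts = "\n".join(
        f"- {c.name}: budget={c.budget}, walkability={c.walkability}, "
        f"air_quality={c.air_quality}"
        for c in cities
    )
    return (
        f"Persona: {persona.description} "
        f"(budget: {persona.budget}, style: {persona.travel_style})\n"
        f"Filters: walkability >= {min_walkability}, "
        f"air quality >= {min_air_quality}\n"
        "Use only the facts listed below.\n"
        f"{facts}\n"
        "Write one natural-language travel query this persona might ask."
    )


# Toy knowledge base with made-up entries, for illustration only.
TOY_KB = [
    CityRecord("City A", "low", 0.9, 0.8),
    CityRecord("City B", "low", 0.4, 0.9),
    CityRecord("City C", "high", 0.9, 0.9),
]

persona = Persona("student backpacker", "low", "slow travel")
matches = select_cities(TOY_KB, persona, min_walkability=0.7, min_air_quality=0.7)
prompt = build_prompt(persona, matches, 0.7, 0.7)
print(prompt)
```

In this sketch, the candidate facts are filtered before the prompt is built, so the model only sees knowledge base entries that already satisfy the persona and filter constraints. This is one simple way to read the grounding idea; the actual framework may structure it differently.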
[Fig. 2a] Distribution of the cities in our KB, grouped by popularity levels.
[Fig. 2b] Radar chart showing the validation and performance dimensions of the queries generated by the two models. L (E) denotes LLM (Expert) validations.
(a) Gemini
(b) Llama
Our tool used for Expert Evaluation can be found here: Expert Evaluation Tool
When prompted for a Validation code, please use SynthTRIPS2025
We thank the Google AI/ML Developer Programs team for supporting us with Google Cloud Credits.
@misc{banerjee2025synthTRIPS,
title={SynthTRIPS: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders},
author={Ashmi Banerjee and Adithi Satish and Fitri Nur Aisyah and
Wolfgang Wörndl and Yashar Deldjoo},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CV}
}