Challenges in Technical Regulatory Text Variation Detection

Jan 1, 2025
Shriya Vaagdevi Chikati, Samuel Larkin, David Minicola, Chi-Kiu Lo
Abstract
We present a preliminary study on the feasibility of using current natural language processing (NLP) techniques to detect variations between the construction codes of different jurisdictions. We formulate the task as a sentence alignment problem and evaluate how well various sentence representation models perform on it. Our results show that embeddings trained specifically for the task perform marginally better than the other models, but achieving high overall accuracy remains a challenge. We also show that domain-specific fine-tuning hurts task performance. These results highlight the challenges of developing NLP applications for technical regulatory texts.
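To make the sentence alignment framing concrete, the sketch below pairs each sentence from one code with its most similar sentence in another using cosine similarity over sentence vectors. It is an illustration only and not the method evaluated in the paper: the `embed` function is a toy bag-of-words stand-in for a real sentence representation model, the `align` helper and its `threshold` value are assumptions, and the example sentences are invented rather than taken from any actual construction code.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence representation model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(source, target, threshold=0.5):
    """For each source sentence, return (best target index, score).

    The index is None when no target sentence reaches the (assumed)
    similarity threshold, i.e. the sentence is treated as unaligned.
    """
    target_vecs = [embed(t) for t in target]
    alignment = []
    for sentence in source:
        vec = embed(sentence)
        scores = [cosine(vec, tv) for tv in target_vecs]
        if not scores:
            alignment.append((None, 0.0))
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            alignment.append((best, scores[best]))
        else:
            alignment.append((None, scores[best]))
    return alignment


# Invented example sentences standing in for two jurisdictions' codes.
code_a = [
    "Every exit stair shall have a handrail on at least one side.",
    "Smoke alarms shall be installed in each bedroom.",
]
code_b = [
    "Smoke alarms shall be installed in every bedroom and hallway.",
    "Every exit stair shall have handrails on both sides.",
]

for src, (idx, score) in zip(code_a, align(code_a, code_b)):
    match = code_b[idx] if idx is not None else "(no match)"
    print(f"{score:.2f}  {src}  ->  {match}")
```

Once sentences are paired, a variation between jurisdictions shows up as an aligned pair whose wording differs (here, one handrail versus two). Replacing `embed` with a trained sentence encoder changes only how the vectors are produced; the alignment step stays the same.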
Type
Publication
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)