LUPAN, the First French Corpus for Land Artificialization Management
Land artificialization threatens ecological balance and urban safety. Within the Hérelles project, we have developed a framework to improve the management of this phenomenon. By combining the analysis of satellite image time series with urban planning documents, we have created LUPAN (“Local Urban Plans And Natural risks”), a corpus designed to support the automatic extraction of urban planning rules that apply to specific areas.
To achieve this, we manually annotated a selection of documents related to the Montpellier Mediterranean Metropolis (3M), a region experiencing rapid development and exposed to natural risks. We have established a standardized format for labeled examples, including titles and subtitles, and proposed a hierarchical representation of class labels to enhance the versatility of our corpus.
This corpus, the first of its kind in the French language, comprises 1934 textual segments annotated across four distinct classes: “Verifiable”, “Non-verifiable”, “Informative”, and “Not relevant”. With LUPAN, we lay the foundation for the development of a machine learning model to classify regulatory texts and identify urban planning rules.
Such a model can determine whether an input text contains a rule that can be verified with satellite images (“Verifiable” class), a rule that cannot be verified this way (“Non-verifiable” class), a non-strict rule in the form of a recommendation (“Informative” class), or none of these (“Not relevant” class). A model of this kind could help prevent the natural risks associated with rapid urbanization and support smarter, more sustainable urban management practices.
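The four classes lend themselves to the hierarchical label representation mentioned above. The following is a minimal Python sketch of one possible encoding. The hierarchy (relevance, then rule versus recommendation, then verifiability), the intermediate node names `Relevant` and `Rule`, the `Segment` fields, and the `label_at_depth` helper are all assumptions inferred from the class descriptions, not LUPAN's published schema.

```python
from dataclasses import dataclass

# Hypothetical hierarchy inferred from the class descriptions:
# relevance -> strictness (rule vs. recommendation) -> verifiability.
# LUPAN's actual hierarchy and label strings may differ.
LABEL_PATHS = {
    "Verifiable": ("Relevant", "Rule", "Verifiable"),
    "Non-verifiable": ("Relevant", "Rule", "Non-verifiable"),
    "Informative": ("Relevant", "Informative"),
    "Not relevant": ("Not relevant",),
}


@dataclass(frozen=True)
class Segment:
    """One labeled example; field names are illustrative, not LUPAN's schema."""
    title: str
    subtitle: str
    text: str
    label: str


def label_at_depth(label: str, depth: int) -> str:
    """Return the ancestor of `label` at `depth` (1 = coarsest level).

    Paths shorter than `depth` keep their deepest node, so leaves such as
    "Informative" stay unchanged at finer depths.
    """
    if label not in LABEL_PATHS:
        raise ValueError(f"unknown label: {label!r}")
    if depth < 1:
        raise ValueError("depth must be >= 1")
    path = LABEL_PATHS[label]
    return path[min(depth, len(path)) - 1]


if __name__ == "__main__":
    # Invented segment for illustration; not taken from the corpus.
    seg = Segment(
        title="Zone UA",
        subtitle="Article 9: Building footprint",
        text="The building footprint must not exceed 40% of the plot area.",
        label="Verifiable",
    )
    for depth in (1, 2, 3):
        print(depth, label_at_depth(seg.label, depth))
```

Collapsing labels by depth would let the same annotations train a binary relevance filter (depth 1), a three-way rule/recommendation/other classifier (depth 2), or the full four-class model (depth 3), which is one way a label hierarchy can make a corpus more versatile.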