Qdonnars Cursor committed on
Commit bad6218 · 1 Parent(s): b90d5d5

feat: Implement MCP Server for Indicateurs Territoriaux API


Add complete MCP server exposing 4 tools for querying French territorial
ecological indicators via the CGDD/Ministry Cube.js API.

Tools implemented:
- list_indicators: List all indicators with thematique/maille filters
- get_indicator_details: Get full metadata and sources for an indicator
- query_indicator_data: Query data values by geographic level and code
- search_indicators: Full-text search in indicator names/descriptions

Architecture:
- Gradio SSE endpoint at /gradio_api/mcp/ for Claude.ai integration
- CubeResolver for mapping indicator_id to data cubes via /meta parsing
- Metadata cache with periodic refresh
- Async httpx client with proper error handling

Cube naming convention discovered:
- Data cubes: {thematique}_{maille} (e.g., conso_enaf_com)
- Measures: {cube}.id_{indicator_id} (e.g., conso_enaf_com.id_611)
- Geo dimensions: geocode_*/libelle_* for all levels
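
The naming convention above can be sketched as a small helper. This is an illustrative snippet, not part of the committed code; the suffix mapping (`_com`/`_epci`/`_dpt`/`_reg`) follows the table documented in README.md.

```python
# Hypothetical helper illustrating the cube/measure naming convention above.
# Not part of the committed code; suffixes per the README table.
MAILLE_SUFFIX = {"commune": "com", "epci": "epci", "departement": "dpt", "region": "reg"}

def measure_name(thematique: str, maille: str, indicator_id: int) -> str:
    """Build the Cube.js measure name {thematique}_{maille}.id_{indicator_id}."""
    cube = f"{thematique}_{MAILLE_SUFFIX[maille]}"
    return f"{cube}.id_{indicator_id}"

print(measure_name("conso_enaf", "commune", 611))  # conso_enaf_com.id_611
```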

Co-authored-by: Cursor <cursoragent@cursor.com>

Files changed (11)
  1. .env.example +9 -0
  2. .gitignore +21 -0
  3. README.md +290 -2
  4. app.py +186 -0
  5. requirements.txt +11 -0
  6. src/__init__.py +3 -0
  7. src/api_client.py +317 -0
  8. src/cache.py +299 -0
  9. src/cube_resolver.py +286 -0
  10. src/models.py +239 -0
  11. src/tools.py +354 -0
.env.example ADDED
@@ -0,0 +1,9 @@
+ # API Token for the Indicateurs Territoriaux API
+ # Get your token from the API provider
+ INDICATEURS_TE_TOKEN=your_jwt_token_here
+
+ # Base URL of the API (default: production)
+ INDICATEURS_TE_BASE_URL=https://api.indicateurs.ecologie.gouv.fr
+
+ # Cache refresh interval in seconds (default: 1 hour)
+ CACHE_REFRESH_SECONDS=3600
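
A minimal sketch of how these variables are consumed, mirroring the defaults in `src/api_client.py` and `src/cache.py` (the variable names below are local to the sketch):

```python
# Sketch: reading the .env.example configuration with the same fallbacks
# used by src/api_client.py and src/cache.py.
import os

token = os.getenv("INDICATEURS_TE_TOKEN")  # required; API calls fail without it
base_url = os.getenv("INDICATEURS_TE_BASE_URL", "https://api.indicateurs.ecologie.gouv.fr")
refresh_seconds = int(os.getenv("CACHE_REFRESH_SECONDS", "3600"))
print(base_url, refresh_seconds)
```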
.gitignore ADDED
@@ -0,0 +1,21 @@
+ # Environment
+ .env
+ venv/
+ __pycache__/
+ *.pyc
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # Logs
+ *.log
+ server.log
+
+ # Test/debug files
+ cubes_list.json
+ cubes_structure.json
+
+ # OS
+ .DS_Store
+ Thumbs.db
README.md CHANGED
@@ -4,9 +4,297 @@ emoji: 📉
  colorFrom: blue
  colorTo: pink
  sdk: gradio
- sdk_version: 6.5.1
+ sdk_version: 5.0.0
  app_file: app.py
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # MCP Server - Indicateurs Territoriaux de Transition Écologique
+
+ MCP (Model Context Protocol) server exposing the API of the **Hub d'Indicateurs Territoriaux de Transition Écologique** (CGDD / French Ministry of Ecological Transition).
+
+ This server lets LLMs (Claude, GPT, etc.) query French territorial environmental data.
+
+ ## Available MCP tools
+
+ ### 1. `list_indicators`
+ Lists all indicators, with optional filters.
+
+ **Parameters:**
+ - `thematique` (optional): filter by FNV theme ("mieux se déplacer", "mieux se loger"...)
+ - `maille` (optional): filter by geographic level ("region", "departement", "epci", "commune")
+
+ ### 2. `get_indicator_details`
+ Returns the full details of an indicator (description, calculation method, sources).
+
+ **Parameters:**
+ - `indicator_id`: numeric ID of the indicator
+
+ ### 3. `query_indicator_data`
+ Queries an indicator's data for a territory.
+
+ **Parameters:**
+ - `indicator_id`: indicator ID
+ - `geographic_level`: "region" | "departement" | "epci" | "commune"
+ - `geographic_code` (optional): INSEE code of the territory
+ - `year` (optional): year of the data
+
+ ### 4. `search_indicators`
+ Searches indicators by keyword.
+
+ **Parameters:**
+ - `query`: search terms (matched against name and description)
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python >= 3.10
+ - An authentication token for the Indicateurs API
+
+ ### Local installation
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/your-repo/mcp-indicateurs-te.git
+ cd mcp-indicateurs-te
+
+ # Create a virtual environment
+ python -m venv venv
+ source venv/bin/activate  # Linux/Mac
+ # or: venv\Scripts\activate  # Windows
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Configure the token
+ cp .env.example .env
+ # Edit .env and add your INDICATEURS_TE_TOKEN
+
+ # Start the server
+ python app.py
+ ```
+
+ The server will be available at `http://localhost:7860`.
+
+ ### Deploying to HuggingFace Spaces
+
+ 1. Create a new Space with the Gradio SDK
+ 2. Push the code to the Space
+ 3. Set the `INDICATEURS_TE_TOKEN` secret in the Space settings
+
+ ## MCP client configuration
+
+ ### Claude Desktop
+
+ Add to `claude_desktop_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "indicateurs-te": {
+       "url": "https://YOUR-SPACE.hf.space/gradio_api/mcp/"
+     }
+   }
+ }
+ ```
+
+ ### Cursor
+
+ Add to Cursor's MCP settings:
+
+ ```json
+ {
+   "mcpServers": {
+     "indicateurs-te": {
+       "url": "http://localhost:7860/gradio_api/mcp/"
+     }
+   }
+ }
+ ```
+
+ ### With mcp-remote (for clients without HTTP support)
+
+ ```json
+ {
+   "mcpServers": {
+     "indicateurs-te": {
+       "command": "npx",
+       "args": [
+         "mcp-remote",
+         "http://localhost:7860/gradio_api/mcp/"
+       ]
+     }
+   }
+ }
+ ```
+
+ ## API architecture (validated by tests)
+
+ ### Cube naming convention
+
+ **Data** cubes follow the format `{thematique}_{maille}`:
+
+ | Suffix | Maille |
+ |---------|--------|
+ | `_com` | Commune |
+ | `_epci` | EPCI |
+ | `_dpt` | Département |
+ | `_reg` | Région |
+
+ Examples:
+ - `conso_enaf_com` → ENAF consumption, commune level
+ - `surface_bio_dpt` → Organic farming area, département level
+
+ ### Measures embed the indicator ID
+
+ Format: `{cube_name}.id_{indicator_id}`
+
+ Examples:
+ - `conso_enaf_com.id_611` → indicator 611 in the conso_enaf_com cube
+ - `surface_bio_dpt.id_606` → indicator 606 in the surface_bio_dpt cube
+
+ ### Geographic dimensions (standardized)
+
+ | Dimension | Description |
+ |-----------|-------------|
+ | `geocode_commune` | INSEE commune code (5 digits) |
+ | `libelle_commune` | Commune name |
+ | `geocode_epci` | EPCI SIREN code (9 digits) |
+ | `libelle_epci` | EPCI name |
+ | `geocode_departement` | Département code (2-3 chars) |
+ | `libelle_departement` | Département name |
+ | `geocode_region` | Région code (2 digits) |
+ | `libelle_region` | Région name |
+
+ ### Temporal dimensions
+
+ | Dimension | Description |
+ |-----------|-------------|
+ | `annee` | Year (string: "2020") |
+
+ ## Usage examples
+
+ ### Via an LLM
+
+ ```
+ User: Which indicators cover land consumption?
+
+ LLM: [calls search_indicators("consommation espace")]
+ Available indicators:
+ - ID 611: Consommation d'espaces naturels, agricoles et forestiers
+
+ User: Details on indicator 611
+
+ LLM: [calls get_indicator_details("611")]
+ Indicator 611 measures ENAF consumption...
+ Available levels: commune, epci, departement, region
+
+ User: Values for the PACA region in 2020
+
+ LLM: [calls query_indicator_data("611", "region", "93", "2020")]
+ For PACA (code 93) in 2020: 1737.29 ha
+ ```
+
+ ### Validated Cube.js query example
+
+ ```json
+ {
+   "query": {
+     "measures": ["conso_enaf_com.id_611"],
+     "dimensions": [
+       "conso_enaf_com.libelle_region",
+       "conso_enaf_com.annee"
+     ],
+     "filters": [
+       {
+         "member": "conso_enaf_com.geocode_region",
+         "operator": "equals",
+         "values": ["93"]
+       }
+     ],
+     "limit": 100
+   }
+ }
+ ```
+
+ ## INSEE geographic codes
+
+ | Level | Format | Examples |
+ |--------|--------|----------|
+ | Région | 2 digits | 93 (PACA), 11 (Île-de-France), 75 (Nouvelle-Aquitaine), 84 (Auvergne-Rhône-Alpes) |
+ | Département | 2-3 chars | 13, 2A, 974 |
+ | EPCI | 9 digits (SIREN) | 200054807 |
+ | Commune | 5 digits | 75056 (Paris), 13055 (Marseille) |
+
+ ## Environment variables
+
+ | Variable | Description | Default |
+ |----------|-------------|--------|
+ | `INDICATEURS_TE_TOKEN` | JWT token for the API | (required) |
+ | `INDICATEURS_TE_BASE_URL` | Base URL of the API | `https://api.indicateurs.ecologie.gouv.fr` |
+ | `CACHE_REFRESH_SECONDS` | Cache refresh interval | 3600 |
+
+ ## Technical architecture
+
+ ```
+ ┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
+ │   MCP Client    │────▶│   Gradio App     │────▶│   Cube.js API   │
+ │ (Claude, etc.)  │     │  (MCP Server)    │     │ (Indicateurs)   │
+ └─────────────────┘     └──────────────────┘     └─────────────────┘
+                                 │
+                          ┌──────┴──────┐
+                          │ CubeResolver │
+                          │   + Cache    │
+                          └─────────────┘
+ ```
+
+ - **Gradio**: web UI + MCP SSE endpoint
+ - **CubeResolver**: maps indicator_id → cube_name by parsing /meta
+ - **Cache**: indicator metadata loaded at startup
+
+ ## Project structure
+
+ ```
+ ├── src/
+ │   ├── __init__.py        # Package init
+ │   ├── api_client.py      # Async Cube.js HTTP client
+ │   ├── cube_resolver.py   # find_cube_for_indicator logic
+ │   ├── cache.py           # Metadata cache
+ │   ├── models.py          # Pydantic models
+ │   └── tools.py           # Implementation of the 4 MCP tools
+ ├── app.py                 # Gradio entry point
+ ├── requirements.txt       # Dependencies
+ └── .env.example           # Configuration template
+ ```
+
+ ## Development
+
+ ### Testing with MCP Inspector
+
+ ```bash
+ # Start the server
+ python app.py
+
+ # In another terminal
+ npx @modelcontextprotocol/inspector
+ # Connect to http://localhost:7860/gradio_api/mcp/
+ ```
+
+ ### Caveats
+
+ 1. **/meta cache**: ~100+ cubes, loaded once at startup
+ 2. **indicator_id → cube mapping**: walks the measures of each cube
+ 3. **Non-uniform levels**: check `mailles_disponibles` before querying
+ 4. **String values**: Cube.js filters expect strings (`"93"`, not `93`)
+
+ ## Resources
+
+ - [MCP documentation](https://modelcontextprotocol.io/)
+ - [Gradio MCP Guide](https://gradio.app/guides/building-mcp-server-with-gradio)
+ - [Cube.js API](https://cube.dev/docs/rest-api)
+ - [Indicators portal](https://ecologie.data.gouv.fr/indicators)
+
+ ## License
+
+ MIT
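
The validated Cube.js query shape shown in the README can be assembled programmatically. The helper below is illustrative only (the function name and signature are not part of the committed code); note that filter values must be strings, per the caveats above.

```python
# Illustrative builder for the validated query shape from the README.
# Not part of the committed code; dimensions follow the README example.
def build_query(measure: str, geocode_dim: str, geocode, limit: int = 100) -> dict:
    cube = measure.split(".")[0]
    return {
        "measures": [measure],
        "dimensions": [f"{cube}.libelle_region", f"{cube}.annee"],
        "filters": [
            # Cube.js filter values must be strings ("93", not 93)
            {"member": geocode_dim, "operator": "equals", "values": [str(geocode)]}
        ],
        "limit": limit,
    }

q = build_query("conso_enaf_com.id_611", "conso_enaf_com.geocode_region", 93)
print(q["filters"][0]["values"])  # ['93']
```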
app.py ADDED
@@ -0,0 +1,186 @@
+ """Gradio MCP Server for Indicateurs Territoriaux de Transition Écologique.
+
+ This application exposes 4 MCP tools for querying French territorial
+ ecological indicators via the Cube.js API.
+
+ Tools:
+ - list_indicators: List all indicators with optional filters
+ - get_indicator_details: Get detailed info about a specific indicator
+ - query_indicator_data: Query data values for a territory
+ - search_indicators: Search indicators by keywords
+
+ Usage:
+     Run locally:
+         python app.py
+
+     Deploy on HuggingFace Spaces:
+         Push to a Space with Gradio SDK configured.
+
+     Connect as MCP Server:
+         URL: http://your-server:7860/gradio_api/mcp/
+ """
+
+ import os
+ import gradio as gr
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+ # Import tools
+ from src.tools import (
+     list_indicators,
+     get_indicator_details,
+     query_indicator_data,
+     search_indicators,
+ )
+ from src.models import GEOGRAPHIC_LEVELS
+
+ # Check if token is configured
+ if not os.getenv("INDICATEURS_TE_TOKEN"):
+     print("WARNING: INDICATEURS_TE_TOKEN not set. API calls will fail.")
+     print("Set the token in .env file or as environment variable.")
+
+
+ # Create individual interfaces for each tool
+ list_interface = gr.Interface(
+     fn=list_indicators,
+     inputs=[
+         gr.Textbox(
+             label="Thématique FNV",
+             placeholder="Ex: mieux se déplacer, mieux se loger...",
+             info="Filtre par thématique France Nation Verte (recherche partielle)",
+         ),
+         gr.Dropdown(
+             choices=[""] + GEOGRAPHIC_LEVELS,
+             label="Maille géographique",
+             info="Filtre par niveau géographique disponible",
+         ),
+     ],
+     outputs=gr.JSON(label="Indicateurs"),
+     title="Lister les indicateurs",
+     description="Liste tous les indicateurs disponibles avec filtres optionnels.",
+     api_name="list_indicators",
+ )
+
+ details_interface = gr.Interface(
+     fn=get_indicator_details,
+     inputs=[
+         gr.Textbox(
+             label="ID de l'indicateur",
+             placeholder="Ex: 611",
+             info="Identifiant numérique de l'indicateur",
+         ),
+     ],
+     outputs=gr.JSON(label="Détails"),
+     title="Détails d'un indicateur",
+     description="Retourne les métadonnées complètes et les sources d'un indicateur.",
+     api_name="get_indicator_details",
+ )
+
+ query_interface = gr.Interface(
+     fn=query_indicator_data,
+     inputs=[
+         gr.Textbox(
+             label="ID de l'indicateur",
+             placeholder="Ex: 611",
+             info="Identifiant numérique de l'indicateur",
+         ),
+         gr.Dropdown(
+             choices=GEOGRAPHIC_LEVELS,
+             label="Niveau géographique",
+             value="region",
+             info="Maille territoriale à interroger",
+         ),
+         gr.Textbox(
+             label="Code INSEE",
+             placeholder="Ex: 93 (PACA), 13 (Bouches-du-Rhône)...",
+             info="Code du territoire (optionnel)",
+         ),
+         gr.Textbox(
+             label="Année",
+             placeholder="Ex: 2020",
+             info="Année des données (optionnel)",
+         ),
+     ],
+     outputs=gr.JSON(label="Données"),
+     title="Interroger les données",
+     description="Récupère les valeurs d'un indicateur pour un territoire donné.",
+     api_name="query_indicator_data",
+ )
+
+ search_interface = gr.Interface(
+     fn=search_indicators,
+     inputs=[
+         gr.Textbox(
+             label="Recherche",
+             placeholder="Ex: consommation espace, surface bio, émissions CO2...",
+             info="Mots-clés à rechercher dans le nom et la description",
+         ),
+     ],
+     outputs=gr.JSON(label="Résultats"),
+     title="Rechercher des indicateurs",
+     description="Recherche des indicateurs par mots-clés.",
+     api_name="search_indicators",
+ )
+
+ # Combine all interfaces into a tabbed interface
+ demo = gr.TabbedInterface(
+     interface_list=[
+         list_interface,
+         search_interface,
+         details_interface,
+         query_interface,
+     ],
+     tab_names=[
+         "Lister",
+         "Rechercher",
+         "Détails",
+         "Données",
+     ],
+     title="MCP Server - Indicateurs Territoriaux de Transition Écologique",
+ )
+
+ # Add a description block
+ with demo:
+     gr.Markdown(
+         """
+ ---
+ ### Connexion MCP
+
+ Pour utiliser ce serveur comme outil MCP dans Claude Desktop, Cursor ou autre client MCP :
+
+ ```json
+ {
+   "mcpServers": {
+     "indicateurs-te": {
+       "url": "https://YOUR-SPACE.hf.space/gradio_api/mcp/"
+     }
+   }
+ }
+ ```
+
+ ### Structure des données
+
+ Les cubes de données suivent le format `{thematique}_{maille}` :
+ - `conso_enaf_com` → Consommation ENAF, maille commune
+ - `surface_bio_dpt` → Surface bio, maille département
+
+ Les measures contiennent l'ID de l'indicateur : `{cube}.id_{indicator_id}`
+
+ ### API Cube.js
+
+ Ce serveur interroge l'API du Hub d'Indicateurs Territoriaux du Ministère de la Transition Écologique.
+
+ - Documentation : [ecologie.data.gouv.fr/indicators](https://ecologie.data.gouv.fr/indicators)
+ - API : `https://api.indicateurs.ecologie.gouv.fr`
+ """
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch(
+         mcp_server=True,
+         server_name="0.0.0.0",
+         server_port=7860,
+     )
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ # MCP Server for Indicateurs Territoriaux de Transition Écologique
+ # Python >= 3.10 required
+
+ # Core dependencies
+ gradio[mcp]>=5.0.0
+ httpx>=0.27.0
+ pydantic>=2.0.0
+ python-dotenv>=1.0.0
+
+ # For async support
+ anyio>=4.0.0
src/__init__.py ADDED
@@ -0,0 +1,3 @@
+ """MCP Server for French Territorial Ecological Indicators."""
+
+ __version__ = "0.1.0"
src/api_client.py ADDED
@@ -0,0 +1,317 @@
+ """HTTP client for the Cube.js API of Indicateurs Territoriaux."""
+
+ import os
+ from typing import Any
+
+ import httpx
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+
+ class CubeJsClientError(Exception):
+     """Base exception for Cube.js client errors."""
+
+     pass
+
+
+ class AuthenticationError(CubeJsClientError):
+     """Raised when authentication fails (401)."""
+
+     pass
+
+
+ class BadRequestError(CubeJsClientError):
+     """Raised when the request is malformed (400)."""
+
+     pass
+
+
+ class CubeJsClient:
+     """HTTP client for the Cube.js REST API.
+
+     This client handles authentication and provides methods to interact
+     with the Indicateurs Territoriaux API endpoints.
+     """
+
+     def __init__(
+         self,
+         base_url: str | None = None,
+         token: str | None = None,
+         timeout: float = 30.0,
+     ):
+         """Initialize the Cube.js client.
+
+         Args:
+             base_url: Base URL of the API. Defaults to env var INDICATEURS_TE_BASE_URL.
+             token: JWT authentication token. Defaults to env var INDICATEURS_TE_TOKEN.
+             timeout: Request timeout in seconds.
+         """
+         self.base_url = (
+             base_url
+             or os.getenv("INDICATEURS_TE_BASE_URL")
+             or "https://api.indicateurs.ecologie.gouv.fr"
+         )
+         self.token = token or os.getenv("INDICATEURS_TE_TOKEN")
+
+         if not self.token:
+             raise ValueError(
+                 "No API token provided. Set INDICATEURS_TE_TOKEN environment variable "
+                 "or pass token parameter."
+             )
+
+         self.timeout = timeout
+         self._client: httpx.AsyncClient | None = None
+
+     @property
+     def headers(self) -> dict[str, str]:
+         """HTTP headers for API requests."""
+         return {
+             "Authorization": f"Bearer {self.token}",
+             "Content-Type": "application/json",
+         }
+
+     async def _get_client(self) -> httpx.AsyncClient:
+         """Get or create the async HTTP client."""
+         if self._client is None or self._client.is_closed:
+             self._client = httpx.AsyncClient(
+                 base_url=self.base_url,
+                 headers=self.headers,
+                 timeout=self.timeout,
+             )
+         return self._client
+
+     async def close(self) -> None:
+         """Close the HTTP client."""
+         if self._client is not None and not self._client.is_closed:
+             await self._client.aclose()
+             self._client = None
+
+     async def _handle_response(self, response: httpx.Response) -> dict[str, Any]:
+         """Handle API response and raise appropriate errors.
+
+         Args:
+             response: The HTTP response object.
+
+         Returns:
+             Parsed JSON response.
+
+         Raises:
+             AuthenticationError: If the token is invalid or expired (401).
+             BadRequestError: If the request is malformed (400).
+             CubeJsClientError: For other HTTP errors.
+         """
+         if response.status_code == 401:
+             raise AuthenticationError(
+                 "Authentication failed. Your API token may be invalid or expired. "
+                 "Please check your INDICATEURS_TE_TOKEN environment variable."
+             )
+
+         if response.status_code == 400:
+             try:
+                 error_detail = response.json()
+             except Exception:
+                 error_detail = response.text
+             raise BadRequestError(
+                 f"Bad request to API. Details: {error_detail}"
+             )
+
+         if response.status_code >= 400:
+             raise CubeJsClientError(
+                 f"API request failed with status {response.status_code}: {response.text}"
+             )
+
+         return response.json()
+
+     async def get_meta(self) -> dict[str, Any]:
+         """Fetch the API schema metadata.
+
+         Returns the complete schema including all cubes, their measures,
+         dimensions, and available filters.
+
+         Returns:
+             Dict containing the API metadata with 'cubes' key.
+
+         Raises:
+             AuthenticationError: If authentication fails.
+             CubeJsClientError: For other API errors.
+         """
+         client = await self._get_client()
+         response = await client.get("/cubejs-api/v1/meta")
+         return await self._handle_response(response)
+
+     async def load(self, query: dict[str, Any]) -> dict[str, Any]:
+         """Execute a data query against the Cube.js API.
+
+         Args:
+             query: The Cube.js query object containing measures, dimensions,
+                 filters, and other query parameters.
+
+         Returns:
+             Dict containing the query results with 'data' key.
+
+         Raises:
+             AuthenticationError: If authentication fails.
+             BadRequestError: If the query is malformed.
+             CubeJsClientError: For other API errors.
+
+         Example:
+             >>> query = {
+             ...     "measures": ["indicateur_metadata.count"],
+             ...     "dimensions": ["indicateur_metadata.id", "indicateur_metadata.libelle"],
+             ...     "limit": 10
+             ... }
+             >>> result = await client.load(query)
+         """
+         client = await self._get_client()
+         response = await client.post(
+             "/cubejs-api/v1/load",
+             json={"query": query},
+         )
+         return await self._handle_response(response)
+
+     async def load_indicators_metadata(
+         self,
+         dimensions: list[str] | None = None,
+         filters: list[dict[str, Any]] | None = None,
+         limit: int = 500,
+     ) -> list[dict[str, Any]]:
+         """Load indicator metadata from the indicateur_metadata cube.
+
+         Convenience method for querying the indicator metadata cube.
+
+         Args:
+             dimensions: List of dimensions to fetch. Defaults to basic info.
+             filters: Optional list of filters to apply.
+             limit: Maximum number of results.
+
+         Returns:
+             List of indicator metadata records.
+         """
+         if dimensions is None:
+             dimensions = [
+                 "indicateur_metadata.id",
+                 "indicateur_metadata.libelle",
+                 "indicateur_metadata.unite",
+                 "indicateur_metadata.description",
+                 "indicateur_metadata.mailles_disponibles",
+                 "indicateur_metadata.thematique_fnv",
+                 "indicateur_metadata.annees_disponibles",
+             ]
+
+         query: dict[str, Any] = {
+             "dimensions": dimensions,
+             "limit": limit,
+         }
+
+         if filters:
+             query["filters"] = filters
+
+         result = await self.load(query)
+         return result.get("data", [])
+
+     async def load_sources_metadata(
+         self,
+         indicator_id: int | None = None,
+         limit: int = 100,
+     ) -> list[dict[str, Any]]:
+         """Load source metadata from the indicateur_x_source_metadata cube.
+
+         Args:
+             indicator_id: Optional indicator ID to filter sources.
+             limit: Maximum number of results.
+
+         Returns:
+             List of source metadata records.
+         """
+         dimensions = [
+             "indicateur_x_source_metadata.id_indicateur",
+             "indicateur_x_source_metadata.nom_source",
+             "indicateur_x_source_metadata.libelle",
+             "indicateur_x_source_metadata.description",
+             "indicateur_x_source_metadata.producteur_source",
+             "indicateur_x_source_metadata.distributeur_source",
+             "indicateur_x_source_metadata.license_source",
+             "indicateur_x_source_metadata.lien_page",
+             "indicateur_x_source_metadata.date_derniere_extraction",
+         ]
+
+         query: dict[str, Any] = {
+             "dimensions": dimensions,
+             "limit": limit,
+         }
+
+         if indicator_id is not None:
+             query["filters"] = [
+                 {
+                     "member": "indicateur_x_source_metadata.id_indicateur",
+                     "operator": "equals",
+                     "values": [str(indicator_id)],
+                 }
+             ]
+
+         result = await self.load(query)
+         return result.get("data", [])
+
+     async def search_indicators_by_libelle(
+         self,
+         search_term: str,
+         limit: int = 50,
+     ) -> list[dict[str, Any]]:
+         """Search indicators by keyword in libelle using a contains filter.
+
+         This uses the Cube.js contains operator for server-side filtering.
+         Note: limited to a single term; for multi-term search, filter client-side.
+
+         Args:
+             search_term: Term to search for in indicator libelle.
+             limit: Maximum number of results.
+
+         Returns:
+             List of matching indicator metadata records.
+         """
+         query: dict[str, Any] = {
+             "dimensions": [
+                 "indicateur_metadata.id",
+                 "indicateur_metadata.libelle",
+                 "indicateur_metadata.description",
+                 "indicateur_metadata.unite",
+                 "indicateur_metadata.mailles_disponibles",
+                 "indicateur_metadata.thematique_fnv",
+             ],
+             "filters": [
+                 {
+                     "member": "indicateur_metadata.libelle",
+                     "operator": "contains",
+                     "values": [search_term],
+                 }
+             ],
+             "limit": limit,
+         }
+
+         result = await self.load(query)
+         return result.get("data", [])
+
+
+ # Singleton instance for the application
+ _client_instance: CubeJsClient | None = None
+
+
+ def get_client() -> CubeJsClient:
+     """Get or create the singleton CubeJsClient instance.
+
+     Returns:
+         The shared CubeJsClient instance.
+     """
+     global _client_instance
+     if _client_instance is None:
+         _client_instance = CubeJsClient()
+     return _client_instance
+
+
+ async def close_client() -> None:
+     """Close the singleton client instance."""
+     global _client_instance
+     if _client_instance is not None:
+         await _client_instance.close()
+         _client_instance = None
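
The commit message describes a CubeResolver that maps indicator_id to a data cube by parsing /meta. The committed `src/cube_resolver.py` is the authoritative implementation; the self-contained sketch below only illustrates the idea, assuming the `cubes → measures → name` shape of the Cube.js /meta payload and the `{cube}.id_{indicator_id}` measure convention noted above.

```python
# Hedged sketch of the indicator_id -> cube mapping described in the commit
# message; src/cube_resolver.py is the authoritative implementation.
import re

def build_indicator_index(meta: dict) -> dict:
    """Map each id_{indicator_id} measure found in /meta to its cube name(s)."""
    index: dict = {}
    for cube in meta.get("cubes", []):
        for measure in cube.get("measures", []):
            # Measure names follow the convention "{cube}.id_{indicator_id}"
            m = re.fullmatch(r"(?P<cube>\w+)\.id_(?P<id>\d+)", measure["name"])
            if m:
                index.setdefault(int(m.group("id")), []).append(m.group("cube"))
    return index

# An indicator can exist at several mailles, hence the list of cubes per ID.
meta = {"cubes": [
    {"name": "conso_enaf_com", "measures": [{"name": "conso_enaf_com.id_611"}]},
    {"name": "conso_enaf_dpt", "measures": [{"name": "conso_enaf_dpt.id_611"}]},
]}
print(build_indicator_index(meta))  # {611: ['conso_enaf_com', 'conso_enaf_dpt']}
```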
src/cache.py ADDED
@@ -0,0 +1,299 @@
+ """Metadata cache for indicators and cube mappings."""
+
+ import asyncio
+ import os
+ from datetime import datetime, timedelta
+ from typing import Any
+
+ from .api_client import CubeJsClient, get_client
+ from .cube_resolver import CubeResolver, get_resolver
+ from .models import IndicatorMetadata, IndicatorListItem
+
+
+ class IndicatorCache:
+     """Cache for indicator metadata and cube resolution.
+
+     This cache stores indicator metadata loaded at startup and periodically
+     refreshes to pick up new indicators. It also initializes the CubeResolver
+     for mapping indicator IDs to data cubes.
+     """
+
+     def __init__(
+         self,
+         refresh_interval_seconds: int | None = None,
+     ):
+         """Initialize the cache.
+
+         Args:
+             refresh_interval_seconds: How often to refresh the cache.
+                 Defaults to CACHE_REFRESH_SECONDS env var or 3600 (1 hour).
+         """
+         self.refresh_interval = timedelta(
+             seconds=refresh_interval_seconds
+             or int(os.getenv("CACHE_REFRESH_SECONDS", "3600"))
+         )
+
+         # Indicator metadata by ID
+         self._indicators: dict[int, IndicatorMetadata] = {}
+
+         # Reference to the cube resolver
+         self._resolver: CubeResolver = get_resolver()
+
+         # Last refresh timestamp
+         self._last_refresh: datetime | None = None
+
+         # Lock for thread-safe refresh
+         self._refresh_lock = asyncio.Lock()
+
+         # Flag to indicate if initial load is complete
+         self._initialized = False
+
+     @property
+     def is_initialized(self) -> bool:
+         """Check if the cache has been initialized."""
+         return self._initialized
+
+     @property
+     def needs_refresh(self) -> bool:
+         """Check if the cache needs to be refreshed."""
+         if not self._initialized or self._last_refresh is None:
+             return True
+         return datetime.now() - self._last_refresh > self.refresh_interval
+
+     @property
+     def indicators(self) -> dict[int, IndicatorMetadata]:
+         """Get all cached indicators."""
+         return self._indicators.copy()
+
+     @property
+     def resolver(self) -> CubeResolver:
+         """Get the cube resolver instance."""
+         return self._resolver
+
+     async def initialize(self, client: CubeJsClient | None = None) -> None:
+         """Initialize the cache with data from the API.
+
+         This should be called at application startup.
+
+         Args:
+             client: Optional CubeJsClient instance. If not provided,
+                 uses the singleton instance.
+         """
+         if client is None:
+             client = get_client()
+
+         await self.refresh(client)
+
+     async def refresh(self, client: CubeJsClient | None = None) -> None:
+         """Refresh the cache from the API.
+
+         Args:
+             client: Optional CubeJsClient instance.
+         """
+         async with self._refresh_lock:
+             if client is None:
+                 client = get_client()
+
+             # Load indicator metadata
+             await self._load_indicators(client)
+
+             # Load and parse /meta for cube resolution
+             await self._load_cube_metadata(client)
+
+             self._last_refresh = datetime.now()
+             self._initialized = True
+
+     async def _load_indicators(self, client: CubeJsClient) -> None:
+         """Load all indicator metadata from the API."""
+         # Note: Some dimensions listed in /meta may not exist in actual data.
+         # Only include dimensions that have been validated to work.
+         dimensions = [
+             "indicateur_metadata.id",
+             "indicateur_metadata.libelle",
+             "indicateur_metadata.unite",
+             "indicateur_metadata.description",
+             "indicateur_metadata.methode_calcul",
+             "indicateur_metadata.annees_disponibles",
+             "indicateur_metadata.mailles_disponibles",
+             "indicateur_metadata.maille_mini_disponible",
+             "indicateur_metadata.couverture_geographique",
+             "indicateur_metadata.completion_region",
+             "indicateur_metadata.completion_departement",
+             "indicateur_metadata.completion_epci",
+             "indicateur_metadata.completion_commune",
+             "indicateur_metadata.thematique_fnv",
+             # Note: secteur_fnv, enjeux_fnv, levier_fnv cause errors despite being in schema
+         ]
+
+         data = await client.load_indicators_metadata(
+             dimensions=dimensions,
+             limit=1000,  # Should be enough for all indicators
+         )
+
+         self._indicators.clear()
+         for row in data:
+             try:
+                 indicator = IndicatorMetadata.from_api_response(row)
+                 self._indicators[indicator.id] = indicator
+             except Exception as e:
+                 # Log but don't fail on individual indicator parsing errors
+                 print(f"Warning: Failed to parse indicator: {e}")
+
+     async def _load_cube_metadata(self, client: CubeJsClient) -> None:
+         """Load cube metadata from /meta and initialize the resolver."""
+         meta = await client.get_meta()
+         self._resolver.load_from_meta(meta)
+
+     def get_indicator(self, indicator_id: int) -> IndicatorMetadata | None:
+         """Get indicator metadata by ID.
+
+         Args:
+             indicator_id: The indicator ID.
+
+         Returns:
+             The indicator metadata, or None if not found.
155
+ """
156
+ return self._indicators.get(indicator_id)
157
+
158
+ def get_cube_name(self, indicator_id: int, maille: str) -> str | None:
159
+ """Get the data cube name for an indicator at a specific maille.
160
+
161
+ Args:
162
+ indicator_id: The indicator ID.
163
+ maille: The geographic level.
164
+
165
+ Returns:
166
+ The cube name, or None if not found.
167
+ """
168
+ return self._resolver.find_cube_for_indicator(indicator_id, maille)
169
+
170
+ def list_indicators(
171
+ self,
172
+ thematique: str | None = None,
173
+ maille: str | None = None,
174
+ ) -> list[IndicatorListItem]:
175
+ """List indicators with optional filtering.
176
+
177
+ Args:
178
+ thematique: Filter by thematique_fnv (case-insensitive partial match).
179
+ maille: Filter by available geographic level.
180
+
181
+ Returns:
182
+ List of matching indicators.
183
+ """
184
+ results = []
185
+
186
+ for indicator in self._indicators.values():
187
+ # Apply thematique filter
188
+ if thematique:
189
+ if not indicator.thematique_fnv:
190
+ continue
191
+ if thematique.lower() not in indicator.thematique_fnv.lower():
192
+ continue
193
+
194
+ # Apply maille filter
195
+ if maille:
196
+ if not indicator.has_geographic_level(maille):
197
+ continue
198
+
199
+ results.append(
200
+ IndicatorListItem(
201
+ id=indicator.id,
202
+ libelle=indicator.libelle,
203
+ unite=indicator.unite,
204
+ mailles_disponibles=indicator.mailles_disponibles,
205
+ thematique_fnv=indicator.thematique_fnv,
206
+ )
207
+ )
208
+
209
+ # Sort by ID for consistent ordering
210
+ results.sort(key=lambda x: x.id)
211
+ return results
212
+
213
+ def search_indicators(self, query: str) -> list[IndicatorListItem]:
214
+ """Search indicators by keyword.
215
+
216
+ Searches in libelle and description fields (case-insensitive).
217
+
218
+ Args:
219
+ query: Search query string.
220
+
221
+ Returns:
222
+ List of matching indicators.
223
+ """
224
+ if not query or not query.strip():
225
+ return self.list_indicators()
226
+
227
+ query_lower = query.lower().strip()
228
+ query_words = query_lower.split()
229
+ results = []
230
+
231
+ for indicator in self._indicators.values():
232
+ # Search in libelle and description
233
+ searchable = " ".join(
234
+ filter(None, [indicator.libelle, indicator.description])
235
+ ).lower()
236
+
237
+ # Check if all query words are present
238
+ if all(word in searchable for word in query_words):
239
+ results.append(
240
+ IndicatorListItem(
241
+ id=indicator.id,
242
+ libelle=indicator.libelle,
243
+ unite=indicator.unite,
244
+ mailles_disponibles=indicator.mailles_disponibles,
245
+ thematique_fnv=indicator.thematique_fnv,
246
+ )
247
+ )
248
+
249
+ # Sort by relevance (exact match in libelle first, then by ID)
250
+ def sort_key(item: IndicatorListItem) -> tuple[int, int]:
251
+ exact_match = 0 if query_lower in item.libelle.lower() else 1
252
+ return (exact_match, item.id)
253
+
254
+ results.sort(key=sort_key)
255
+ return results
256
+
257
+
258
+ # Singleton cache instance
259
+ _cache_instance: IndicatorCache | None = None
260
+
261
+
262
+ def get_cache() -> IndicatorCache:
263
+ """Get or create the singleton IndicatorCache instance.
264
+
265
+ Returns:
266
+ The shared IndicatorCache instance.
267
+ """
268
+ global _cache_instance
269
+ if _cache_instance is None:
270
+ _cache_instance = IndicatorCache()
271
+ return _cache_instance
272
+
273
+
274
+ async def initialize_cache(client: CubeJsClient | None = None) -> IndicatorCache:
275
+ """Initialize the singleton cache.
276
+
277
+ This should be called at application startup.
278
+
279
+ Args:
280
+ client: Optional CubeJsClient instance.
281
+
282
+ Returns:
283
+ The initialized cache.
284
+ """
285
+ cache = get_cache()
286
+ if not cache.is_initialized:
287
+ await cache.initialize(client)
288
+ return cache
289
+
290
+
291
+ async def refresh_cache_if_needed(client: CubeJsClient | None = None) -> None:
292
+ """Refresh the cache if it's stale.
293
+
294
+ Args:
295
+ client: Optional CubeJsClient instance.
296
+ """
297
+ cache = get_cache()
298
+ if cache.needs_refresh:
299
+ await cache.refresh(client)
src/cube_resolver.py ADDED
@@ -0,0 +1,286 @@
1
+ """Cube resolution logic for mapping indicator IDs to data cubes.
2
+
3
+ The API uses a specific naming convention:
4
+ - Data cubes: {thematique}_{maille} (e.g., conso_enaf_com, surface_bio_dpt)
5
+ - Measures: {cube_name}.id_{indicator_id} (e.g., conso_enaf_com.id_611)
6
+ - Geographic dimensions: geocode_{maille}, libelle_{maille}
7
+
8
+ This module provides logic to find the correct cube for a given indicator
9
+ and geographic level by parsing the /meta endpoint.
10
+ """
11
+
12
+ from typing import Any
13
+
14
+ from .models import GEO_DIMENSION_PATTERNS, CubeInfo
15
+
16
+
17
+ class CubeResolver:
18
+ """Resolves indicator IDs to their corresponding data cubes.
19
+
20
+ The resolver caches the /meta response and provides efficient lookup
21
+ of cubes by indicator ID and geographic level.
22
+ """
23
+
24
+ def __init__(self):
25
+ """Initialize the resolver."""
26
+ # Cache of cube metadata from /meta
27
+ self._cubes_meta: list[dict[str, Any]] = []
28
+
29
+ # Mapping: indicator_id -> {maille -> cube_name}
30
+ self._indicator_cube_map: dict[int, dict[str, str]] = {}
31
+
32
+ # Mapping: cube_name -> CubeInfo
33
+ self._cube_info: dict[str, CubeInfo] = {}
34
+
35
+ # Set of all indicator IDs found in cubes
36
+ self._known_indicator_ids: set[int] = set()
37
+
38
+ self._initialized = False
39
+
40
+ @property
41
+ def is_initialized(self) -> bool:
42
+ """Check if the resolver has been initialized."""
43
+ return self._initialized
44
+
45
+ def load_from_meta(self, meta_response: dict[str, Any]) -> None:
46
+ """Load and parse cube metadata from /meta response.
47
+
48
+ Args:
49
+ meta_response: The response from /cubejs-api/v1/meta
50
+ """
51
+ self._cubes_meta = meta_response.get("cubes", [])
52
+ self._build_mappings()
53
+ self._initialized = True
54
+
55
+ def _build_mappings(self) -> None:
56
+ """Build the internal mappings from cube metadata."""
57
+ self._indicator_cube_map.clear()
58
+ self._cube_info.clear()
59
+ self._known_indicator_ids.clear()
60
+
61
+ for cube in self._cubes_meta:
62
+ cube_name = cube.get("name", "")
63
+
64
+ # Skip metadata cubes
65
+ if cube_name in ("indicateur_metadata", "indicateur_x_source_metadata"):
66
+ continue
67
+
68
+ # Determine ALL available mailles from cube dimensions
69
+ available_mailles = self._detect_all_mailles(cube)
70
+ if not available_mailles:
71
+ continue
72
+
73
+ # Extract indicator IDs from measures
74
+ indicator_ids = self._extract_indicator_ids(cube)
75
+
76
+ if indicator_ids:
77
+ # Store cube info (use finest maille as primary)
78
+ finest_maille = available_mailles[0] # Already sorted finest-first
79
+ self._cube_info[cube_name] = CubeInfo(
80
+ name=cube_name,
81
+ maille=finest_maille,
82
+ indicator_ids=indicator_ids,
83
+ )
84
+
85
+ # Build reverse mapping: indicator_id -> {maille -> cube_name}
86
+ # Register cube for ALL available mailles
87
+ for ind_id in indicator_ids:
88
+ self._known_indicator_ids.add(ind_id)
89
+ if ind_id not in self._indicator_cube_map:
90
+ self._indicator_cube_map[ind_id] = {}
91
+ for maille in available_mailles:
92
+ # Only register if not already mapped (prefer finest cube)
93
+ if maille not in self._indicator_cube_map[ind_id]:
94
+ self._indicator_cube_map[ind_id][maille] = cube_name
95
+
96
+ def _detect_all_mailles(self, cube: dict[str, Any]) -> list[str]:
97
+ """Detect ALL available geographic levels (mailles) in a cube.
98
+
99
+ Cubes like conso_enaf_com contain dimensions for all levels
100
+ (commune, epci, departement, region) allowing queries at any level.
101
+
102
+ Args:
103
+ cube: Cube metadata from /meta
104
+
105
+ Returns:
106
+ List of available mailles, sorted from finest to coarsest
107
+ (commune, epci, departement, region)
108
+ """
109
+ dimensions = cube.get("dimensions", [])
110
+ dim_names = [d.get("name", "") for d in dimensions]
111
+
112
+ # Order of mailles from finest to coarsest
113
+ maille_order = ["commune", "epci", "departement", "region"]
114
+ available = []
115
+
116
+ for maille in maille_order:
117
+ patterns = GEO_DIMENSION_PATTERNS.get(maille, {})
118
+ geocode_dim = patterns.get("geocode", "")
119
+ # Dimension names are prefixed with cube name
120
+ if geocode_dim and any(geocode_dim in dim_name for dim_name in dim_names):
121
+ available.append(maille)
122
+
123
+ return available
124
+
125
+ def _detect_maille(self, cube: dict[str, Any]) -> str | None:
126
+ """Detect the finest geographic level (maille) of a cube.
127
+
128
+ Args:
129
+ cube: Cube metadata from /meta
130
+
131
+ Returns:
132
+ The finest maille name or None
133
+ """
134
+ mailles = self._detect_all_mailles(cube)
135
+ return mailles[0] if mailles else None
136
+
137
+ def _extract_indicator_ids(self, cube: dict[str, Any]) -> list[int]:
138
+ """Extract indicator IDs from cube measures.
139
+
140
+ Measures follow the pattern: {cube_name}.id_{indicator_id}
141
+
142
+ Args:
143
+ cube: Cube metadata from /meta
144
+
145
+ Returns:
146
+ List of indicator IDs found in the cube's measures
147
+ """
148
+ measures = cube.get("measures", [])
149
+ indicator_ids = []
150
+
151
+ for measure in measures:
152
+ measure_name = measure.get("name", "")
153
+ # Look for .id_{number} pattern
154
+ if ".id_" in measure_name:
155
+ try:
156
+ # Extract the ID after "id_"
157
+ id_part = measure_name.split(".id_")[-1]
158
+ # Handle potential additional suffixes
159
+ id_str = id_part.split("_")[0].split(".")[0]
160
+ indicator_id = int(id_str)
161
+ indicator_ids.append(indicator_id)
162
+ except (ValueError, IndexError):
163
+ continue
164
+
165
+ return indicator_ids
166
+
167
+ def find_cube_for_indicator(
168
+ self,
169
+ indicator_id: int,
170
+ maille: str,
171
+ ) -> str | None:
172
+ """Find the data cube for a given indicator and geographic level.
173
+
174
+ Args:
175
+ indicator_id: The indicator ID to look up
176
+ maille: The geographic level ('commune', 'epci', 'departement', 'region')
177
+
178
+ Returns:
179
+ The cube name if found, None otherwise
180
+ """
181
+ if not self._initialized:
182
+ return None
183
+
184
+ maille_lower = maille.lower()
185
+
186
+ # Check direct mapping
187
+ if indicator_id in self._indicator_cube_map:
188
+ cube_map = self._indicator_cube_map[indicator_id]
189
+ if maille_lower in cube_map:
190
+ return cube_map[maille_lower]
191
+
192
+ return None
193
+
194
+ def get_measure_name(self, cube_name: str, indicator_id: int) -> str:
195
+ """Get the full measure name for an indicator in a cube.
196
+
197
+ Args:
198
+ cube_name: The cube name
199
+ indicator_id: The indicator ID
200
+
201
+ Returns:
202
+ The full measure name (e.g., 'conso_enaf_com.id_611')
203
+ """
204
+ return f"{cube_name}.id_{indicator_id}"
205
+
206
+ def get_dimension_name(self, cube_name: str, dimension: str) -> str:
207
+ """Get the full dimension name for a cube.
208
+
209
+ Args:
210
+ cube_name: The cube name
211
+ dimension: The dimension name (e.g., 'geocode_region')
212
+
213
+ Returns:
214
+ The full dimension name (e.g., 'conso_enaf_com.geocode_region')
215
+ """
216
+ return f"{cube_name}.{dimension}"
217
+
218
+ def get_available_mailles(self, indicator_id: int) -> list[str]:
219
+ """Get the available geographic levels for an indicator.
220
+
221
+ Args:
222
+ indicator_id: The indicator ID
223
+
224
+ Returns:
225
+ List of available mailles
226
+ """
227
+ if indicator_id not in self._indicator_cube_map:
228
+ return []
229
+ return list(self._indicator_cube_map[indicator_id].keys())
230
+
231
+ def get_cube_info(self, cube_name: str) -> CubeInfo | None:
232
+ """Get information about a cube.
233
+
234
+ Args:
235
+ cube_name: The cube name
236
+
237
+ Returns:
238
+ CubeInfo if found, None otherwise
239
+ """
240
+ return self._cube_info.get(cube_name)
241
+
242
+ def is_indicator_known(self, indicator_id: int) -> bool:
243
+ """Check if an indicator ID exists in any cube.
244
+
245
+ Args:
246
+ indicator_id: The indicator ID to check
247
+
248
+ Returns:
249
+ True if the indicator exists in at least one cube
250
+ """
251
+ return indicator_id in self._known_indicator_ids
252
+
253
+ def list_all_cubes(self) -> list[CubeInfo]:
254
+ """List all data cubes with their metadata.
255
+
256
+ Returns:
257
+ List of CubeInfo objects
258
+ """
259
+ return list(self._cube_info.values())
260
+
261
+ def get_cubes_for_indicator(self, indicator_id: int) -> dict[str, str]:
262
+ """Get all cubes containing a given indicator.
263
+
264
+ Args:
265
+ indicator_id: The indicator ID
266
+
267
+ Returns:
268
+ Dict mapping maille to cube_name
269
+ """
270
+ return self._indicator_cube_map.get(indicator_id, {}).copy()
271
+
272
+
273
+ # Singleton instance
274
+ _resolver_instance: CubeResolver | None = None
275
+
276
+
277
+ def get_resolver() -> CubeResolver:
278
+ """Get or create the singleton CubeResolver instance.
279
+
280
+ Returns:
281
+ The shared CubeResolver instance
282
+ """
283
+ global _resolver_instance
284
+ if _resolver_instance is None:
285
+ _resolver_instance = CubeResolver()
286
+ return _resolver_instance
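As a standalone illustration of the measure-name convention that `_extract_indicator_ids` parses, the same `{cube_name}.id_{indicator_id}` pattern can be exercised outside the class. The sample measure names below are illustrative:

```python
def extract_indicator_ids(measure_names: list[str]) -> list[int]:
    """Pull indicator IDs out of Cube.js measure names.

    Measures follow the pattern {cube_name}.id_{indicator_id},
    e.g. 'conso_enaf_com.id_611' -> 611.
    """
    ids: list[int] = []
    for name in measure_names:
        if ".id_" not in name:
            continue
        id_part = name.split(".id_")[-1]
        # Tolerate extra suffixes such as 'id_42_pct'
        id_str = id_part.split("_")[0].split(".")[0]
        try:
            ids.append(int(id_str))
        except ValueError:
            continue
    return ids


print(extract_indicator_ids([
    "conso_enaf_com.id_611",
    "conso_enaf_com.count",       # no indicator ID -> skipped
    "surface_bio_dpt.id_42_pct",  # extra suffix tolerated -> 42
]))  # [611, 42]
```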
src/models.py ADDED
@@ -0,0 +1,239 @@
1
+ """Pydantic models for the Indicateurs Territoriaux API responses."""
2
+
3
+ from typing import Any
4
+
5
+ from pydantic import BaseModel, Field
6
+
7
+
8
+ class IndicatorMetadata(BaseModel):
9
+ """Metadata for a territorial indicator."""
10
+
11
+ id: int = Field(..., description="Unique identifier of the indicator")
12
+ libelle: str = Field(..., description="Human-readable name of the indicator")
13
+ unite: str | None = Field(None, description="Unit of measurement")
14
+ description: str | None = Field(None, description="Detailed description")
15
+ methode_calcul: str | None = Field(None, description="Calculation method")
16
+ fonction_calcul: str | None = Field(None, description="Calculation function")
17
+ date_debut: int | None = Field(None, description="First available year")
18
+ date_fin: int | None = Field(None, description="Last available year")
19
+ annees_disponibles: str | None = Field(
20
+ None, description="Available years (comma-separated)"
21
+ )
22
+ annees_manquantes: str | None = Field(
23
+ None, description="Missing years (comma-separated)"
24
+ )
25
+ mailles_disponibles: str | None = Field(
26
+ None, description="Available geographic levels (e.g., 'region,departement,epci')"
27
+ )
28
+ maille_mini_disponible: str | None = Field(
29
+ None, description="Finest available geographic level"
30
+ )
31
+ couverture_geographique: str | None = Field(
32
+ None, description="Geographic coverage (France métro, DOM, etc.)"
33
+ )
34
+ liste_drom: str | None = Field(None, description="Covered DROM territories")
35
+ completion_region: float | None = Field(
36
+ None, description="Completion percentage at region level"
37
+ )
38
+ completion_departement: float | None = Field(
39
+ None, description="Completion percentage at department level"
40
+ )
41
+ completion_epci: float | None = Field(
42
+ None, description="Completion percentage at EPCI level"
43
+ )
44
+ completion_commune: float | None = Field(
45
+ None, description="Completion percentage at commune level"
46
+ )
47
+ compte_region: int | None = Field(
48
+ None, description="Number of regions with data"
49
+ )
50
+ compte_departement: int | None = Field(
51
+ None, description="Number of departments with data"
52
+ )
53
+ compte_epci: int | None = Field(None, description="Number of EPCIs with data")
54
+ compte_commune: int | None = Field(
55
+ None, description="Number of communes with data"
56
+ )
57
+ thematique_fnv: str | None = Field(
58
+ None, description="France Nation Verte thematic"
59
+ )
60
+ secteur_fnv: str | None = Field(None, description="FNV sector")
61
+ enjeux_fnv: str | None = Field(None, description="FNV challenges")
62
+ levier_fnv: str | None = Field(None, description="FNV lever")
63
+ projets_associes: str | None = Field(None, description="Associated projects")
64
+ valeur_axes: str | None = Field(
65
+ None, description="Breakdown axes (JSON stringified)"
66
+ )
67
+
68
+ @classmethod
69
+ def from_api_response(cls, data: dict[str, Any]) -> "IndicatorMetadata":
70
+ """Create an IndicatorMetadata from a Cube.js API response row.
71
+
72
+ The API returns dimension names prefixed with the cube name.
73
+ This method strips the prefix.
74
+ """
75
+ # Strip the cube name prefix from keys
76
+ prefix = "indicateur_metadata."
77
+ cleaned = {}
78
+ for key, value in data.items():
79
+ clean_key = key.replace(prefix, "")
80
+ cleaned[clean_key] = value
81
+ return cls(**cleaned)
82
+
83
+ def has_geographic_level(self, level: str) -> bool:
84
+ """Check if the indicator has data at the specified geographic level."""
85
+ if not self.mailles_disponibles:
86
+ return False
87
+ return level.lower() in self.mailles_disponibles.lower()
88
+
89
+ def get_completion_for_level(self, level: str) -> float | None:
90
+ """Get the completion percentage for a geographic level."""
91
+ level_map = {
92
+ "region": self.completion_region,
93
+ "departement": self.completion_departement,
94
+ "epci": self.completion_epci,
95
+ "commune": self.completion_commune,
96
+ }
97
+ return level_map.get(level.lower())
98
+
99
+
100
+ class SourceMetadata(BaseModel):
101
+ """Metadata for a data source associated with an indicator."""
102
+
103
+ id_indicateur: int = Field(..., description="ID of the related indicator")
104
+ nom_source: str | None = Field(None, description="Source identifier")
105
+ libelle: str | None = Field(None, description="Human-readable source name")
106
+ description: str | None = Field(None, description="Source description")
107
+ producteur_source: str | None = Field(None, description="Data producer")
108
+ distributeur_source: str | None = Field(None, description="Data distributor")
109
+ license_source: str | None = Field(None, description="Data license")
110
+ lien_page: str | None = Field(None, description="Source URL")
111
+ annees_disponibles_source: str | None = Field(
112
+ None, description="Available years from this source"
113
+ )
114
+ annees_manquantes_source: str | None = Field(
115
+ None, description="Missing years from this source"
116
+ )
117
+ maille_mini_disponible: str | None = Field(
118
+ None, description="Finest geographic level"
119
+ )
120
+ couverture_geographique: str | None = Field(
121
+ None, description="Geographic coverage"
122
+ )
123
+ date_derniere_extraction: str | None = Field(
124
+ None, description="Last extraction date"
125
+ )
126
+
127
+ @classmethod
128
+ def from_api_response(cls, data: dict[str, Any]) -> "SourceMetadata":
129
+ """Create a SourceMetadata from a Cube.js API response row."""
130
+ prefix = "indicateur_x_source_metadata."
131
+ cleaned = {}
132
+ for key, value in data.items():
133
+ clean_key = key.replace(prefix, "")
134
+ cleaned[clean_key] = value
135
+ return cls(**cleaned)
136
+
137
+
138
+ class IndicatorListItem(BaseModel):
139
+ """Simplified indicator info for list responses."""
140
+
141
+ id: int
142
+ libelle: str
143
+ unite: str | None = None
144
+ mailles_disponibles: str | None = None
145
+ thematique_fnv: str | None = None
146
+
147
+
148
+ class IndicatorDetails(BaseModel):
149
+ """Complete indicator details with sources."""
150
+
151
+ metadata: IndicatorMetadata
152
+ sources: list[SourceMetadata] = Field(default_factory=list)
153
+
154
+
155
+ class GeographicDataPoint(BaseModel):
156
+ """A single data point with geographic information."""
157
+
158
+ geocode: str = Field(..., description="INSEE code of the territory")
159
+ libelle: str | None = Field(None, description="Name of the territory")
160
+ valeur: float | str | None = Field(None, description="Indicator value")
161
+ annee: str | None = Field(None, description="Year of the data")
162
+ unite: str | None = Field(None, description="Unit of measurement")
163
+
164
+
165
+ class QueryResult(BaseModel):
166
+ """Result of a data query."""
167
+
168
+ indicator_id: int
169
+ indicator_name: str
170
+ geographic_level: str
171
+ data: list[GeographicDataPoint]
172
+ total_count: int = 0
173
+ query_info: dict[str, Any] = Field(default_factory=dict)
174
+
175
+
176
+ class SearchResult(BaseModel):
177
+ """Result of an indicator search."""
178
+
179
+ indicators: list[IndicatorListItem]
180
+ query: str
181
+ total_count: int
182
+
183
+
184
+ class CubeInfo(BaseModel):
185
+ """Information about a data cube."""
186
+
187
+ name: str = Field(..., description="Cube name (e.g., 'conso_enaf_com')")
188
+ maille: str = Field(..., description="Geographic level (commune, epci, departement, region)")
189
+ indicator_ids: list[int] = Field(default_factory=list, description="Indicator IDs in this cube")
190
+
191
+
192
+ # Geographic level constants
193
+ GEOGRAPHIC_LEVELS = ["region", "departement", "epci", "commune"]
194
+
195
+ # Maille suffix mapping for cube names
196
+ MAILLE_SUFFIX_MAP = {
197
+ "commune": "_com",
198
+ "epci": "_epci",
199
+ "departement": "_dpt",
200
+ "region": "_reg",
201
+ }
202
+
203
+ # Dimension patterns for each geographic level (validated by API tests)
204
+ # Format: geocode_{maille} and libelle_{maille}
205
+ GEO_DIMENSION_PATTERNS = {
206
+ "region": {
207
+ "geocode": "geocode_region",
208
+ "libelle": "libelle_region",
209
+ },
210
+ "departement": {
211
+ "geocode": "geocode_departement",
212
+ "libelle": "libelle_departement",
213
+ },
214
+ "epci": {
215
+ "geocode": "geocode_epci",
216
+ "libelle": "libelle_epci",
217
+ },
218
+ "commune": {
219
+ "geocode": "geocode_commune",
220
+ "libelle": "libelle_commune",
221
+ },
222
+ }
223
+
224
+ # Region code reference
225
+ REGION_CODES = {
226
+ "11": "Île-de-France",
227
+ "24": "Centre-Val de Loire",
228
+ "27": "Bourgogne-Franche-Comté",
229
+ "28": "Normandie",
230
+ "32": "Hauts-de-France",
231
+ "44": "Grand Est",
232
+ "52": "Pays de la Loire",
233
+ "53": "Bretagne",
234
+ "75": "Nouvelle-Aquitaine",
235
+ "76": "Occitanie",
236
+ "84": "Auvergne-Rhône-Alpes",
237
+ "93": "Provence-Alpes-Côte d'Azur",
238
+ "94": "Corse",
239
+ }
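The `from_api_response` helpers above both reduce to one operation: stripping the cube-name prefix from every key of a Cube.js result row. A minimal sketch, with an illustrative sample row:

```python
def strip_cube_prefix(row: dict, prefix: str) -> dict:
    """Strip a 'cube_name.' prefix from each key of a Cube.js result row.

    Cube.js returns keys like 'indicateur_metadata.libelle'; the Pydantic
    models expect bare field names like 'libelle'.
    """
    return {key.removeprefix(prefix): value for key, value in row.items()}


row = {
    "indicateur_metadata.id": 611,
    "indicateur_metadata.libelle": "Consommation d'espaces NAF",
    "indicateur_metadata.unite": "ha",
}
print(strip_cube_prefix(row, "indicateur_metadata."))
# {'id': 611, 'libelle': "Consommation d'espaces NAF", 'unite': 'ha'}
```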
src/tools.py ADDED
@@ -0,0 +1,354 @@
1
+ """MCP tools for querying territorial ecological indicators."""
2
+
3
+ import json
4
+ from typing import Any
5
+
6
+ from .api_client import get_client, CubeJsClient, CubeJsClientError
7
+ from .cache import get_cache, initialize_cache, refresh_cache_if_needed
8
+ from .cube_resolver import get_resolver
9
+ from .models import (
10
+ IndicatorMetadata,
11
+ SourceMetadata,
12
+ IndicatorListItem,
13
+ GEOGRAPHIC_LEVELS,
14
+ GEO_DIMENSION_PATTERNS,
15
+ )
16
+
17
+
18
+ async def _ensure_cache_initialized() -> None:
19
+ """Ensure the cache is initialized before tool execution."""
20
+ cache = get_cache()
21
+ if not cache.is_initialized:
22
+ await initialize_cache()
23
+ else:
24
+ await refresh_cache_if_needed()
25
+
26
+
27
+ async def list_indicators(
28
+ thematique: str = "",
29
+ maille: str = "",
30
+ ) -> str:
31
+ """List all available territorial ecological indicators.
32
+
33
+ Returns a list of indicators with their main characteristics. You can filter
34
+ by thematic (France Nation Verte themes like "mieux se déplacer", "mieux se loger")
35
+ or by geographic level (region, departement, epci, commune).
36
+
37
+ Args:
38
+ thematique: Optional filter by FNV thematic. Use partial match, e.g., "déplacer"
39
+ for mobility indicators, "loger" for housing, "produire" for production.
40
+ maille: Optional filter by available geographic level. Valid values:
41
+ "region", "departement", "epci", "commune".
42
+
43
+ Returns:
44
+ JSON string containing a list of indicators with id, libelle, unite,
45
+ mailles_disponibles, and thematique_fnv.
46
+
47
+ Example:
48
+ To find mobility indicators available at department level:
49
+ list_indicators(thematique="déplacer", maille="departement")
50
+ """
51
+ await _ensure_cache_initialized()
52
+ cache = get_cache()
53
+
54
+ # Normalize empty strings to None
55
+ theme_filter = thematique.strip() if thematique else None
56
+ maille_filter = maille.strip().lower() if maille else None
57
+
58
+ # Validate maille if provided
59
+ if maille_filter and maille_filter not in GEOGRAPHIC_LEVELS:
60
+ return json.dumps({
61
+ "error": f"Invalid geographic level: {maille}",
62
+ "valid_levels": GEOGRAPHIC_LEVELS,
63
+ }, ensure_ascii=False)
64
+
65
+ indicators = cache.list_indicators(
66
+ thematique=theme_filter,
67
+ maille=maille_filter,
68
+ )
69
+
70
+ return json.dumps({
71
+ "indicators": [ind.model_dump() for ind in indicators],
72
+ "count": len(indicators),
73
+ "filters_applied": {
74
+ "thematique": theme_filter,
75
+ "maille": maille_filter,
76
+ },
77
+ }, ensure_ascii=False, indent=2)
78
+
79
+
80
+ async def get_indicator_details(indicator_id: str) -> str:
81
+ """Get detailed information about a specific indicator.
82
+
83
+ Returns comprehensive metadata including description, calculation method,
84
+ data coverage, and data sources for a given indicator ID.
85
+
86
+ Args:
87
+ indicator_id: The numeric ID of the indicator (e.g., "42", "94", "611").
88
+
89
+ Returns:
90
+ JSON string containing:
91
+ - metadata: Full indicator metadata (description, methode_calcul,
92
+ annees_disponibles, completion rates by geographic level, etc.)
93
+ - sources: List of data sources with producer, license, and links.
94
+ - available_cubes: Dict mapping maille to cube name for data queries.
95
+
96
+ Example:
97
+ get_indicator_details("611") returns details about indicator 611
98
+ (Consommation d'espaces naturels, agricoles et forestiers).
99
+ """
100
+ await _ensure_cache_initialized()
101
+
102
+ # Parse indicator ID
103
+ try:
104
+ ind_id = int(indicator_id)
105
+ except ValueError:
106
+ return json.dumps({
107
+ "error": f"Invalid indicator ID: {indicator_id}. Must be a number.",
108
+ }, ensure_ascii=False)
109
+
110
+ cache = get_cache()
111
+ indicator = cache.get_indicator(ind_id)
112
+
113
+ if indicator is None:
114
+ return json.dumps({
115
+ "error": f"Indicator {ind_id} not found in metadata.",
116
+ "hint": "Use list_indicators() to see available indicators.",
117
+ }, ensure_ascii=False)
118
+
119
+ # Get available cubes from resolver
120
+ resolver = get_resolver()
121
+ available_cubes = resolver.get_cubes_for_indicator(ind_id)
122
+
123
+ # Fetch sources from API
124
+ client = get_client()
125
+ try:
126
+ sources_data = await client.load_sources_metadata(indicator_id=ind_id)
127
+ sources = [
128
+ SourceMetadata.from_api_response(row).model_dump()
129
+ for row in sources_data
130
+ ]
131
+ except CubeJsClientError as e:
132
+ sources = []
133
+ sources_error = str(e)
134
+ else:
135
+ sources_error = None
136
+
137
+ result = {
138
+ "metadata": indicator.model_dump(),
139
+ "sources": sources,
140
+ "available_cubes": available_cubes,
141
+ }
142
+
143
+ if sources_error:
144
+ result["sources_warning"] = f"Could not fetch sources: {sources_error}"
145
+
146
+ return json.dumps(result, ensure_ascii=False, indent=2)
147
+
148
+
149
+ async def query_indicator_data(
+     indicator_id: str,
+     geographic_level: str,
+     geographic_code: str = "",
+     year: str = "",
+ ) -> str:
+     """Query data values for a specific indicator and territory.
+
+     Retrieves actual data values for an indicator at the specified geographic level.
+     You can filter by a specific territory code and/or year.
+
+     Args:
+         indicator_id: The numeric ID of the indicator (e.g., "611").
+         geographic_level: The geographic level to query. Valid values:
+             "region", "departement", "epci", "commune".
+         geographic_code: Optional INSEE code to filter by territory:
+             - Region: 2 digits (e.g., "93" for PACA, "11" for Île-de-France)
+             - Departement: 2-3 characters (e.g., "13", "2A", "974")
+             - EPCI: 9 digits (SIREN code)
+             - Commune: 5 digits (e.g., "75056" for Paris)
+         year: Optional year to filter data (e.g., "2020").
+
+     Returns:
+         JSON string containing:
+         - indicator_id: The queried indicator ID
+         - indicator_name: Human-readable name
+         - geographic_level: The queried level
+         - data: List of data points with libelle, annee, valeur, unite
+         - total_count: Number of results
+
+     Example:
+         Query indicator 611 (ENAF consumption) for PACA region:
+         query_indicator_data("611", "region", "93")
+
+         Query all departments for 2020:
+         query_indicator_data("611", "departement", year="2020")
+     """
+     await _ensure_cache_initialized()
+
+     # Parse indicator ID
+     try:
+         ind_id = int(indicator_id)
+     except ValueError:
+         return json.dumps({
+             "error": f"Invalid indicator ID: {indicator_id}. Must be a number.",
+         }, ensure_ascii=False)
+
+     # Validate geographic level
+     geo_level = geographic_level.strip().lower()
+     if geo_level not in GEOGRAPHIC_LEVELS:
+         return json.dumps({
+             "error": f"Invalid geographic level: {geographic_level}",
+             "valid_levels": GEOGRAPHIC_LEVELS,
+         }, ensure_ascii=False)
+
+     cache = get_cache()
+     resolver = get_resolver()
+
+     indicator = cache.get_indicator(ind_id)
+     indicator_name = indicator.libelle if indicator else f"Indicator {ind_id}"
+     indicator_unite = indicator.unite if indicator else None
+
+     # Find the cube for this indicator and maille
+     cube_name = resolver.find_cube_for_indicator(ind_id, geo_level)
+
+     if cube_name is None:
+         # Check if indicator exists at all
+         if not resolver.is_indicator_known(ind_id):
+             return json.dumps({
+                 "error": f"Indicator {ind_id} not found in any data cube.",
+                 "hint": "Use get_indicator_details() to check available mailles.",
+             }, ensure_ascii=False)
+
+         # Indicator exists but not at this maille
+         available = resolver.get_available_mailles(ind_id)
+         return json.dumps({
+             "error": f"Indicator {ind_id} is not available at {geo_level} level.",
+             "available_levels": available,
+             "hint": f"Try one of: {', '.join(available)}",
+         }, ensure_ascii=False)
+
+     # Build the query
+     geo_patterns = GEO_DIMENSION_PATTERNS[geo_level]
+
+     # Measure and dimensions with full cube prefix
+     measure = resolver.get_measure_name(cube_name, ind_id)
+     geocode_dim = resolver.get_dimension_name(cube_name, geo_patterns["geocode"])
+     libelle_dim = resolver.get_dimension_name(cube_name, geo_patterns["libelle"])
+     annee_dim = resolver.get_dimension_name(cube_name, "annee")
+
+     query: dict[str, Any] = {
+         "measures": [measure],
+         "dimensions": [libelle_dim, annee_dim],
+         "limit": 500,
+     }
+
+     # Add filters
+     filters = []
+
+     geo_code = geographic_code.strip() if geographic_code else None
+     if geo_code:
+         filters.append({
+             "member": geocode_dim,
+             "operator": "equals",
+             "values": [geo_code],
+         })
+
+     year_filter = year.strip() if year else None
+     if year_filter:
+         filters.append({
+             "member": annee_dim,
+             "operator": "equals",
+             "values": [year_filter],
+         })
+
+     if filters:
+         query["filters"] = filters
+
+     # Execute query
+     client = get_client()
+     try:
+         result = await client.load(query)
+         data_rows = result.get("data", [])
+     except CubeJsClientError as e:
+         return json.dumps({
+             "error": f"Query failed: {str(e)}",
+             "cube": cube_name,
+             "query": query,
+         }, ensure_ascii=False, indent=2)
+
+     # Parse results
+     data_points = []
+     for row in data_rows:
+         data_points.append({
+             "libelle": row.get(libelle_dim),
+             "annee": row.get(annee_dim),
+             "valeur": row.get(measure),
+             "unite": indicator_unite,
+         })
+
+     # Sort by year, then by libelle (coerce annee to str so mixed
+     # int/None values from the API cannot break the comparison)
+     data_points.sort(key=lambda x: (str(x.get("annee") or ""), x.get("libelle") or ""))
+
+     return json.dumps({
+         "indicator_id": ind_id,
+         "indicator_name": indicator_name,
+         "geographic_level": geo_level,
+         "data": data_points,
+         "total_count": len(data_points),
+         "query_info": {
+             "cube": cube_name,
+             "measure": measure,
+             "geographic_code_filter": geo_code,
+             "year_filter": year_filter,
+         },
+     }, ensure_ascii=False, indent=2)
+
+
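As a concrete sketch of what the function above builds: assuming the resolver maps indicator 611 at the region maille to a cube named `conso_enaf_reg` (actual cube and dimension names are resolved from `/meta` at runtime, so these are assumptions), the Cube.js `/load` payload for `query_indicator_data("611", "region", "93", year="2020")` would look like:

```python
# Illustrative Cube.js query payload; cube/dimension names are assumed,
# following the {thematique}_{maille} and id_{indicator_id} conventions.
example_query = {
    "measures": ["conso_enaf_reg.id_611"],
    "dimensions": ["conso_enaf_reg.libelle_region", "conso_enaf_reg.annee"],
    "limit": 500,
    "filters": [
        {"member": "conso_enaf_reg.geocode_region", "operator": "equals", "values": ["93"]},
        {"member": "conso_enaf_reg.annee", "operator": "equals", "values": ["2020"]},
    ],
}
print(example_query["measures"][0])
```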
+ async def search_indicators(query: str) -> str:
+     """Search indicators by keywords in their name or description.
+
+     Performs a full-text search across indicator names (libelle) and descriptions.
+     All search terms must be present for an indicator to match (AND logic).
+
+     Args:
+         query: Search terms separated by spaces. Examples:
+             - "consommation espace" finds indicators about land consumption
+             - "émissions CO2" finds indicators about CO2 emissions
+             - "surface bio" finds organic surface indicators
+
+     Returns:
+         JSON string containing:
+         - indicators: List of matching indicators with id, libelle, unite,
+           mailles_disponibles, thematique_fnv
+         - query: The original search query
+         - total_count: Number of results
+
+     Example:
+         search_indicators("consommation espace") returns indicators mentioning
+         both "consommation" and "espace" in their name or description.
+     """
+     await _ensure_cache_initialized()
+     cache = get_cache()
+
+     search_query = query.strip() if query else ""
+
+     if not search_query:
+         # Return all indicators if no query
+         indicators = cache.list_indicators()
+     else:
+         indicators = cache.search_indicators(search_query)
+
+     return json.dumps({
+         "indicators": [ind.model_dump() for ind in indicators],
+         "query": search_query,
+         "total_count": len(indicators),
+     }, ensure_ascii=False, indent=2)
+
+
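The AND matching described in the docstring can be sketched as follows. This illustrates the contract only, not the actual `cache.search_indicators` implementation, and `matches_all_terms` is a hypothetical helper name:

```python
def matches_all_terms(text: str, query: str) -> bool:
    """True when every whitespace-separated search term occurs
    (case-insensitively) as a substring of text."""
    haystack = text.lower()
    return all(term in haystack for term in query.lower().split())

# "consommation espace" matches a libelle containing both terms
assert matches_all_terms("Consommation d'espaces NAF", "consommation espace")
assert not matches_all_terms("Émissions de CO2", "consommation espace")
```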
+ # Export all tools
+ __all__ = [
+     "list_indicators",
+     "get_indicator_details",
+     "query_indicator_data",
+     "search_indicators",
+ ]