You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

🌐 Looking for the English version? Scroll down 👇

Lyra-Mistral7B-immobilier-LoRA 🏠💶

Description

Ce modèle est une adaptation LoRA du Mistral-7B-Instruct-v0.3 spécialisée sur des données synthétiques d’estimation immobilière.
Objectif : proposer en français une estimation de prix (fourchette en €) assortie d’un commentaire concis (1–2 phrases), à partir de paramètres simples : surface, type de bien, état et zone (A, B1, B2 ou C).

Présentation du pipeline

Ce projet est construit en deux parties complémentaires, qui illustrent la chaîne complète de traitement d’un cas réel de données immobilières :

Partie 1 – Préparation du dataset

Point de départ : un fichier Excel client, volontairement bruité (valeurs manquantes, doublons, formats hétérogènes).
Nettoyage et normalisation des données avec pandas :
- conversion des prix en numérique,
- gestion des valeurs manquantes,
- suppression des doublons,
- contrôle qualité statistique.
Export en CSV UTF-8 puis conversion en JSONL au format attendu par Mistral (structure messages).
Résultat : deux jeux de données (train et validation) propres, exploitables directement pour le fine-tuning.

Partie 2 – Fine-tuning LoRA Mistral-7B

Détails techniques

Base model: mistralai/Mistral-7B-Instruct-v0.3
Technique: QLoRA (4-bit, bitsandbytes)
Modules LoRA: q_proj, k_proj, v_proj, o_proj, down_proj
Epochs: 3
GPU: A100 (Colab Pro)

Résultats observés

Le modèle apprend à :

répondre en français avec un format clair : Estimation + commentaire,
respecter les zones fournies (A, B1, B2, C),
éviter les hallucinations de villes/régions.
Comparé au modèle de base, il est beaucoup plus cadré et concis.

Utilisation

!pip install -q transformers accelerate bitsandbytes sentencepiece huggingface_hub

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jeromex1/Lyra-Mistral7B-immobilier-LoRA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
)

system_prompt = (
    "Tu es un expert en estimation immobilière. "
    "Réponds en français, de manière concise, en 1 à 2 phrases maximum. "
    "Donne une estimation chiffrée (fourchette en euros) et un bref commentaire. "
    "Les localisations sont classées par zone : A, B1, B2 ou C. "
    "NE MENTIONNE PAS de villes ou régions, uniquement la zone donnée."
)

user_prompt = "Surface : 96 m²\nType : maison\nÉtat : bon\nZone : B1"
full_prompt = f"{system_prompt}\n\n{user_prompt}\n\nRéponse :"

inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Prompt (input)	Réponse générée
Surface : 96 m² Type : maison État : bon Zone : B1	Valeur estimée entre 300 000 € et 400 000 €. Bien situé dans une zone B1, cette maison de 96 m² est en très bon état.
Surface : 55 m² Type : appartement État : à rénover Zone : C	Estimation : 100 000 – 120 000 € Commentaire : Appartement à rénover dans une zone C, l'estimation est comprise entre 100 000 et 120 000 €.
Surface : 120 m² Type : maison État : excellent Zone : A	Valeur estimée entre 500 000 € et 600 000 €. Situation privilégiée, grande surface et état excellent.
Surface : 80 m² Type : appartement État : moyen Zone : B2	Estimation : 250 000 – 300 000 € Commentaire : Bien situé dans une zone B2, cet appartement de 80 m² est en bon état de conservation.
Surface : 150 m² Type : maison État : ancien Zone : C	Valeur estimée entre 200 000 € et 300 000 € pour une maison ancienne de 150 m² dans la zone C.

Recommandation

Nous conseillons vivement d’utiliser Google Colab (gratuit ou Colab Pro) et d’y sélectionner un GPU (T4 si gratuit, A100 si Colab Pro).

📘 Découvrez mes 40 projets IA et sciences STEM ici : 👉 github.com/Jerome-openclassroom

📘 Pour en savoir plus, rendez-vous sur ma page GitHub consacrée au projet : 👉 Lyra-Mistral7B-Estate

🌍 English Version

Description

This model is a LoRA adaptation of Mistral-7B-Instruct-v0.3, specialized in synthetic real-estate estimation data. Goal: provide concise French responses with a price range (in €) and a short comment based on simple parameters: surface, property type, condition, and zone (A, B1, B2, C).

Pipeline Overview

This project is built in two complementary parts, illustrating the full workflow of a real-world real estate dataset:

Part 1 – Dataset Preparation

Starting point: a raw Excel file, intentionally noisy (missing values, duplicates, heterogeneous formats).
Data cleaning and normalization with pandas:
- conversion of prices into numeric format,
- handling of missing values,
- removal of duplicates,
- statistical quality checks.
Export to UTF-8 CSV and conversion to JSONL in the format required by Mistral (messages structure).
Result: two clean datasets (train and validation), directly usable for fine-tuning.

Part 2 – Fine-tuning LoRA Mistral-7B

Technical Details

Base model: mistralai/Mistral-7B-Instruct-v0.3
Technique: QLoRA (4-bit, bitsandbytes)
LoRA modules: q_proj, k_proj, v_proj, o_proj, down_proj
Epochs: 3
GPU: A100 (Colab Pro)

Observed Results

The model learns to:

answer in French with a clear “Estimation + comment” format,
respect the given zone (A, B1, B2, C),
avoid hallucinating real cities/regions. Compared to the base model, outputs are much more concise and on-topic.

!pip install -q transformers accelerate bitsandbytes sentencepiece huggingface_hub

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jeromex1/Lyra-Mistral7B-immobilier-LoRA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
)

system_prompt = (
    "You are a real-estate estimation expert. "
    "Answer strictly in French, concisely, in 1–2 sentences. "
    "Always provide a price range in euros and a short comment. "
    "Localisations are zones A, B1, B2 or C. "
    "NEVER mention cities or regions, only the given zone."
)

user_prompt = "Surface : 96 m²\nType : maison\nÉtat : bon\nZone : B1"
full_prompt = f"{system_prompt}\n\n{user_prompt}\n\nRéponse :"

inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📘 Discover my 40 AI & STEM projects here: 👉 github.com/Jerome-openclassroom

📘 To know more about this project : 👉 Lyra-Mistral7B-Estate

Downloads last month: 18

Safetensors

Model size

4B params

Tensor type

F32

BF16

Model tree for jeromex1/Lyra-Mistral7B-immobilier-LoRA

Base model

mistralai/Mistral-7B-v0.3

Finetuned

mistralai/Mistral-7B-Instruct-v0.3

Adapter

(507)

this model