---
metrics:
- pAUC
model-index:
- name: >-
    avanishd/avanishd/vit-base-patch16-dinov3-finetuned-skin-lesion-classification
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: pAUC
      type: pAUC
      value: 0.1441070826953209
base_model:
- timm/vit_base_patch16_dinov3.lvd1689m
---

# vit-base-patch16-dinov3-finetuned-skin-lesion-classification

This model is a finetuned for skin lesion classification.

Params (M): 85.6

## Intended Uses & Limitations

### Intended Use

This model is intended for dermoscopic skin lesion classification using a 224x224 image size.

### Limitations

This model was only trained for 1 epoch and has not seen many malignant examples (due to large class imbalance in ISIC 2024 dataset).

## How to Get Started with the Model

```Python
class DinoSkinLesionClassifier(nn.Module, PyTorchModelHubMixin):
  """
  PytorchModelHubMixin adds push to Hugging Face Hub

  See: https://huggingface.co/docs/hub/models-uploading#upload-a-pytorch-model-using-huggingfacehub
  """
  def __init__(self, num_classes=1, freeze_backbone=True):
    super(DinoSkinLesionClassifier, self).__init__()

    # Initialize Dino v3 backbone
    self.backbone = timm.create_model('vit_base_patch16_dinov3', pretrained=True, num_classes=0, global_pool='avg')

    # Freeze backbone weights if requested
    # This makes training much faster
    if freeze_backbone:
      for param in self.backbone.parameters():
        param.requires_grad = False

    # Get feature dimension from the backbone
    feat_dim = self.backbone.num_features

    # Define the classification head
    self.head = nn.Linear(feat_dim, num_classes) # Should be 768 in, 1 out

  def forward(self, x):
    out = self.backbone(x)
    out = self.head(out)

    return out

from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
        repo_id="avanishd/vit-base-patch16-dinov3-finetuned-skin-lesion-classification",
        filename="model.safetensors"
    )

from safetensors.torch import load_model

model = EfficientNetSkinLesionClassifier()
load_model(model, filename=weights_path, strict=True)

model.to(device) # Don't forget to put on GPU

model.eval()  # Set model to evaluation mode


# Example with PH2 Dataset

class PH2Dataset(Dataset):
  """
  Dataset for PH2 images, which are in png format.

  PH2 contains skin lesions images classified as

  - Common Nevus (benign)
  - Atypical Nevus (benign)
  - Melanoma (malignant)

  No need for is real label here, since this is purely for testing
  """

  def __init__(self, dir_path, metadata, transform=None):
    super(PH2Dataset, self).__init__()

    self.dir_path = dir_path
    self.transform = transform

    self.image_files = [os.path.join(dir_path, f) for f in os.listdir(dir_path)
                        if f.lower().endswith(('.jpg', '.jpeg', '.png'))]

    # Load metadata w/ polars (only 2 columns)
    self.metadata = pl.read_csv(metadata)

    self.diagnostic_mapping = {
        "Common Nevus": 0,
        "Atypical Nevus": 0,
        "Melanoma": 1,
    }

  def __len__(self):
    return len(self.image_files)

  def __getitem__(self, idx):
    # The image name in the metadata csv are like IMD003
    image_id = self.image_files[idx].split('/')[-1].split('.')[0]

    # Still need the entire path to open the image
    image = Image.open(self.image_files[idx]).convert('RGB')

    if self.transform: # Apply transform if it exists
      image = self.transform(image)

    diagnosis = self.metadata.filter(pl.col("image_name") == image_id).select("diagnosis").item()

    label = torch.tensor(self.diagnostic_mapping[diagnosis], dtype=torch.int16)

    return image, label


transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), # Image net mean and std
    transforms.Resize((224, 224)),  # Dimensions for Efficient Net v2
])

ph_2_images = "/content/data/ph2_data/images"
ph_2_metadata = "/content/data/ph2_data/ph_2_dataset.csv"

ex_dataset = PH2Dataset(ph_2_images, ph_2_metadata, transform)

ex_loader = DataLoader(ex_dataset, batch_size=64, shuffle=False)

for (images, labels) in test_loader:
  images = images.to(device)
  labels = labels.to(device)
  output = model(images)

  y_pred_prob = torch.sigmoid(output).cpu().numpy().ravel()
  y_pred = np.where(y_pred_prob < 0.5, 0, 1)

  return y_pred

```

## Training and evaluation data

This model was trained with the [ISIC 2024 challenge](https://www.kaggle.com/competitions/isic-2024-challenge) and [ISIC 2024 synthetic](https://www.kaggle.com/datasets/ilya9711nov/isic-2024-synthetic) datasets.

For the ISIC 2024 Challenge data, an 80-20 train test split was applied, and the test split was used to evaluate the model.

## Training Procedure

### Training hyperparameters
- learning_rate: 1e-4
- train_batch_size: 64
- eval_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_TORCH with weight decay=1e-2 and optimizer_args=No additional optimizer arguments
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step |
|---------------|-------|------|
| 0.5027 | 1 | 100 |
| 0.5672 | 1 | 200 |
| 0.5373 | 1 | 300 |
| 0.4693 | 1 | 400 |
| 5.3829 | 1 | 500 |
| 0.4872 | 1 | 600 |
| 0.4717 | 1 | 700 |
| 0.4550 | 1 | 800 |
| 0.4185 | 1 | 900 |
| 0.4142 | 1 | 1000 |
| 0.3570 | 1 | 1100 |
| 0.3877 | 1 | 1200 |
| 0.4282 | 1 | 1300 |
| 8.8676 | 1 | 1400 |
| 0.3732 | 1 | 1500 |
| 0.3522 | 1 | 1600 |
| 0.3065 | 1 | 1700 |
| 0.3732 | 1 | 1800 |
| 0.3965 | 1 | 1900 |
| 0.4727 | 1 | 2000 |
| 0.3407 | 1 | 2100 |
| 0.3421 | 1 | 2200 |
| 0.3847 | 1 | 2300 |
| 0.3911 | 1 | 2400 |
| 0.4006 | 1 | 2500 |
| 0.2836 | 1 | 2600 |
| 0.3968 | 1 | 2700 |
| 0.3796 | 1 | 2800 |
| 0.3317 | 1 | 2900 |
| 0.2762 | 1 | 3000 |
| 0.3027 | 1 | 3100 |
| 0.3002 | 1 | 3200 |
| 0.3672 | 1 | 3300 |
| 0.2660 | 1 | 3400 |
| 0.3145 | 1 | 3500 |
| 0.4098 | 1 | 3600 |
| 0.3156 | 1 | 3700 |
| 0.2762 | 1 | 3800 |
| 0.2557 | 1 | 3900 |
| 0.3204 | 1 | 4000 |
| 0.3097 | 1 | 4100 |
| 0.2790 | 1 | 4200 |
| 0.3395 | 1 | 4300 |
| 0.2888 | 1 | 4400 |
| 0.3002 | 1 | 4500 |
| 0.3388 | 1 | 4600 |
| 0.3744 | 1 | 4700 |
| 0.3143 | 1 | 4800 |
| 0.3501 | 1 | 4900 |
| 0.2923 | 1 | 5000 |
| 0.3152 | 1 | 5100 |
| 0.3380 | 1 | 5200 |


### Framework versions

- Pytorch 2.9.0+cu126
- torchvision: 0.24.0+cu126
- timm: 1.0.22
- numpy: 2.0.2
- safetensors: 0.7.0