---
base_model: facebook/detr-resnet-101-dc5
datasets:
- Voxel51/fisheye8k
library_name: transformers
license: mit
pipeline_tag: object-detection
tags:
- generated_from_trainer
- object-detection
- detr
- computer-vision
- its
- autonomous-driving
model-index:
- name: fisheye8k_facebook_detr-resnet-101-dc5
  results: []
---

# fisheye8k_facebook_detr-resnet-101-dc5

This model is a fine-tuned version of [facebook/detr-resnet-101-dc5](https://huggingface.co/facebook/detr-resnet-101-dc5) on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). It was developed as part of the **Mcity Data Engine** initiative.

It achieves the following results on the evaluation set:
- Loss: 2.6740

## Paper
This model was presented in the paper:
[Mcity Data Engine: Iterative Model Improvement Through Open-Vocabulary Data Selection](https://arxiv.org/abs/2504.21614).

## Project Page
For more information about the **Mcity Data Engine**, visit the official project page: [Mcity Data Engine Docs](https://mcity.github.io/mcity_data_engine/).

## Code
The code for the **Mcity Data Engine** is publicly available on GitHub: [mcity/mcity_data_engine](https://github.com/mcity/mcity_data_engine).

## Model description

The `fisheye8k_facebook_detr-resnet-101-dc5` model is an object detection model fine-tuned for Intelligent Transportation Systems (ITS) using the DETR architecture with a ResNet-101 backbone. It is a core component of the Mcity Data Engine, an open-source system designed to address the challenges of selecting and labeling appropriate data for machine learning models, particularly for detecting long-tail and novel classes of interest in large amounts of unlabeled data from vehicle fleets and roadside perception systems. This model specifically demonstrates iterative model improvement through an open-vocabulary data selection process within this framework.

## Intended uses & limitations

**Intended Uses:**
This model is intended for research and development in Intelligent Transportation Systems (ITS), specifically for object detection. It is designed to identify objects (e.g., Bus, Bike, Car, Pedestrian, Truck, per the model's `id2label` mapping) in data collected from automotive fisheye cameras. It can serve as a foundation for AI algorithms that require robust object grounding and for exploring iterative model improvement techniques that focus on rare and novel classes.

**Limitations:**
*   The model's performance is primarily validated on the Fisheye8K dataset and may vary when applied to other datasets or real-world scenarios with different camera types, environments, or object distributions.
*   While the underlying research focuses on open-vocabulary detection and long-tail classes, generalization to entirely unseen object categories or extremely rare instances might still require further data selection and retraining within the Mcity Data Engine framework.
*   The model provides bounding box predictions and class labels but does not offer instance segmentation or other more granular visual understanding capabilities.

## Sample Usage

You can use this model with the Hugging Face `transformers` library for object detection:

```python
import torch
from transformers import AutoImageProcessor, AutoModelForObjectDetection
from PIL import Image
import requests

# Load image processor and model
image_processor = AutoImageProcessor.from_pretrained("mcity-data-engine/fisheye8k_facebook_detr-resnet-101-dc5")
model = AutoModelForObjectDetection.from_pretrained("mcity-data-engine/fisheye8k_facebook_detr-resnet-101-dc5")

# Example image (replace with your image path or URL)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Process image and run inference without tracking gradients
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process outputs to get detected objects
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]

print("Detected objects:")
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
        f"  - {model.config.id2label[label.item()]} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )
```

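The `results` dictionary returned by `post_process_object_detection` holds parallel `scores`, `labels`, and `boxes` entries, with boxes given as absolute `(xmin, ymin, xmax, ymax)` pixel coordinates. The following is a minimal sketch of summarizing such output per class. The helper name `count_by_class`, the sample detections, and the `example_id2label` mapping are illustrative assumptions, not values read from this model's config. The helper works on plain Python lists, so tensors need `.tolist()` first.

```python
from collections import Counter


def count_by_class(scores, labels, boxes, id2label, min_score=0.5):
    """Count detections per class name, skipping low scores and empty boxes."""
    counts = Counter()
    for score, label, (xmin, ymin, xmax, ymax) in zip(scores, labels, boxes):
        if score < min_score:
            continue  # below the confidence cutoff
        if xmax <= xmin or ymax <= ymin:
            continue  # degenerate box with no area
        counts[id2label[label]] += 1
    return dict(counts)


# Hypothetical mapping and detections, for illustration only
example_id2label = {0: "Bus", 1: "Bike", 2: "Car", 3: "Pedestrian", 4: "Truck"}
example_scores = [0.97, 0.91, 0.42, 0.88]
example_labels = [2, 2, 3, 0]
example_boxes = [
    [10, 20, 110, 90],
    [200, 40, 260, 95],
    [5, 5, 15, 30],
    [300, 100, 480, 260],
]

summary = count_by_class(example_scores, example_labels, example_boxes, example_id2label)
print(summary)
```

With real model output, the call would look like `count_by_class(results["scores"].tolist(), results["labels"].tolist(), results["boxes"].tolist(), model.config.id2label)`.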
## Training and evaluation data

This model was fine-tuned on the [Fisheye8K dataset](https://huggingface.co/datasets/Voxel51/fisheye8k). This dataset is specifically designed for object detection in images captured by fisheye cameras, making it highly relevant for applications in intelligent transportation systems.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 36 (configured; the training results below log 8 epochs)
- mixed_precision_training: Native AMP

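To make the `cosine` scheduler setting concrete, here is a sketch of the learning-rate curve it implies. It assumes no warmup steps (none are listed above) and a planned horizon of 36 epochs at 5,288 steps each, the per-epoch step count reported in the training results. The function name `cosine_lr` is hypothetical, and this is not the exact scheduler code used in training.

```python
import math

BASE_LR = 5e-5          # learning_rate from the hyperparameter list
STEPS_PER_EPOCH = 5288  # step count after epoch 1 in the training results
NUM_EPOCHS = 36         # configured num_epochs
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Half-cosine decay from base_lr at step 0 to zero at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the scheduled rate at a few epoch boundaries
for epoch in (0, 8, 18, 36):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Under these assumptions, the rate after 8 of 36 epochs is still roughly 88% of its initial value, since most of the cosine decay happens in the middle of the schedule.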
### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.1508        | 1.0   | 5288  | 2.4721          |
| 1.7423        | 2.0   | 10576 | 2.3029          |
| 1.5881        | 3.0   | 15864 | 2.2454          |
| 1.5641        | 4.0   | 21152 | 2.2912          |
| 1.4438        | 5.0   | 26440 | 2.2912          |
| 1.4503        | 6.0   | 31728 | 2.5056          |
| 1.3487        | 7.0   | 37016 | 2.5812          |
| 1.2777        | 8.0   | 42304 | 2.6740          |

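In the logged values, training loss falls almost every epoch while validation loss reaches its minimum early and then rises. A small sketch for locating the lowest-validation-loss epoch and its gap to the final epoch follows. The variable names are illustrative, and this card does not state whether the published weights come from the final or the best epoch.

```python
# (epoch, training_loss, validation_loss) copied from the training results table
history = [
    (1, 2.1508, 2.4721),
    (2, 1.7423, 2.3029),
    (3, 1.5881, 2.2454),
    (4, 1.5641, 2.2912),
    (5, 1.4438, 2.2912),
    (6, 1.4503, 2.5056),
    (7, 1.3487, 2.5812),
    (8, 1.2777, 2.6740),
]

# Epoch with the lowest validation loss
best_epoch, _, best_val = min(history, key=lambda row: row[2])

# Last logged epoch, which matches the reported evaluation loss
final_epoch, _, final_val = history[-1]

gap = final_val - best_val
print(f"best epoch: {best_epoch} (val loss {best_val:.4f})")
print(f"final epoch: {final_epoch} (val loss {final_val:.4f}), gap {gap:.4f}")
```

A widening gap between training and validation loss of this kind is commonly read as overfitting, so comparing checkpoints on a task metric such as mAP may be worthwhile before choosing one.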

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Citation

If you use the Mcity Data Engine or this model in your research, feel free to cite the project:

```bibtex
@article{bogdoll2025mcitydataengine,
  title={Mcity Data Engine},
  author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
  journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
  year={2025}
}
```