---
license: apache-2.0
datasets:
- ShadiAbpeikar/HandGesture2Robot
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- Robot
- Hand-Gesture
- SigLIP2
- code
- Sign
---

# **Hand-Gesture-2-Robot**

> **Hand-Gesture-2-Robot** is an image classification model fine-tuned from the vision-language encoder **google/siglip2-base-patch16-224** for single-label classification. It recognizes hand gestures and maps each one to a specific robot command using the **SiglipForImageClassification** architecture.

```text
Classification Report:
                       precision    recall  f1-score   support

"rotate anticlockwise"    0.9926    0.9958    0.9942       944
            "increase"    0.9975    0.9975    0.9975       789
             "release"    0.9941    1.0000    0.9970       670
              "switch"    1.0000    0.9986    0.9993       728
             "look up"    0.9984    0.9984    0.9984       635
           "Terminate"    0.9983    1.0000    0.9991       580
            "decrease"    0.9942    1.0000    0.9971       684
       "move backward"    0.9986    0.9972    0.9979       725
               "point"    0.9965    0.9913    0.9939      1716
    "rotate clockwise"    1.0000    1.0000    1.0000       868
               "grasp"    0.9922    0.9961    0.9941       767
               "pause"    0.9991    1.0000    0.9995      1079
        "move forward"    1.0000    0.9944    0.9972       886
             "Confirm"    0.9983    0.9983    0.9983       573
           "look down"    0.9985    0.9970    0.9977       664
           "move left"    0.9952    0.9968    0.9960       622
          "move right"    1.0000    1.0000    1.0000       622

              accuracy                        0.9972     13552
             macro avg    0.9973    0.9977    0.9975     13552
          weighted avg    0.9972    0.9972    0.9972     13552
```

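
As a quick consistency check on the report above, the macro-averaged F1 should be the unweighted mean of the per-class F1 scores, the weighted average should weight each class by its support, and the total support should be the sum of the per-class supports. The sketch below copies the figures from the report by hand; the variable names are illustrative and not part of the model's API.

```python
# Per-class (f1-score, support) pairs copied from the classification report.
report = {
    "rotate anticlockwise": (0.9942, 944),
    "increase": (0.9975, 789),
    "release": (0.9970, 670),
    "switch": (0.9993, 728),
    "look up": (0.9984, 635),
    "Terminate": (0.9991, 580),
    "decrease": (0.9971, 684),
    "move backward": (0.9979, 725),
    "point": (0.9939, 1716),
    "rotate clockwise": (1.0000, 868),
    "grasp": (0.9941, 767),
    "pause": (0.9995, 1079),
    "move forward": (0.9972, 886),
    "Confirm": (0.9983, 573),
    "look down": (0.9977, 664),
    "move left": (0.9960, 622),
    "move right": (1.0000, 622),
}

total_support = sum(n for _, n in report.values())

# Macro average: every class counts equally, regardless of its support.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in report.values()) / total_support

print(f"classes:       {len(report)}")
print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.4f}")
print(f"weighted F1:   {weighted_f1:.4f}")
```

The two averages differ slightly because the classes are not balanced: "point" has 1716 samples, roughly three times as many as "Confirm" with 573.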
The model categorizes hand gestures into 17 different robot commands:

- **Class 0:** "rotate anticlockwise"
- **Class 1:** "increase"
- **Class 2:** "release"
- **Class 3:** "switch"
- **Class 4:** "look up"
- **Class 5:** "Terminate"
- **Class 6:** "decrease"
- **Class 7:** "move backward"
- **Class 8:** "point"
- **Class 9:** "rotate clockwise"
- **Class 10:** "grasp"
- **Class 11:** "pause"
- **Class 12:** "move forward"
- **Class 13:** "Confirm"
- **Class 14:** "look down"
- **Class 15:** "move left"
- **Class 16:** "move right"

# **Run with Transformers🤗**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/Hand-Gesture-2-Robot"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def gesture_classification(image):
    """Predicts the robot command from a hand gesture image."""
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Class index -> robot command
    labels = {
        "0": "rotate anticlockwise",
        "1": "increase",
        "2": "release",
        "3": "switch",
        "4": "look up",
        "5": "Terminate",
        "6": "decrease",
        "7": "move backward",
        "8": "point",
        "9": "rotate clockwise",
        "10": "grasp",
        "11": "pause",
        "12": "move forward",
        "13": "Confirm",
        "14": "look down",
        "15": "move left",
        "16": "move right"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}

    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=gesture_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Hand Gesture to Robot Command",
    description="Upload an image of a hand gesture to predict the corresponding robot command."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```

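
`gesture_classification` returns a dictionary that maps each command name to a rounded probability. Before such output drives a robot, a controller would likely want only the top command, and only when the model is reasonably confident. The helper below is a hypothetical sketch of that step: `select_command` and the `0.8` threshold are assumptions made for illustration, not part of this model or of Transformers, and the scores in the example are made up.

```python
from typing import Dict, Optional, Tuple


def select_command(
    predictions: Dict[str, float], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (command, score) for the top prediction, or None if it is too weak.

    `predictions` has the same shape as the dict returned by
    `gesture_classification`: command name -> probability.
    The default threshold is an illustrative assumption, not a tuned value.
    """
    if not predictions:
        return None
    # Pick the entry with the highest probability.
    command, score = max(predictions.items(), key=lambda item: item[1])
    return (command, score) if score >= threshold else None


# Made-up scores for illustration (not real model output).
confident = {"pause": 0.97, "point": 0.02, "grasp": 0.01}
ambiguous = {"move left": 0.48, "move right": 0.45, "point": 0.07}

print(select_command(confident))
print(select_command(ambiguous))
```

Returning `None` for low-confidence frames lets the caller decide what "no command" means, for example holding the robot's current state instead of acting on an uncertain gesture.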
# **Intended Use:**

The **Hand-Gesture-2-Robot** model is designed to classify hand gestures into corresponding robot commands. Potential use cases include:

- **Human-Robot Interaction:** Enabling intuitive control of robots using hand gestures.
- **Assistive Technology:** Helping individuals with disabilities communicate commands.
- **Industrial Automation:** Enhancing robotic operations in manufacturing.
- **Gaming & VR:** Providing gesture-based controls for immersive experiences.
- **Security & Surveillance:** Implementing gesture-based access control.