A newer version of the Gradio SDK is available:
6.6.0
title: Gemma-2 Multimodal Chat
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
๐ Gemma-2 Multimodal Chat Application
A sophisticated Gradio-based chat application featuring multimodal capabilities with Google's Gemma-2 model.
โจ Features
- ๐ฌ Interactive Chat Interface: Persistent conversation history with context awareness
- ๐ผ๏ธ Vision Capabilities: Upload and analyze images with AI-powered insights
- ๐ File Processing: Support for PDF and TXT file uploads with text extraction
- ๐ง Contextual Responses: Maintains conversation context for follow-up questions
- ๐จ Modern UI: Clean, responsive interface built with Gradio
- ๐ State Management: Persistent chat history and file context across interactions
๐ ๏ธ Technologies Used
- Frontend: Gradio 4.0+
- AI Model: Google's Gemma-2-2B-IT
- File Processing: PyPDF2 for PDFs, PIL for images
- Backend: Python with Hugging Face Transformers
- Deployment: Hugging Face Spaces
๐ Quick Start
Local Development
Clone the repository:
git clone <repository-url> cd gemmaInstall dependencies:
pip install -r requirements.txtRun the application:
python app.pyOpen your browser and navigate to
http://localhost:7860
Hugging Face Spaces Deployment
- Create a new Space on Hugging Face Spaces
- Choose "Gradio" as the SDK
- Upload the files from this repository
- The app will automatically deploy and be accessible via your Space URL
๐ How to Use
Basic Chat
- Type your message in the text input box
- Click "Submit" or press Enter
- View the AI response in the chat history
Image Analysis
- Upload an image using the image upload component
- Type a question about the image (e.g., "What do you see in this image?")
- Submit to get AI-powered image analysis
File Processing
- Upload a PDF or TXT file using the file upload component
- Ask questions about the file content
- The extracted text will be used as context for responses
Advanced Features
- Persistent Context: Previous conversations are remembered
- File Context: Uploaded file content persists for follow-up questions
- Clear Chat: Reset conversation history and uploaded files
๐ง Configuration
Model Configuration
The application uses Google's Gemma-2-2B-IT model from Hugging Face. The model is loaded and used for inference in the gemma_3_inference function in app.py.
Customization
- Modify the UI theme in the
gr.Blocksconfiguration - Adjust file size limits and supported formats
- Customize the chat history display format
- Add additional file processing capabilities
๐ Project Structure
gemma/
โโโ .gitattributes # Git configuration
โโโ .gitignore # Git ignore file
โโโ .huggingface/ # Hugging Face configuration
โ โโโ CODEOWNERS # Space ownership configuration
โโโ app.py # Main Gradio application
โโโ app_config.yaml # Hugging Face Space configuration
โโโ HUGGINGFACE_DEPLOYMENT.md # Deployment instructions
โโโ push_to_huggingface.bat # Windows deployment script
โโโ push_to_huggingface.py # Python deployment script
โโโ README.md # Project documentation (with Space config)
โโโ README.space.md # Hugging Face Space README
โโโ requirements.txt # Python dependencies
๐ฎ Future Enhancements
- Upgrade to Gemma-3 model when available
- Support for additional file formats (DOCX, XLSX)
- Advanced image processing capabilities
- User authentication and personalized chat history
- Export chat conversations
- Multi-language support
- Voice input/output capabilities
๐ค Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Acknowledgments
- Google for the Gemma model family
- Hugging Face for the amazing ecosystem and Spaces platform
- Gradio team for the intuitive UI framework
๐ Support
If you encounter any issues or have questions, please open an issue on the repository or contact the maintainers.
Note: This application uses Google's Gemma-2-2B-IT model. The model doesn't have native vision capabilities, but the application is designed to handle image uploads with appropriate messaging.