---
title: Words2csv
emoji: 👁
colorFrom: purple
colorTo: gray
sdk: docker
pinned: false
---
# words2csv

Upload a PDF or image and convert the detected content into CSV using one of the supported backends:

- **OpenAI** (requires `OPENAI_API_KEY`)
- **Gemini** (requires `GEMINI_API_KEY`)
- **olmOCR via Hugging Face Inference Endpoint** (requires `HF_TOKEN` and a configured HF Endpoint URL)

The app runs as a Gradio UI (`app.py`).
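For orientation, the UI boils down to a file upload, a backend picker, and a CSV download. A minimal sketch of that shape (not the actual `app.py`; `convert_to_csv` is a hypothetical stand-in for the real conversion logic):

```python
# Minimal sketch of the UI shape; the real app.py wires in the actual backends.
import gradio as gr

def convert_to_csv(file, backend):
    # Hypothetical: dispatch `file` to the chosen backend, return a CSV path.
    raise NotImplementedError

demo = gr.Interface(
    fn=convert_to_csv,
    inputs=[
        gr.File(label="PDF or image"),
        gr.Dropdown(["OpenAI", "Gemini", "olmOCR"], label="Backend"),
    ],
    outputs=gr.File(label="CSV"),
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```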
## Research

You can find the results of the underlying research [here](https://www.linkedin.com/pulse/state-art-handwriting-ocr-what-means-everyday-users-max-nedelchev-f0lxf).
## Prerequisites

- Python 3.11+
- One or more of the following credentials (depending on the model you pick in the UI):
  - `OPENAI_API_KEY`
  - `GEMINI_API_KEY`
  - `HF_TOKEN` (a Hugging Face access token with permission to call your Inference Endpoint)
## Quickstart (local)

1. Create a virtualenv and install the dependencies:

   ```bash
   python -m venv .venv
   . .venv/bin/activate
   pip install -r requirements.txt
   ```

2. Create a `.env` file (or export the variables in your shell):

   ```bash
   OPENAI_API_KEY=...
   GEMINI_API_KEY=...
   HF_TOKEN=...
   ```

3. Run the app:

   ```bash
   python app.py
   ```

Then open `http://localhost:7860`.
## Quickstart (Docker)

If you prefer Docker locally:

1. Create a `.env` file with the keys you need (same as above).
2. Run:

   ```bash
   docker compose up --build
   ```

Open `http://localhost:7860`.
## Run on your own Hugging Face account (Spaces)

This repo is configured for **Hugging Face Spaces** using the **Docker** SDK (see the YAML frontmatter at the top of this README and the `Dockerfile`).

### 1) Create the Space

1. Go to `https://huggingface.co/spaces`
2. Click **Create new Space**
3. Choose:
   - **Space SDK**: `Docker`
   - **Visibility**: your choice (private is recommended if you use paid API keys)
4. Create the Space

Tip: in the **Create new Space** flow, you can also use the **Clone repository** option and paste your repo URL to import this project directly.
### 2) Push this repository to the Space

Clone the Space repo and add this project's files (or push from your existing git remote). A Space is, in general, just a Git repository, so plain `git push` works; a Python alternative is sketched below.
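If you would rather use the Python client than git, `huggingface_hub` can upload the project directory directly (the `repo_id` below is a placeholder for your own username/Space name):

```python
# Upload this project's files to your Space via the Hugging Face Hub client.
# Assumes you are logged in (`huggingface-cli login`) or pass token=... explicitly.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path=".",                    # this repository's root
    repo_id="your-username/words2csv",  # placeholder: your Space
    repo_type="space",
)
```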
### 3) Add secrets (required)

In your Space, go to:

`Settings` -> `Variables and secrets`

Add:

- **`OPENAI_API_KEY`**: required if you select an OpenAI model in the UI
- **`GEMINI_API_KEY`**: required if you select a Gemini model in the UI
- **`HF_TOKEN`**: required if you select the `olmOCR-2-7B-1025-FP8` backend (Inference Endpoint)

Notes:

- Secrets are injected as environment variables at runtime.
- If you don't set a key, selecting that backend raises a runtime error (see the sketch below).
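The error messages listed under Troubleshooting suggest each backend guards its credential with a check along these lines (a sketch, not the exact code):

```python
# Sketch of the per-backend credential check implied by the error messages
# under Troubleshooting; the variable name depends on the selected backend.
import os

def require_env(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} environment variable is not set.")
    return value

# e.g. the OpenAI backend would start with something like:
# api_key = require_env("OPENAI_API_KEY")
```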
### 4) Wait for the Space to build and start

Once the Space finishes building, open it and upload a PDF or image.
## Hugging Face Inference Endpoint (olmOCR backend)

The `olmOCR` backend does **not** call Google/OpenAI. It calls a **Hugging Face Inference Endpoint** using `huggingface_hub.InferenceClient`.

Important details:

- The endpoint URL is currently hardcoded as `HF_ENDPOINT_URL` in `olm_ocr.py`.
- To run this in your own HF account, you typically need your **own** endpoint URL.
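For reference, the call boils down to something like the sketch below (assumptions: the endpoint exposes a chat-completion API and the page image is sent as a data URL; the prompt and parameters are illustrative, not copied from `olm_ocr.py`):

```python
# Sketch of an InferenceClient call against a dedicated endpoint.
import os
from huggingface_hub import InferenceClient

HF_ENDPOINT_URL = "https://<id>.<region>.<provider>.endpoints.huggingface.cloud"

client = InferenceClient(model=HF_ENDPOINT_URL, token=os.environ["HF_TOKEN"])
response = client.chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this page."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
        ],
    }],
    max_tokens=2048,
)
print(response.choices[0].message.content)
```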
### Create your own endpoint

1. Go to `https://huggingface.co/inference-endpoints`
2. Create an endpoint for the model:
   - `allenai/olmOCR-2-7B-1025-FP8`
3. Wait until the endpoint status is **Running**
4. Copy the endpoint URL (it looks like `https://<id>.<region>.<provider>.endpoints.huggingface.cloud`)
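If you prefer to script this instead of clicking through the UI, `huggingface_hub` can create endpoints too. The hardware choices below are illustrative assumptions; pick an instance that actually fits the model:

```python
# Create and wait for a dedicated endpoint programmatically.
# Vendor/region/instance values are examples, not recommendations.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "olmocr",
    repository="allenai/olmOCR-2-7B-1025-FP8",
    framework="pytorch",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-a100",
)
endpoint.wait()      # blocks until the endpoint is Running
print(endpoint.url)  # the URL to put in olm_ocr.py
```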
### Configure this repo to use your endpoint

Update `HF_ENDPOINT_URL` in `olm_ocr.py` to your endpoint URL.
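That edit is a one-liner; if you'd rather not edit the file per deployment, reading the URL from the environment is a small optional variation (the env-var name here is a suggestion, not something the code currently reads):

```python
# In olm_ocr.py: either hardcode your own URL, or (optionally) let an
# environment variable override the hardcoded default.
import os

HF_ENDPOINT_URL = os.environ.get(
    "HF_ENDPOINT_URL",
    "https://<id>.<region>.<provider>.endpoints.huggingface.cloud",
)
```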
The call is authenticated via `HF_TOKEN`:

- Create a token at `https://huggingface.co/settings/tokens`
- Make sure it can access your endpoint (and the endpoint is in the same account/org)
## About `GEMINI_API_KEY` and Hugging Face

`GEMINI_API_KEY` is used by the **Gemini backend** (`gemini_backend.py`) via the `google-genai` client.

If you are running this app on Hugging Face Spaces:

- You still provide `GEMINI_API_KEY` as a **Space secret**.
- You do **not** get a Gemini key from Hugging Face.

If you specifically want Gemini calls to go through a Hugging Face-hosted endpoint, that would require a different integration than the current code (e.g. calling an HF Inference Endpoint hosting a Gemini-compatible service). The current implementation calls Google directly using your `GEMINI_API_KEY`.
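For context, a `google-genai` call has roughly this shape (the model name and prompt are assumptions, not necessarily what `gemini_backend.py` uses):

```python
# Sketch of a direct-to-Google call via the google-genai client.
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("page.png", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model; check gemini_backend.py
    contents=[image, "Extract the detected content as CSV."],
)
print(response.text)
```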
## Troubleshooting

- **`OPENAI_API_KEY environment variable is not set.`**
  - Add `OPENAI_API_KEY` as an env var (local `.env`) or Space secret.
- **`GEMINI_API_KEY environment variable is not set.`**
  - Add `GEMINI_API_KEY` as an env var (local `.env`) or Space secret.
- **olmOCR endpoint errors (401/403)**
  - Ensure `HF_TOKEN` is set and has permission to call the endpoint.
  - Ensure `HF_ENDPOINT_URL` points to an endpoint you own / can access.
- **Space builds but doesn't start / crashes**
  - Check the Space logs.
  - Make sure you're not missing required secrets for the backend you select.