---
title: Words2csv
emoji: 👁
colorFrom: purple
colorTo: gray
sdk: docker
pinned: false
---
# words2csv
Upload a PDF or image and convert the detected content into CSV using one of the supported backends:
- **OpenAI** (requires `OPENAI_API_KEY`)
- **Gemini** (requires `GEMINI_API_KEY`)
- **olmOCR via Hugging Face Inference Endpoint** (requires `HF_TOKEN` and a configured HF Endpoint URL)
The app runs as a Gradio UI (`app.py`).
## Research
You can find the results of the research [here](https://www.linkedin.com/pulse/state-art-handwriting-ocr-what-means-everyday-users-max-nedelchev-f0lxf).
## Prerequisites
- Python 3.11+
- One or more of the following credentials (depending on the model you pick in the UI):
- `OPENAI_API_KEY`
- `GEMINI_API_KEY`
- `HF_TOKEN` (Hugging Face access token with permission to call your Inference Endpoint)
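As a quick sanity check before launching, you can list which backends your current environment actually enables. This is a minimal sketch (not part of the app); the variable names are exactly the ones from the list above:

```python
import os

# Each backend and the environment variable it needs
# (names taken from the prerequisites above).
REQUIRED_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "olmOCR": "HF_TOKEN",
}

def available_backends(env=os.environ):
    """Return the backends whose credential is set and non-empty."""
    return [name for name, var in REQUIRED_KEYS.items() if env.get(var)]

if __name__ == "__main__":
    print("Usable backends:", available_backends() or "none (no keys set)")
```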
## Quickstart (local)
1. Create a virtualenv and install deps
```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```
2. Create a `.env` file (or export env vars in your shell)
```bash
OPENAI_API_KEY=...
GEMINI_API_KEY=...
HF_TOKEN=...
```
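If you go the `.env` route and don't want an extra dependency, a tiny stdlib-only loader can read the file shown above into the process environment. This is a hypothetical helper for illustration; the app may instead rely on a library such as `python-dotenv`:

```python
import os

def load_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments, and does not overwrite variables
    that are already set in the shell environment.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```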
3. Run the app
```bash
python app.py
```
Then open `http://localhost:7860`.
## Quickstart (Docker)
If you prefer Docker locally:
1. Create `.env` with the keys you need (same as above)
2. Run:
```bash
docker compose up --build
```
Open `http://localhost:7860`.
## Run on your own Hugging Face account (Spaces)
This repo is configured for **Hugging Face Spaces** using the **Docker** SDK (see the YAML frontmatter at the top of this README and the `Dockerfile`).
### 1) Create the Space
1. Go to `https://huggingface.co/spaces`
2. Click **Create new Space**
3. Choose:
- **Space SDK**: `Docker`
- **Visibility**: your choice (private is recommended if you use paid API keys)
4. Create the Space
Tip: in the **Create new Space** flow, you can also use the **Clone repository** option and paste your repo URL to import this project directly.
### 2) Push this repository to the Space
Clone the Space repo and add this project's files (or push from your existing Git remote). In general, a Space is just a Git repository.
### 3) Add secrets (required)
In your Space, go to:
`Settings` -> `Variables and secrets`
Add:
- **`OPENAI_API_KEY`**: required if you select an OpenAI model in the UI
- **`GEMINI_API_KEY`**: required if you select a Gemini model in the UI
- **`HF_TOKEN`**: required if you select the `olmOCR-2-7B-1025-FP8` backend (Inference Endpoint)
Notes:
- Secrets are injected as environment variables at runtime.
- If you don’t set a key, selecting that backend will raise a runtime error.
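The guard behind that runtime error can be as simple as the following sketch. The function name is illustrative, not the app's actual code; the message format mirrors the errors quoted in the troubleshooting section:

```python
import os

def require_env(var_name):
    """Fetch a credential or fail fast with a clear message.

    Illustrative sketch of the check described above; the app's
    actual implementation may differ.
    """
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} environment variable is not set.")
    return value
```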
### 4) Wait for the Space build and start
Once the Space finishes building, open it and upload a PDF/image.
## Hugging Face Inference Endpoint (olmOCR backend)
The `olmOCR` backend does **not** call Google/OpenAI. It calls a **Hugging Face Inference Endpoint** using `huggingface_hub.InferenceClient`.
Important details:
- The endpoint URL is currently hardcoded as `HF_ENDPOINT_URL` in `olm_ocr.py`.
- To run this in your own HF account, you typically need your **own** endpoint URL.
### Create your own endpoint
1. Go to `https://huggingface.co/inference-endpoints`
2. Create an endpoint for the model:
- `allenai/olmOCR-2-7B-1025-FP8`
3. Wait until the endpoint status is **Running**
4. Copy the endpoint URL (it looks like `https://<id>.<region>.<provider>.endpoints.huggingface.cloud`)
### Configure this repo to use your endpoint
Update `HF_ENDPOINT_URL` in `olm_ocr.py` to your endpoint URL.
The call is authenticated via `HF_TOKEN`:
- Create a token at `https://huggingface.co/settings/tokens`
- Make sure it can access your endpoint (and the endpoint is in the same account/org)
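For reference, calls to such vision endpoints typically send the page image as a base64 data URL inside an OpenAI-style chat message. The sketch below only builds that payload with the standard library; the prompt text and field layout are assumptions for illustration, and the real request in `olm_ocr.py` goes through `huggingface_hub.InferenceClient` rather than this helper:

```python
import base64

def build_ocr_message(image_bytes, prompt="Extract the detected content as CSV."):
    """Build an OpenAI-style chat message embedding an image.

    Hypothetical helper: shows the payload shape commonly accepted by
    vision-chat endpoints; the exact fields olm_ocr.py uses may differ.
    """
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```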
## About `GEMINI_API_KEY` and Hugging Face
`GEMINI_API_KEY` is used by the **Gemini backend** (`gemini_backend.py`) via the `google-genai` client.
If you are running this app on Hugging Face Spaces:
- You still provide `GEMINI_API_KEY` as a **Space secret**.
- You do **not** get a Gemini key from Hugging Face.
If you specifically want Gemini calls to go through a Hugging Face-hosted endpoint, that would require a different integration than the current code (e.g. calling an HF Inference Endpoint hosting a Gemini-compatible service). The current implementation calls Google directly using your `GEMINI_API_KEY`.
## Troubleshooting
- **`OPENAI_API_KEY environment variable is not set.`**
- Add `OPENAI_API_KEY` as env var (local `.env`) or Space secret.
- **`GEMINI_API_KEY environment variable is not set.`**
- Add `GEMINI_API_KEY` as env var (local `.env`) or Space secret.
- **olmOCR endpoint errors (401/403)**
- Ensure `HF_TOKEN` is set and has permission to call the endpoint.
- Ensure `HF_ENDPOINT_URL` points to an endpoint you own / can access.
- **Space builds but doesn’t start / crashes**
- Check Space logs.
- Make sure you’re not missing required secrets for the backend you select.