---
title: Words2csv
emoji: 👁
colorFrom: purple
colorTo: gray
sdk: docker
pinned: false
---
# words2csv
Upload a PDF or image and convert the detected content into CSV using one of the supported backends:
- **OpenAI** (requires `OPENAI_API_KEY`)
- **Gemini** (requires `GEMINI_API_KEY`)
- **olmOCR via Hugging Face Inference Endpoint** (requires `HF_TOKEN` and a configured HF Endpoint URL)
The app runs as a Gradio UI (`app.py`).
## Research
You can find the results of the research [here](https://www.linkedin.com/pulse/state-art-handwriting-ocr-what-means-everyday-users-max-nedelchev-f0lxf).
## Prerequisites
- Python 3.11+
- One or more of the following credentials (depending on the model you pick in the UI):
- `OPENAI_API_KEY`
- `GEMINI_API_KEY`
- `HF_TOKEN` (Hugging Face access token with permission to call your Inference Endpoint)
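As a quick sanity check before launching, you can list which backends your current environment actually enables. This is a minimal sketch (not part of the app); the variable names are exactly the ones from the list above:

```python
import os

# Each backend and the environment variable it needs
# (names taken from the prerequisites above).
REQUIRED_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "olmOCR": "HF_TOKEN",
}

def available_backends(env=os.environ):
    """Return the backends whose credential is set and non-empty."""
    return [name for name, var in REQUIRED_KEYS.items() if env.get(var)]

if __name__ == "__main__":
    print("Usable backends:", available_backends() or "none (no keys set)")
```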
## Quickstart (local)
1. Create a virtualenv and install deps
```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```
2. Create a `.env` file (or export env vars in your shell)
```bash
OPENAI_API_KEY=...
GEMINI_API_KEY=...
HF_TOKEN=...
```
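If you go the `.env` route and don't want an extra dependency, a tiny stdlib-only loader can read the file shown above into the process environment. This is a hypothetical helper for illustration; the app may instead rely on a library such as `python-dotenv`:

```python
import os

def load_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments, and does not overwrite variables
    that are already set in the shell environment.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```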
3. Run the app
```bash
python app.py
```
Then open `http://localhost:7860`.
## Quickstart (Docker)
If you prefer Docker locally:
1. Create `.env` with the keys you need (same as above)
2. Run:
```bash
docker compose up --build
```
Open `http://localhost:7860`.
## Run on your own Hugging Face account (Spaces)
This repo is configured for **Hugging Face Spaces** using the **Docker** SDK (see the YAML frontmatter at the top of this README and the `Dockerfile`).
### 1) Create the Space
1. Go to `https://huggingface.co/spaces`
2. Click **Create new Space**
3. Choose:
- **Space SDK**: `Docker`
- **Visibility**: your choice (private is recommended if you use paid API keys)
4. Create the Space
Tip: in the **Create new Space** flow, you can also use the **Clone repository** option and paste your repo URL to import this project directly.
### 2) Push this repository to the Space
Clone the Space repo and add this project's files (or push from your existing Git remote). In general, a Space is just a Git repository.
### 3) Add secrets (required)
In your Space, go to:
`Settings` -> `Variables and secrets`
Add:
- **`OPENAI_API_KEY`**: required if you select an OpenAI model in the UI
- **`GEMINI_API_KEY`**: required if you select a Gemini model in the UI
- **`HF_TOKEN`**: required if you select the `olmOCR-2-7B-1025-FP8` backend (Inference Endpoint)
Notes:
- Secrets are injected as environment variables at runtime.
- If you don’t set a key, selecting that backend will raise a runtime error.
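The guard behind that runtime error can be as simple as the following sketch. The function name is illustrative, not the app's actual code; the message format mirrors the errors quoted in the troubleshooting section:

```python
import os

def require_env(var_name):
    """Fetch a credential or fail fast with a clear message.

    Illustrative sketch of the check described above; the app's
    actual implementation may differ.
    """
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} environment variable is not set.")
    return value
```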
### 4) Wait for the Space build and start
Once the Space finishes building, open it and upload a PDF/image.
## Hugging Face Inference Endpoint (olmOCR backend)
The `olmOCR` backend does **not** call Google/OpenAI. It calls a **Hugging Face Inference Endpoint** using `huggingface_hub.InferenceClient`.
Important details:
- The endpoint URL is currently hardcoded as `HF_ENDPOINT_URL` in `olm_ocr.py`.
- To run this in your own HF account, you typically need your **own** endpoint URL.
### Create your own endpoint
1. Go to `https://huggingface.co/inference-endpoints`
2. Create an endpoint for the model:
- `allenai/olmOCR-2-7B-1025-FP8`
3. Wait until the endpoint status is **Running**
4. Copy the endpoint URL (it looks like `https://<id>.<region>.<provider>.endpoints.huggingface.cloud`)
### Configure this repo to use your endpoint
Update `HF_ENDPOINT_URL` in `olm_ocr.py` to your endpoint URL.
The call is authenticated via `HF_TOKEN`:
- Create a token at `https://huggingface.co/settings/tokens`
- Make sure it can access your endpoint (and the endpoint is in the same account/org)
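For reference, calls to such vision endpoints typically send the page image as a base64 data URL inside an OpenAI-style chat message. The sketch below only builds that payload with the standard library; the prompt text and field layout are assumptions for illustration, and the real request in `olm_ocr.py` goes through `huggingface_hub.InferenceClient` rather than this helper:

```python
import base64

def build_ocr_message(image_bytes, prompt="Extract the detected content as CSV."):
    """Build an OpenAI-style chat message embedding an image.

    Hypothetical helper: shows the payload shape commonly accepted by
    vision-chat endpoints; the exact fields olm_ocr.py uses may differ.
    """
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```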
## About `GEMINI_API_KEY` and Hugging Face
`GEMINI_API_KEY` is used by the **Gemini backend** (`gemini_backend.py`) via the `google-genai` client.
If you are running this app on Hugging Face Spaces:
- You still provide `GEMINI_API_KEY` as a **Space secret**.
- You do **not** get a Gemini key from Hugging Face.
If you specifically want Gemini calls to go through a Hugging Face-hosted endpoint, that would require a different integration than the current code (e.g. calling an HF Inference Endpoint hosting a Gemini-compatible service). The current implementation calls Google directly using your `GEMINI_API_KEY`.
## Troubleshooting
- **`OPENAI_API_KEY environment variable is not set.`**
- Add `OPENAI_API_KEY` as env var (local `.env`) or Space secret.
- **`GEMINI_API_KEY environment variable is not set.`**
- Add `GEMINI_API_KEY` as env var (local `.env`) or Space secret.
- **olmOCR endpoint errors (401/403)**
- Ensure `HF_TOKEN` is set and has permission to call the endpoint.
- Ensure `HF_ENDPOINT_URL` points to an endpoint you own / can access.
- **Space builds but doesn’t start / crashes**
- Check Space logs.
- Make sure you’re not missing required secrets for the backend you select.