Commits · seanpedrickcase/document

Merge pull request #94 from seanpedrick-case/dev

5e5c423

unverified

Sean Pedrick-Case committed about 20 hours ago

Added regex functionality to deny lists. Corrected Tesseract to word-level parsing. Improved review search regex capabilities. Updated documentation

4852fb5

seanpedrickcase committed about 21 hours ago

Allowed Tesseract to run OCR in line-level mode and then query an LLM with line-level data. Added option for running as an MCP server. Added API for multi-word text search

419fb7d

seanpedrickcase committed about 22 hours ago

Merge pull request #93 from seanpedrick-case/dev

7832a41
unverified

Sean Pedrick-Case committed 1 day ago

Merge pull request #92 from seanpedrick-case/regex_search

c2d2ccd
unverified

Sean Pedrick-Case committed 1 day ago

Added regex search feature for multi-word text search

21318d3

seanpedrickcase committed 1 day ago

Minor update to cli_redact for new local OCR model options. Updated app_settings.qmd, user_guide.qmd, and readme.md with descriptions of new features

d5b5291

seanpedrickcase committed 1 day ago

Fixed minor bugs related to Textract API calls and pyproject format. Removed print statements and fixed some pandas concat deprecation warnings

7bb945f

seanpedrickcase committed 1 day ago

Merge pull request #90 from seanpedrick-case/dev

50caf2f
unverified

Sean Pedrick-Case committed 1 day ago

formatter and linter applied

ca530a1

seanpedrickcase committed 2 days ago

Merge pull request #89 from seanpedrick-case/textract_type_name_output

00011db
unverified

Sean Pedrick-Case committed 2 days ago

Added suffix to Textract output files according to tasks included (e.g. signature analysis). Improved reporting when Textract client doesn't exist. Fixed display for cost and time taken. Changed config variables to allow exclusion of PaddleOCR from display

25e2089

seanpedrickcase committed 2 days ago

Improved paddle and hybrid OCR analysis across all options. Tried to revise requirements for spaces

2c00d05

seanpedrickcase committed 2 days ago

Added paddle to pre-requirements.txt

01c8eb6

seanpedrickcase committed 3 days ago

Allowed loading Paddle at startup. Updated requirements for torch compatibility

bf83b6f

seanpedrickcase committed 3 days ago

Updated requirements

1935c45

seanpedrickcase committed 3 days ago

Updated requirements for torch. Updated main hf flow to force changes to spaces repo

e59fbb7

seanpedrickcase committed 4 days ago

Updated dependencies and the GitHub-to-HF workflow

059a5f7

seanpedrickcase committed 4 days ago

Updated sync-to-HF workflow for ZeroGPU Space sync

27ed5c8

seanpedrickcase committed 4 days ago

Updated readme with install instructions for Paddle and VLMs

c3ccad4

seanpedrickcase committed 4 days ago

Merge pull request #88 from seanpedrick-case/vlm_support

cd01917
unverified

Sean Pedrick-Case committed 4 days ago

formatter and linter applied

bcb5ad4

seanpedrickcase committed 4 days ago

Updated word segmenter code

4440bed

seanpedrickcase committed 4 days ago

Similar cleanup to requirements_lightweight.txt

ef8c72e

seanpedrickcase committed 4 days ago

Cleaned requirements.txt file

40bd54b

seanpedrickcase committed 4 days ago

Updated test suites to use the lightweight version of requirements.txt

f5146c7

seanpedrickcase committed 4 days ago

Added text rotation capability

1ff0b3d

seanpedrickcase committed 4 days ago

Optimised VLM model selection

54a5789

seanpedrickcase committed 4 days ago

Optimised VLM model choice and prompting/parameters

ad60619

seanpedrickcase committed 4 days ago

Added hybrid Paddle + VLM option. Optimised word segmenters for single words. Optimised package installation in pyproject.toml

6d4f6e4

seanpedrickcase committed 4 days ago

Added upgraded line-to-word parsing algorithm. Added dependencies and framework for Hugging Face Spaces deployment with ZeroGPU

c2becd8

seanpedrickcase committed 4 days ago

Improved new requirements. Improved visual OCR outputs and word-level Paddle outputs and general bounding box positioning

e4493fe

seanpedrickcase committed 10 days ago