Commit History

Merge pull request #94 from seanpedrick-case/dev
5e5c423
unverified

Sean Pedrick-Case committed on

Added regex functionality to deny lists. Corrected Tesseract to word-level parsing. Improved review search regex capabilities. Updated documentation
4852fb5

seanpedrickcase committed on

Allowed Tesseract to run OCR in line-level mode and then query an LLM with line-level data. Added option for running as an MCP server. Added API for multi-word text search
419fb7d

seanpedrickcase committed on

Merge pull request #93 from seanpedrick-case/dev
7832a41
unverified

Sean Pedrick-Case committed on

Merge pull request #92 from seanpedrick-case/regex_search
c2d2ccd
unverified

Sean Pedrick-Case committed on

Added regex search feature for multi-word text search
21318d3

seanpedrickcase committed on

Minor update to cli_redact for new local OCR model options. Updated app_settings.qmd, user_guide.qmd, and readme.md with descriptions of new features
d5b5291

seanpedrickcase committed on

Fixed minor bugs related to Textract API calls, pyproject format. Removed print statements and fixed some future concat deprecation issues
7bb945f

seanpedrickcase committed on

Merge pull request #90 from seanpedrick-case/dev
50caf2f
unverified

Sean Pedrick-Case committed on

Merge pull request #89 from seanpedrick-case/textract_type_name_output
00011db
unverified

Sean Pedrick-Case committed on

Added suffix to Textract output files according to tasks included (e.g. signature analysis). Improved reporting when Textract client doesn't exist. Fixed display for cost and time taken. Changed config variables to allow exclusion of PaddleOCR from display
25e2089

seanpedrickcase committed on

Improved Paddle and hybrid OCR analysis across all options. Tried to revise requirements for Spaces
2c00d05

seanpedrickcase committed on

Added paddle to pre-requirements.txt
01c8eb6

seanpedrickcase committed on

Allowed Paddle to load at startup. Updated requirements for torch compatibility
bf83b6f

seanpedrickcase committed on

Updated requirements for torch. Updated main HF flow to force changes to Spaces repo
e59fbb7

seanpedrickcase committed on

Updated dependencies and GitHub to HF workflow
059a5f7

seanpedrickcase committed on

Updated sync to HF workflow for ZeroGPU Space sync
27ed5c8

seanpedrickcase committed on

Updated readme install instructions for Paddle and VLMs
c3ccad4

seanpedrickcase committed on

Merge pull request #88 from seanpedrick-case/vlm_support
cd01917
unverified

Sean Pedrick-Case committed on

Similar cleanup to requirements_lightweight.txt
ef8c72e

seanpedrickcase committed on

Updated test suites to use the lightweight version of requirements.txt
f5146c7

seanpedrickcase committed on

Optimised VLM model choice and prompting/parameters
ad60619

seanpedrickcase committed on

Added hybrid Paddle + VLM option. Optimised word segmenters for single words. Optimised package installation in pyproject.toml
6d4f6e4

seanpedrickcase committed on

Added upgraded line-to-word parsing algorithm. Added dependencies and framework for Hugging Face Spaces deployment with ZeroGPU
c2becd8

seanpedrickcase committed on

Improved new requirements. Improved visual OCR outputs and word-level Paddle outputs and general bounding box positioning
e4493fe

seanpedrickcase committed on

Initial commit for VLM support. Created visualisations for OCR output. Corrected log_file_output_paths reference.
5e01004

seanpedrickcase committed on

Merge pull request #85 from seanpedrick-case/dev
b04887e
unverified

Sean Pedrick-Case committed on

Again revised spaCy language model load for different languages
2f34683

seanpedrickcase committed on

Merge pull request #84 from seanpedrick-case/dev
9b8dc93
unverified

Sean Pedrick-Case committed on

Modified model load for custom languages with spaCy. Languages should load successfully now.
2148ddd

seanpedrickcase committed on

Merge pull request #83 from seanpedrick-case/dev
08812d7
unverified

Sean Pedrick-Case committed on

Changed user folder ownership to cover the whole user folder in Dockerfile. Minor changes to documentation
bf7b066

seanpedrickcase committed on

Merge pull request #82 from seanpedrick-case/dev
7625c26
unverified

Sean Pedrick-Case committed on

Ensured that AWS credentials are called correctly in logger settings.
43c7a6d

seanpedrickcase committed on

Merge pull request #81 from seanpedrick-case/dev
234aaf5
unverified

Sean Pedrick-Case committed on

Updated user guide and app settings. Updated some additional lambda_entrypoint arguments. Ensured that examples are correctly displayed on GUI.
c543ba0

seanpedrickcase committed on

Added head attribute to Gradio Blocks context to enable enforcement of direct vs relative file paths. Updated direct mode/lambda entrypoint so that as many options as possible can be user-defined
febacad

seanpedrickcase committed on

Merge pull request #80 from seanpedrick-case/main
41e7358
unverified

Sean Pedrick-Case committed on

Fix condition check for SHOW_EXAMPLES
57de024
unverified

Sean Pedrick-Case committed on

Merge pull request #79 from seanpedrick-case/dev
b0dca2c
unverified

Sean Pedrick-Case committed on

Correction to PaddleOCR config variable. Minor print statement changes
6c62394

seanpedrickcase committed on

Revised environment variables for consistency.
5f824f4

seanpedrickcase committed on