document_redaction / pyproject.toml

Commit History

Allowed Tesseract to run OCR in line-level mode and then query an LLM with line-level data. Added an option for running as an MCP server and an API for multi-word text search
419fb7d

seanpedrickcase committed on

Minor update to cli_redact for new local OCR model options. Updated app_settings.qmd, user_guide.qmd, and readme.md with descriptions of new features
d5b5291

seanpedrickcase committed on

Fixed minor bugs related to Textract API calls, pyproject format. Removed print statements and fixed some future concat deprecation issues
7bb945f

seanpedrickcase committed on

Improved paddle and hybrid OCR analysis across all options. Tried to revise requirements for spaces
2c00d05

seanpedrickcase committed on

Updated dependencies and the GitHub to HF workflow
059a5f7

seanpedrickcase committed on

Added hybrid paddle + VLM option. Optimised word segmenters for single words. Optimised package installation in pyproject.toml
6d4f6e4

seanpedrickcase committed on

Added upgraded line-to-word parsing algorithm. Added dependencies and framework for Hugging Face Spaces deployment with ZeroGPU
c2becd8

seanpedrickcase committed on

Updated user guide and app settings. Updated some additional lambda_entrypoint arguments. Ensured that examples are correctly displayed on GUI.
c543ba0

seanpedrickcase committed on

Updated paddleocr implementation to have a menu option on the GUI with a config value change. Minor package updates, favicon update, and update to Dockerfile to allow for Lambda function execution.
0d7ad2a

seanpedrickcase committed on

Fixed text inclusion in review pdf outputs from apply_redactions_to_review_df... function. Apply redaction pymupdf text/graphic/image options are now modifiable. Tables on review screen should now be able to use Gradio column filter options. Moved some functions to more logical location.
b597212

seanpedrickcase committed on

Moved from gunicorn to uvicorn for AWS deployment
799caf1

seanpedrickcase committed on

Added gunicorn to requirements for building the Dockerfile based on FastAPI rather than Gradio directly. Fixed some minor file path issues. Set return review PDF as default.
b38d4b9

seanpedrickcase committed on

Added capability of loading in redaction annotations from PDF documents directly into the app. Minor function documentation improvements, GUI changes, package updates.
b61459d

seanpedrickcase committed on

Removed some extraneous test steps. Improved Example loading and feedback, and redaction feedback. Minor security updates. Fixed Adobe xfdf file parsing.
1cb1897

seanpedrickcase committed on

Updated Windows Tesseract install location for test
96b0e0e

seanpedrickcase committed on

Fixed duplicate page argument mismatch. Re-added Windows tests. Added refresh token options to CDK. Package updates
ad8fef5

seanpedrickcase committed on

Fixed deprecated GitHub workflow functions. Applied linter and formatter to code throughout. Added tests for GUI load.
bafcf39

seanpedrickcase committed on

Added a test suite based on the functions in cli_redact.py
084af54

seanpedrickcase committed on

Added example data files. Greatly revised CLI redaction for redaction, deduplication, and AWS Textract batch calls. Various minor fixes and package updates.
d60759d

seanpedrickcase committed on

Fixed tabular redaction and added tabular deduplication. Updated CLI call capability for both
aa5c211

seanpedrickcase committed on

Updated review functions to update with manual reviews. Minor package update
80268bb

seanpedrickcase committed on

Corrected some issues with redacting multiple xlsx/docx files. Package updates.
6f96988

seanpedrickcase committed on

Added capability to redact Word files
57aca87

seanpedrickcase committed on

Can now redact terms using a new redact search tab on the Review Redactions tab. Various minor improvements
ee6b7fb

seanpedrickcase committed on

Updated packages. Corrected CSV logger headings, can now submit custom log csv names to S3. Started work on identifying and deduplicating at the line level
e424038

seanpedrickcase committed on

Updated CDK code for custom KMS keys, new VPCs. Minor package updates.
9f51e70

seanpedrickcase committed on

Updated duplicate pages interface to include subdocuments and review. Updated relevant user guide. Minor package updates
f47b137

seanpedrickcase committed on

Adapted Dockerfile for systems with read only file system. Minor package updates.
a7566b9

seanpedrickcase committed on

Updated version numbers, gradio package version.
20b655f

seanpedrickcase committed on

Updated gradio version. Minor changes to redactor function sequence. Minor formatting and wording changes.
5a21738

seanpedrickcase committed on

Corrected a couple of bugs. Textract whole document API call outputs will now also load the input PDF into the app
10f46e9

seanpedrickcase committed on

Updated version numbers, minor text revision
69c2af9

seanpedrickcase committed on

Minor changes for cost codes, package updates. Added pyproject.toml file
47a3a80

seanpedrickcase committed on