Allowed Tesseract to run OCR in line-level mode and then query an LLM with line-level data. Added an option for running as an MCP server, and added an API for multi-word text search.
Updated the PaddleOCR implementation to have a menu option on the GUI, enabled with a config value change. Minor package updates, a favicon update, and an update to the Dockerfile to allow for Lambda function execution.
Fixed text inclusion in review PDF outputs from the apply_redactions_to_review_df... function. The PyMuPDF text/graphic/image options for applying redactions are now modifiable. Tables on the review screen should now be able to use Gradio column filter options. Moved some functions to more logical locations.
Added gunicorn to requirements for when the Dockerfile is built on FastAPI rather than directly on Gradio. Fixed some minor file path issues. Set returning the review PDF as the default.
Added the capability to load redaction annotations from PDF documents directly into the app. Minor function documentation improvements, GUI changes, and package updates.
Removed some extraneous test steps. Improved example loading and feedback, as well as redaction feedback. Minor security updates. Fixed Adobe XFDF file parsing.
Added example data files. Greatly revised the CLI for redaction, deduplication, and AWS Textract batch calls. Various minor fixes and package updates.
Updated packages. Corrected CSV logger headings; custom log CSV names can now be submitted to S3. Started work on identifying and deduplicating at the line level.