document_redaction / Dockerfile

Commit History

Initial commit for VLM support. Created visualisations for OCR output. Corrected log_file_output_paths reference.
5e01004

seanpedrickcase committed on

Changed user ownership in Dockerfile to cover the whole user folder. Minor changes to documentation
bf7b066

seanpedrickcase committed on

Revised Dockerfile structure for lambda vs gradio builds
25bc108

seanpedrickcase committed on

Modified Dockerfile and entrypoint to switch user at runtime. Updated output folder file creation for custom_image_analyser_engine and find_duplicate_pages.py
3dd6d75

seanpedrickcase committed on

Updated Dockerfile entrypoint references, CDK folder packages update
5ecf07b

seanpedrickcase committed on

Now explicitly setting user ownership for entrypoint.sh in Dockerfile
9ff5e96

seanpedrickcase committed on

Improved PaddleOCR implementation (greater accuracy; can now save outputs via a config setting). Updated Dockerfile entrypoint for Lambda to hopefully avoid permissions issues
d882db9

seanpedrickcase committed on

Updated paddleocr implementation to have a menu option on the GUI with a config value change. Minor package updates, favicon update, and update to Dockerfile to allow for Lambda function execution.
0d7ad2a

seanpedrickcase committed on

Returned Dockerfile app folder user ownership to previous version
282e263

seanpedrickcase committed on

Dockerfile changes to ensure entrypoint.sh has correct permissions for lambda run
722bba8

seanpedrickcase committed on

Updated Dockerfile with correct libgl1 package reference. Upgraded simple GitHub workflow test to Python 3.12
1ae2284

seanpedrickcase committed on

Updated Dockerfile to Python 3.12 and latest Debian release (trixie). Out-of-bounds review page movement should no longer block the current page indicator. Simple test on GitHub removed from main pushes/pull requests.
d881262

seanpedrickcase committed on

Updated Dockerfile to upgrade relevant packages to their latest versions and to no longer install unnecessary packages.
2270155

seanpedrickcase committed on

Added line to copy CLI binaries (e.g. gunicorn) from the build stage to the run stage in Dockerfile
e421fb3

seanpedrickcase committed on

Allow for RUN_FASTAPI variable to be passed in to Dockerfile as part of the build spec
93fcae3

seanpedrickcase committed on

Changed default of RUN_FASTAPI in Dockerfile to 0
8773642

seanpedrickcase committed on

Added possibility to mount Gradio app in FastAPI and restrict allowed origins (for security). Fixed some mismatched config variable references. Updated Dockerfile and related files to allow for FastAPI/Uvicorn deployment.
09ae4e0

seanpedrickcase committed on

Fix to tabular redaction, added tabular deduplication. Updated CLI call capability for both
aa5c211

seanpedrickcase committed on

Updated chown statement for tessdata folder in Dockerfile
f1425ca

seanpedrickcase committed on

Corrected tesseract location entry in config and removed from Dockerfile
4900c53

seanpedrickcase committed on

Updated Dockerfile to make paddlex main folder
27ce2fa

seanpedrickcase committed on

Repaired Dockerfile hopefully finally
881a64f

seanpedrickcase committed on

Corrected second backslash in Dockerfile
6b58709

seanpedrickcase committed on

Updated command line redaction script with more options
3bff849

seanpedrickcase committed on

Added support for other languages. Improved DynamoDB download
9ae09da

seanpedrickcase committed on

Adapted Dockerfile for systems with read only file system. Minor package updates.
a7566b9

seanpedrickcase committed on

Fixed issue in Docker containers built locally without correct folder permissions. Improved config file. Updated Gradio version to fix issue with selecting filtered rows. Minor bug fixes.
a33b955

seanpedrickcase committed on

Modified Dockerfile for correct logging folder ownership
0b9e789

seanpedrickcase committed on

Implemented Textract document API calls and associated output tracking/download. Fixes to config and cost code implementation. General minor bug fixes.
ed5f8c7

seanpedrickcase committed on

Updated Dockerfile to remove references to NLTK, as removed from requirements
208e806

seanpedrickcase committed on

Allowed for output files to be saved into user-specific folders. Added deny list capability to xlsx/csv file redaction
dacc782

seanpedrickcase committed on

Allowed for Textract and Comprehend API calls through AWS keys. File preparation function incorporated into main redaction function to avoid needing user to 'check in' during redaction process
391712c

seanpedrickcase committed on

Added git to the correct area in Dockerfile (build as opposed to run area)
520f2c4

seanpedrickcase committed on

Added git to Dockerfile to be able to install git-based custom gradio components
4790eb4

seanpedrickcase committed on

Added tab to be able to compare pages across multiple documents and redact duplicates
a265560

seanpedrickcase committed on

Enhance file handling and UI features: improved Gradio app layout with fill width option, and integrated new settings for deny and fully redacted lists (placeholders so far). Updated file conversion functions to handle CSV inputs and added CSV review file generation for redactions. Now retains all original and merged redaction boxes.
a770956

seanpedrickcase committed on

Can now define queue size, max file size, and server port in environment variables
dc17f6e

seanpedrickcase committed on

Updated Dockerfile and entrypoint file to hopefully deal correctly with APP_MODE environment variable
7c7fd7c

seanpedrickcase committed on

Moved chmod command to before user switch in Dockerfile
05c20d6

seanpedrickcase committed on

Ensure entrypoint.sh is copied
3dc1171

seanpedrickcase committed on

Modified Dockerfile hopefully to not need Lambda overrides. Looking into custom headers from Cloudfront to try to get them to work
bf7bb79

seanpedrickcase committed on

Created custom csvlogger to try to overcome AWS Lambda's incompatibility with multithread locks
34bd97b

seanpedrickcase committed on

Changed app_mode arg position in Dockerfile, changed default to gradio
d0b63c6

seanpedrickcase committed on

Moved entrypoint.sh creation to before user switch to avoid permission errors
7e8c1c9

seanpedrickcase committed on

Updated Dockerfile and requirements to include relevant Lambda packages
3f9e976

seanpedrickcase committed on

Switched start py file through Dockerfile to lambda_entrypoint. Added gradio links from this .py
6622361

seanpedrickcase committed on