Setup Guide - Clinical Review Assistant

Prerequisites

Before you begin, ensure you have:

  • Python 3 and pip installed
  • Git installed
  • An OpenAI API key
  • A Pinecone account and API key

Step 1: Clone the Repository

git clone https://github.com/mudejayaprakash/Clinical_Review_Assistant clinical-review-assistant

cd clinical-review-assistant

Step 2: Create Virtual Environment (Recommended)

macOS/Linux:

python3 -m venv venv
source venv/bin/activate

Windows:

python -m venv venv
venv\Scripts\activate

Step 3: Install Dependencies

pip install -r requirements.txt

Expected installation time: 2-3 minutes
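
To confirm the core dependencies installed correctly, you can run a quick import check. This is a minimal sketch; the package names are inferred from the rest of this guide, and your requirements.txt may pin additional libraries:

python3 << 'EOF'
# Minimal sanity check: these packages are used later in this guide.
import streamlit
import pinecone
import dotenv

print(f"Core dependencies OK (streamlit {streamlit.__version__})")
EOF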


Step 4: Configure Environment Variables

  1. Copy the example file:
cp .env.example .env
  2. Edit the .env file and add your API keys:
# Required API Keys
OPENAI_API_KEY=sk-your-openai-api-key-here
PINECONE_API_KEY=your-pinecone-api-key-here
PINECONE_INDEX_NAME=medical-policies
PINECONE_NAMESPACE=policies

# Application Settings (Optional - defaults provided)
MODEL_SUMMARY=gpt-4o
MODEL_EVALUATION=gpt-4o
EMBEDDING_MODEL=cambridgeltl/SapBERT-from-PubMedBERT-fulltext

Important: Never commit the .env file to Git (already in .gitignore)
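
At runtime these values are read from the environment. The sketch below shows how they can be loaded with python-dotenv; the actual agents/config.py may structure this differently:

python3 << 'EOF'
# Sketch: load .env and read the keys defined above.
# Illustrates the mechanism only; the real agents/config.py may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

openai_key = os.getenv("OPENAI_API_KEY")
index_name = os.getenv("PINECONE_INDEX_NAME", "medical-policies")
namespace = os.getenv("PINECONE_NAMESPACE", "policies")
summary_model = os.getenv("MODEL_SUMMARY", "gpt-4o")

print("OPENAI_API_KEY set:", bool(openai_key))
print("Index:", index_name, "| Namespace:", namespace, "| Summary model:", summary_model)
EOF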


Step 5: Set Up Pinecone Index

Option A: Create Index via Pinecone Dashboard

  1. Go to Pinecone Console
  2. Click "Create Index"
  3. Configure:
    • Name: medical-policies
    • Dimensions: 768 (for SapBERT embeddings)
    • Metric: cosine
    • Region: Choose closest to you
  4. Click "Create Index"

Option B: Create Index via Python

python3 << 'EOF'
from pinecone import Pinecone, ServerlessSpec
import os
from dotenv import load_dotenv

load_dotenv()
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

# Create a 768-dimensional serverless index for SapBERT embeddings
pc.create_index(
    name='medical-policies',
    dimension=768,
    metric='cosine',
    spec=ServerlessSpec(cloud='aws', region='us-east-1')
)
print("✅ Pinecone index created successfully!")
EOF
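
Whichever option you choose, you can verify the index is visible to your API key before moving on:

python3 << 'EOF'
# Verify the index from Step 5 exists and is reachable.
import os
from dotenv import load_dotenv
from pinecone import Pinecone

load_dotenv()
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

name = os.getenv('PINECONE_INDEX_NAME', 'medical-policies')
print("Available indexes:", pc.list_indexes().names())
print(pc.describe_index(name))
EOF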

Step 6: Load Policy Documents (Optional)

If you have insurance policy PDFs to load:

  1. Place policy PDFs in the data/raw_policy_pdf/ folder:
mkdir -p data/raw_policy_pdf
# Copy your policy PDFs into this folder
  2. Run the data ingestion script:
python tools/data_ingestion.py

This will:

  • Extract text from PDFs
  • Create chunks with section-aware splitting
  • Generate SapBERT embeddings
  • Upload to Pinecone index

Expected time: 2-5 minutes for 10 policies

Note: You can skip this step and test with an empty policy database, but Node 2 won't retrieve any policies.
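
For reference, the heart of the ingestion step is embedding each chunk with SapBERT and upserting the vectors into Pinecone. The sketch below illustrates that mechanism for a single chunk, assuming transformers and torch are installed; it is not the actual tools/data_ingestion.py, and the id and metadata fields are only illustrative:

python3 << 'EOF'
# Sketch of the embed-and-upsert mechanism (not the real tools/data_ingestion.py).
import os
import torch
from dotenv import load_dotenv
from pinecone import Pinecone
from transformers import AutoTokenizer, AutoModel

load_dotenv()
model_name = os.getenv("EMBEDDING_MODEL", "cambridgeltl/SapBERT-from-PubMedBERT-fulltext")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

chunk = "Example policy text: septoplasty is covered when conservative management has failed."
inputs = tokenizer([chunk], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    # CLS-token pooling yields a 768-dimensional vector, matching the index dimension
    embedding = model(**inputs).last_hidden_state[:, 0, :][0]

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME", "medical-policies"))
index.upsert(
    vectors=[{
        "id": "example-policy-chunk-0",  # illustrative id
        "values": embedding.tolist(),
        "metadata": {"source": "example_policy.pdf", "text": chunk},  # illustrative metadata
    }],
    namespace=os.getenv("PINECONE_NAMESPACE", "policies"),
)
print("Upserted 1 example vector")
EOF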


Step 7: Run the Application

streamlit run app.py

The application will open in your browser at: http://localhost:8501


Step 8: Create Your First Account

  1. On the login page, click the "Register" tab
  2. Enter a username and password
  3. Click "Create Account"
  4. Login with your new credentials

Testing the Application

Quick Test Workflow:

  1. Upload a test medical record (PDF format)
  2. Click "Summarize and Analyze Records"
  3. Review the generated summary and chief complaints
  4. View retrieved policies (if you loaded policy documents)
  5. Enter test criteria:
    • Patient must be 18 years or older
    • Conservative medical management has failed
    • CT scan or endoscopy confirms septal deviation
  6. Click "Evaluate Criteria"
  7. Review results with evidence, page numbers, and confidence scores

Troubleshooting

Issue: "ModuleNotFoundError"

Solution: Ensure you're in the virtual environment and run:

pip install -r requirements.txt

Issue: "OpenAI API key not found"

Solution: Check that your .env file exists and contains valid API keys:

cat .env | grep OPENAI_API_KEY

Issue: "Pinecone index not found"

Solution: Verify that the index name in .env matches the index in the Pinecone dashboard:

python3 -c "from pinecone import Pinecone; import os; from dotenv import load_dotenv; load_dotenv(); pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY')); print(pc.list_indexes())"

Issue: "PDF processing fails"

Solution: Ensure the PDF is:

  • Under 50MB
  • Not password-protected
  • Contains extractable text (not just scanned images)
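
To confirm a PDF actually has an extractable text layer, a quick check with a PDF library can help. This sketch uses pypdf as an example; the application itself may use a different parser:

python3 << 'EOF'
# Check whether a PDF has an extractable text layer (example uses pypdf).
from pypdf import PdfReader

reader = PdfReader("path/to/record.pdf")  # replace with your file
text = "".join(page.extract_text() or "" for page in reader.pages)
print(f"{len(reader.pages)} pages, {len(text)} characters of extractable text")
if not text.strip():
    print("No text layer found - the PDF is likely scanned images and may need OCR")
EOF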

Issue: "Port 8501 already in use"

Solution: Stop other Streamlit instances or use a different port:

streamlit run app.py --server.port 8502

Project Structure

clinical-review-assistant/
β”œβ”€β”€ app.py                          # Main Streamlit application
β”œβ”€β”€ agents/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ config.py                   # Configuration settings
β”‚   β”œβ”€β”€ agent.py                    # LangGraph agent orchestrator
β”‚   β”œβ”€β”€ nodes.py                    # Node 1, 2, 3 implementations
β”‚   β”œβ”€β”€ security.py                 # Security & audit logging
β”‚   └── auth.py                     # Authentication system
β”œβ”€β”€ tools/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ rag.py                      # RAG utilities
β”‚   β”œβ”€β”€ rag_pinecone.py             # Pinecone integration
β”‚   └── data_ingestion.py           # Policy ingestion pipeline
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ raw_policy_pdf/             # Policy PDFs (you create)
β”‚   └── policy_txt/                 # Extracted text (auto-generated)
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ agent_workflow.png
β”‚   β”œβ”€β”€ architecture_diagram.png    
β”‚   β”œβ”€β”€ screenshots/                # Screenshots displayed in README
β”‚   └── setup_guide.md              # This file                
β”œβ”€β”€ requirements.txt                # Python dependencies
β”œβ”€β”€ .env.example                    # Environment template
β”œβ”€β”€ .env                            # Your API keys (create from .env.example)
β”œβ”€β”€ .gitignore                      # Git ignore file
└── README.md                       # Project documentation

Next Steps

  • Customize policies: Add new insurance policies to data/raw_policy_pdf/
  • Test with real data: Upload actual medical records (ensure PHI compliance)
  • Adjust configuration: Modify agents/config.py for custom settings
  • Review logs: Check security.log for audit trail
  • Scale deployment: Deploy to Streamlit Cloud or AWS for production use

Support

For issues or questions:

  • Check Troubleshooting section above
  • Review README for detailed documentation

Development Mode

To run in development mode with auto-reload:

streamlit run app.py --server.runOnSave true

To view detailed logs:

tail -f security.log

Setup complete! You're ready to start using the Clinical Review Assistant. πŸŽ‰