# Setup Guide - Clinical Review Assistant

## Prerequisites

Before you begin, ensure you have:

- Python 3.9+ installed
- Git installed
- An OpenAI API key (Get one here)
- A Pinecone account and API key (Sign up here)
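If you're unsure which interpreter will run, a quick one-off check (a minimal sketch, not part of the project) confirms the version requirement:

```python
# Sanity check: confirm the interpreter is Python 3.9 or newer.
import sys

version = ".".join(map(str, sys.version_info[:3]))
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {version}"
print(f"Python {version} OK")
```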
## Step 1: Clone the Repository

```bash
git clone https://github.com/mudejayaprakash/Clinical_Review_Assistant
cd Clinical_Review_Assistant
```
## Step 2: Create Virtual Environment (Recommended)

**macOS/Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

**Windows:**

```bash
python -m venv venv
venv\Scripts\activate
```
## Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

Expected installation time: 2-3 minutes
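To confirm the install succeeded (and that you're inside the virtual environment), a minimal import check is:

```python
# If any of these imports fail, activate the venv and re-run:
#   pip install -r requirements.txt
import streamlit
from pinecone import Pinecone
from dotenv import load_dotenv

print(f"streamlit {streamlit.__version__} - core imports OK")
```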
## Step 4: Configure Environment Variables

- Copy the example file:

```bash
cp .env.example .env
```

- Edit the `.env` file and add your API keys:

```env
# Required API Keys
OPENAI_API_KEY=sk-your-openai-api-key-here
PINECONE_API_KEY=your-pinecone-api-key-here
PINECONE_INDEX_NAME=medical-policies
PINECONE_NAMESPACE=policies

# Application Settings (Optional - defaults provided)
MODEL_SUMMARY=gpt-4o
MODEL_EVALUATION=gpt-4o
EMBEDDING_MODEL=cambridgeltl/SapBERT-from-PubMedBERT-fulltext
```

**Important:** Never commit the `.env` file to Git (it's already in `.gitignore`).
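To confirm the file is picked up, a quick check with `python-dotenv` (which the project already uses) prints which required keys are present without revealing their values:

```python
# Verify the required keys load from .env without echoing the secrets.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```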
## Step 5: Set Up Pinecone Index

### Option A: Create Index via Pinecone Dashboard

- Go to the Pinecone Console
- Click "Create Index"
- Configure:
  - Name: `medical-policies`
  - Dimensions: `768` (for SapBERT embeddings)
  - Metric: `cosine`
  - Region: Choose the one closest to you
- Click "Create Index"
### Option B: Create Index via Python

```bash
python3 << 'EOF'
from pinecone import Pinecone
import os
from dotenv import load_dotenv

load_dotenv()
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

# Create index
pc.create_index(
    name='medical-policies',
    dimension=768,
    metric='cosine',
    spec={'serverless': {'cloud': 'aws', 'region': 'us-east-1'}}
)
print("✅ Pinecone index created successfully!")
EOF
```
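Whichever option you choose, you can verify the index before moving on; this sketch assumes the `medical-policies` name used above:

```python
# Confirm the index exists and reports the expected dimension and metric.
import os
from pinecone import Pinecone
from dotenv import load_dotenv

load_dotenv()
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
desc = pc.describe_index("medical-policies")
print(desc.name, desc.dimension, desc.metric)  # expect: medical-policies 768 cosine
```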
## Step 6: Load Policy Documents (Optional)

If you have insurance policy PDFs to load:

- Place policy PDFs in the `data/raw_policy_pdf/` folder:

```bash
mkdir -p data/raw_policy_pdf
# Copy your policy PDFs into this folder
```

- Run the data ingestion script:

```bash
python tools/data_ingestion.py
```

This will:

- Extract text from PDFs
- Create chunks with section-aware splitting
- Generate SapBERT embeddings
- Upload to Pinecone index (see the sketch below)

Expected time: 2-5 minutes for 10 policies

**Note:** You can skip this step and test with an empty policy database, but Node 2 won't retrieve any policies.
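For reference, the core of the embed-and-upsert step looks roughly like the sketch below. The chunk text, vector ID, and metadata fields are illustrative, not the actual values `tools/data_ingestion.py` produces:

```python
# Hypothetical sketch of one ingestion step: embed a policy chunk with
# SapBERT (768-dim CLS vector) and upsert it into the Pinecone index.
import os
import torch
from dotenv import load_dotenv
from pinecone import Pinecone
from transformers import AutoModel, AutoTokenizer

load_dotenv()

MODEL_NAME = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed(text: str) -> list[float]:
    """Return the 768-dim [CLS] embedding for one text chunk."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze().tolist()

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME", "medical-policies"))

chunk = "Septoplasty is considered medically necessary when ..."  # illustrative
index.upsert(
    vectors=[{
        "id": "policy-001-chunk-0",  # illustrative ID scheme
        "values": embed(chunk),
        "metadata": {"source": "policy-001.pdf", "page": 3},
    }],
    namespace=os.getenv("PINECONE_NAMESPACE", "policies"),
)
```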
## Step 7: Run the Application

```bash
streamlit run app.py
```

The application will open in your browser at: http://localhost:8501
## Step 8: Create Your First Account

- On the login page, click the "Register" tab
- Enter a username and password
- Click "Create Account"
- Log in with your new credentials
## Testing the Application

### Quick Test Workflow

- Upload a test medical record (PDF format)
- Click "Summarize and Analyze Records"
- Review the generated summary and chief complaints
- View retrieved policies (if you loaded policy documents)
- Enter test criteria:
  - Patient must be 18 years or older
  - Conservative medical management has failed
  - CT scan or endoscopy confirms septal deviation
- Click "Evaluate Criteria"
- Review results with evidence, page numbers, and confidence scores
## Troubleshooting

**Issue: "ModuleNotFoundError"**

Solution: Ensure you're in the virtual environment and run:

```bash
pip install -r requirements.txt
```

**Issue: "OpenAI API key not found"**

Solution: Check that your `.env` file exists and contains valid API keys:

```bash
cat .env | grep OPENAI_API_KEY
```

**Issue: "Pinecone index not found"**

Solution: Verify the index name matches in `.env` and the Pinecone dashboard:

```bash
python3 -c "from pinecone import Pinecone; import os; from dotenv import load_dotenv; load_dotenv(); pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY')); print(pc.list_indexes())"
```
Issue: "PDF processing fails"
Solution: Ensure PDF is:
- Under 50MB
- Not password-protected
- Contains extractable text (not just scanned images)
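If you're unsure whether a PDF has a text layer, a quick diagnostic with `pypdf` (which may or may not be pinned in `requirements.txt`; the file path is a placeholder) is:

```python
# Report page count and extractable text; near-zero characters usually
# means the PDF is a scanned image and needs OCR before ingestion.
from pypdf import PdfReader

reader = PdfReader("path/to/record.pdf")
if reader.is_encrypted:
    print("PDF is password-protected; remove the password before uploading.")
else:
    text = "".join(page.extract_text() or "" for page in reader.pages)
    print(f"{len(reader.pages)} pages, {len(text)} extractable characters")
```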
Issue: "Port 8501 already in use"
Solution: Stop other Streamlit instances or use a different port:
streamlit run app.py --server.port 8502
## Project Structure

```
clinical-review-assistant/
├── app.py                     # Main Streamlit application
├── agents/
│   ├── __init__.py
│   ├── config.py              # Configuration settings
│   ├── agent.py               # LangGraph agent orchestrator
│   ├── nodes.py               # Node 1, 2, 3 implementations
│   ├── security.py            # Security & audit logging
│   └── auth.py                # Authentication system
├── tools/
│   ├── __init__.py
│   ├── rag.py                 # RAG utilities
│   ├── rag_pinecone.py        # Pinecone integration
│   └── data_ingestion.py      # Policy ingestion pipeline
├── data/
│   ├── raw_policy_pdf/        # Policy PDFs (you create)
│   └── policy_txt/            # Extracted text (auto-generated)
├── docs/
│   ├── agent_workflow.png
│   ├── architecture_diagram.png
│   ├── screnshots/            # To display in README
│   └── setup_guide.md         # This file
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
├── .env                       # Your API keys (create from .env.example)
├── .gitignore                 # Git ignore file
└── README.md                  # Project documentation
```
## Next Steps

- **Customize policies**: Add new insurance policies to `data/raw_policy_pdf/`
- **Test with real data**: Upload actual medical records (ensure PHI compliance)
- **Adjust configuration**: Modify `agents/config.py` for custom settings
- **Review logs**: Check `security.log` for the audit trail
- **Scale deployment**: Deploy to Streamlit Cloud or AWS for production use
## Support

For issues or questions:

- Check the Troubleshooting section above
- Review the README for detailed documentation
## Development Mode

To run in development mode with auto-reload:

```bash
streamlit run app.py --server.runOnSave true
```

To view detailed logs:

```bash
tail -f security.log
```
**Setup complete!** You're ready to start using the Clinical Review Assistant. 🎉