# Setup Guide - Clinical Review Assistant

## Prerequisites

Before you begin, ensure you have:

- Python 3.9+ installed
- Git installed
- An OpenAI API key (Get one here)
- A Pinecone account and API key (Sign up here)
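If you're unsure which interpreter will run, a quick one-off check (a minimal sketch, not part of the project) confirms the version requirement:

```python
# Sanity check: confirm the interpreter is Python 3.9 or newer.
import sys

version = ".".join(map(str, sys.version_info[:3]))
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {version}"
print(f"Python {version} OK")
```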
## Step 1: Clone the Repository

```bash
git clone https://github.com/mudejayaprakash/Clinical_Review_Assistant
cd Clinical_Review_Assistant
```
## Step 2: Create Virtual Environment (Recommended)

**macOS/Linux:**

```bash
python3 -m venv venv
source venv/bin/activate
```

**Windows:**

```bash
python -m venv venv
venv\Scripts\activate
```
## Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

Expected installation time: 2-3 minutes
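To confirm the install succeeded (and that you're inside the virtual environment), a minimal import check is:

```python
# If any of these imports fail, activate the venv and re-run:
#   pip install -r requirements.txt
import streamlit
from pinecone import Pinecone
from dotenv import load_dotenv

print(f"streamlit {streamlit.__version__} - core imports OK")
```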
## Step 4: Configure Environment Variables

- Copy the example file:

```bash
cp .env.example .env
```

- Edit the `.env` file and add your API keys:

```env
# Required API Keys
OPENAI_API_KEY=sk-your-openai-api-key-here
PINECONE_API_KEY=your-pinecone-api-key-here
PINECONE_INDEX_NAME=medical-policies
PINECONE_NAMESPACE=policies

# Application Settings (Optional - defaults provided)
MODEL_SUMMARY=gpt-4o
MODEL_EVALUATION=gpt-4o
EMBEDDING_MODEL=cambridgeltl/SapBERT-from-PubMedBERT-fulltext
```

**Important:** Never commit the `.env` file to Git (it's already in `.gitignore`).
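To confirm the file is picked up, a quick check with `python-dotenv` (which the project already uses) prints which required keys are present without revealing their values:

```python
# Verify the required keys load from .env without echoing the secrets.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```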
## Step 5: Set Up Pinecone Index

### Option A: Create Index via Pinecone Dashboard

- Go to the Pinecone Console
- Click "Create Index"
- Configure:
  - Name: `medical-policies`
  - Dimensions: `768` (for SapBERT embeddings)
  - Metric: `cosine`
  - Region: Choose the one closest to you
- Click "Create Index"
### Option B: Create Index via Python

```bash
python3 << 'EOF'
from pinecone import Pinecone
import os
from dotenv import load_dotenv

load_dotenv()
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

# Create index
pc.create_index(
    name='medical-policies',
    dimension=768,
    metric='cosine',
    spec={'serverless': {'cloud': 'aws', 'region': 'us-east-1'}}
)
print("✅ Pinecone index created successfully!")
EOF
```
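Whichever option you choose, you can verify the index before moving on; this sketch assumes the `medical-policies` name used above:

```python
# Confirm the index exists and reports the expected dimension and metric.
import os
from pinecone import Pinecone
from dotenv import load_dotenv

load_dotenv()
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
desc = pc.describe_index("medical-policies")
print(desc.name, desc.dimension, desc.metric)  # expect: medical-policies 768 cosine
```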
## Step 6: Load Policy Documents (Optional)

If you have insurance policy PDFs to load:

- Place policy PDFs in the `data/raw_policy_pdf/` folder:

```bash
mkdir -p data/raw_policy_pdf
# Copy your policy PDFs into this folder
```

- Run the data ingestion script:

```bash
python tools/data_ingestion.py
```

This will:

- Extract text from PDFs
- Create chunks with section-aware splitting
- Generate SapBERT embeddings
- Upload to Pinecone index (see the sketch below)

Expected time: 2-5 minutes for 10 policies

**Note:** You can skip this step and test with an empty policy database, but Node 2 won't retrieve any policies.
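For reference, the core of the embed-and-upsert step looks roughly like the sketch below. The chunk text, vector ID, and metadata fields are illustrative, not the actual values `tools/data_ingestion.py` produces:

```python
# Hypothetical sketch of one ingestion step: embed a policy chunk with
# SapBERT (768-dim CLS vector) and upsert it into the Pinecone index.
import os
import torch
from dotenv import load_dotenv
from pinecone import Pinecone
from transformers import AutoModel, AutoTokenizer

load_dotenv()

MODEL_NAME = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed(text: str) -> list[float]:
    """Return the 768-dim [CLS] embedding for one text chunk."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze().tolist()

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME", "medical-policies"))

chunk = "Septoplasty is considered medically necessary when ..."  # illustrative
index.upsert(
    vectors=[{
        "id": "policy-001-chunk-0",  # illustrative ID scheme
        "values": embed(chunk),
        "metadata": {"source": "policy-001.pdf", "page": 3},
    }],
    namespace=os.getenv("PINECONE_NAMESPACE", "policies"),
)
```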
## Step 7: Run the Application

```bash
streamlit run app.py
```

The application will open in your browser at: http://localhost:8501
## Step 8: Create Your First Account

- On the login page, click the "Register" tab
- Enter a username and password
- Click "Create Account"
- Log in with your new credentials
## Testing the Application

### Quick Test Workflow

- Upload a test medical record (PDF format)
- Click "Summarize and Analyze Records"
- Review the generated summary and chief complaints
- View retrieved policies (if you loaded policy documents)
- Enter test criteria:
  - Patient must be 18 years or older
  - Conservative medical management has failed
  - CT scan or endoscopy confirms septal deviation
- Click "Evaluate Criteria"
- Review results with evidence, page numbers, and confidence scores
## Troubleshooting

**Issue: "ModuleNotFoundError"**

Solution: Ensure you're in the virtual environment and run:

```bash
pip install -r requirements.txt
```

**Issue: "OpenAI API key not found"**

Solution: Check that your `.env` file exists and contains valid API keys:

```bash
cat .env | grep OPENAI_API_KEY
```

**Issue: "Pinecone index not found"**

Solution: Verify the index name matches in `.env` and the Pinecone dashboard:

```bash
python3 -c "from pinecone import Pinecone; import os; from dotenv import load_dotenv; load_dotenv(); pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY')); print(pc.list_indexes())"
```
Issue: "PDF processing fails"
Solution: Ensure PDF is:
- Under 50MB
- Not password-protected
- Contains extractable text (not just scanned images)
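If you're unsure whether a PDF has a text layer, a quick diagnostic with `pypdf` (which may or may not be pinned in `requirements.txt`; the file path is a placeholder) is:

```python
# Report page count and extractable text; near-zero characters usually
# means the PDF is a scanned image and needs OCR before ingestion.
from pypdf import PdfReader

reader = PdfReader("path/to/record.pdf")
if reader.is_encrypted:
    print("PDF is password-protected; remove the password before uploading.")
else:
    text = "".join(page.extract_text() or "" for page in reader.pages)
    print(f"{len(reader.pages)} pages, {len(text)} extractable characters")
```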
Issue: "Port 8501 already in use"
Solution: Stop other Streamlit instances or use a different port:
streamlit run app.py --server.port 8502
## Project Structure

```
clinical-review-assistant/
├── app.py                     # Main Streamlit application
├── agents/
│   ├── __init__.py
│   ├── config.py              # Configuration settings
│   ├── agent.py               # LangGraph agent orchestrator
│   ├── nodes.py               # Node 1, 2, 3 implementations
│   ├── security.py            # Security & audit logging
│   └── auth.py                # Authentication system
├── tools/
│   ├── __init__.py
│   ├── rag.py                 # RAG utilities
│   ├── rag_pinecone.py        # Pinecone integration
│   └── data_ingestion.py      # Policy ingestion pipeline
├── data/
│   ├── raw_policy_pdf/        # Policy PDFs (you create)
│   └── policy_txt/            # Extracted text (auto-generated)
├── docs/
│   ├── agent_workflow.png
│   ├── architecture_diagram.png
│   ├── screnshots/            # To display in README
│   └── setup_guide.md         # This file
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
├── .env                       # Your API keys (create from .env.example)
├── .gitignore                 # Git ignore file
└── README.md                  # Project documentation
```
## Next Steps

- **Customize policies**: Add new insurance policies to `data/raw_policy_pdf/`
- **Test with real data**: Upload actual medical records (ensure PHI compliance)
- **Adjust configuration**: Modify `agents/config.py` for custom settings
- **Review logs**: Check `security.log` for the audit trail
- **Scale deployment**: Deploy to Streamlit Cloud or AWS for production use
## Support

For issues or questions:

- Check the Troubleshooting section above
- Review the README for detailed documentation
## Development Mode

To run in development mode with auto-reload:

```bash
streamlit run app.py --server.runOnSave true
```

To view detailed logs:

```bash
tail -f security.log
```
**Setup complete!** You're ready to start using the Clinical Review Assistant. 🎉