---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---

# Cancer@Home v2

A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

## Quick Start (5 minutes)

### Prerequisites
- Python 3.8+
- Docker Desktop
- 8 GB RAM minimum

### Installation

1. **Clone and set up**
   ```bash
   cd CancerAtHome2                 # repository root, after cloning
   python -m venv venv
   venv\Scripts\activate            # Windows
   # source venv/bin/activate       # macOS/Linux
   pip install -r requirements.txt
   ```

2. **Start the Neo4j database**
   ```bash
   docker-compose up -d
   ```

3. **Run the application**
   ```bash
   python run.py
   ```

4. **Open your browser**
   - Application: http://localhost:5000
   - Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)

## Features

### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking

### 2. **GDC Data Integration**
- Download cancer genomics data from the GDC Data Portal
- Support for multiple cancer types across TCGA and TARGET projects
- Automatic data parsing and normalization
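
As a rough illustration of the download step, the sketch below builds query parameters for the public GDC API `files` endpoint. The function name `build_gdc_files_params`, the selected fields, and the default `MAF` data format are assumptions made for this example; the project's own `backend/gdc` module may work differently.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata searches.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_params(project_id, data_format="MAF", size=10):
    """Build query parameters that select files by project and data format.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {
                    "field": "files.data_format",
                    "value": [data_format],
                },
            },
        ],
    }
    return {
        "filters": json.dumps(filters),  # the API expects filters as a JSON string
        "fields": "file_id,file_name,data_format,file_size",
        "format": "JSON",
        "size": str(size),
    }


if __name__ == "__main__":
    # Print the URL only; no network request is made here.
    params = build_gdc_files_params("TCGA-BRCA")
    print(f"{GDC_FILES_ENDPOINT}?{urlencode(params)}")
```

Controlled-access files additionally require a GDC authentication token, which this sketch does not handle.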

### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
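
To make the FASTQ processing step concrete, here is a minimal sketch of a parser for the standard four-line FASTQ record layout, plus a mean quality score assuming Phred+33 encoding. The names `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical and are not taken from the project's `backend/pipeline` code; multi-line (wrapped) sequences are not supported.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(record: FastqRecord, offset: int = 33) -> float:
    """Return the mean Phred quality, assuming the given ASCII offset."""
    scores = [ord(char) - offset for char in record.quality]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    sample = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for rec in parse_fastq(sample):
        print(rec.name, rec.sequence, mean_phred(rec))
```

A generator keeps memory use flat for large files, since only one record is held at a time.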

### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
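
One way the Gene → Mutation → Patient → Cancer Type chain could be written to Neo4j is sketched below. The node labels, property names, and relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) are assumptions for illustration, as is the Bolt URI `bolt://localhost:7687`; the credentials are the ones listed in the Quick Start. It requires the third-party `neo4j` driver package.

```python
# Cypher statement using MERGE so repeated loads do not duplicate nodes.
# Labels, properties, and relationship types are hypothetical.
MERGE_MUTATION = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
"""


def mutation_params(gene, mutation_id, patient_id, cancer_type):
    """Map one observation onto the parameters used by MERGE_MUTATION."""
    return {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }


def store_mutation(record, uri="bolt://localhost:7687",
                   user="neo4j", password="cancer123"):
    """Write one record to Neo4j (assumes the container exposes Bolt)."""
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # consume() forces the statement to complete before returning
            session.run(MERGE_MUTATION, record).consume()
    finally:
        driver.close()
```

With the database running, a call might look like `store_mutation(mutation_params("TP53", "mut-001", "patient-001", "Breast"))`, where the identifiers are made-up sample values.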

### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics

### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis

## Architecture

```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```

## Project Structure

```
CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point
```

## Data Flow

1. **Data Ingestion**: Download cancer genomics data from the GDC Data Portal
2. **Processing**: Run FASTQ/BLAST analysis on the distributed BOINC network
3. **Storage**: Store results in the Neo4j graph database
4. **Visualization**: Query and visualize the results via the web dashboard

## Configuration

Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options

## Usage Examples

### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
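
A query like the one above could be sent from Python roughly as follows. This is a sketch under assumptions: the endpoint path `/graphql` on the application port and the `String!` variable type are guesses, not confirmed by the project, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: the application port from Quick Start plus a /graphql path.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Same selection as the query above, with the gene passed as a variable.
MUTATIONS_QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(gene, url=GRAPHQL_URL):
    """Build a POST request carrying the query and its variables as JSON."""
    payload = json.dumps(
        {"query": MUTATIONS_QUERY, "variables": {"gene": gene}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_mutations(gene):
    """Send the request and return the `mutations` list from the response."""
    with urllib.request.urlopen(build_graphql_request(gene), timeout=10) as resp:
        return json.load(resp)["data"]["mutations"]
```

Passing the gene as a variable rather than interpolating it into the query string avoids quoting problems when the value comes from user input.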

### Submit Analysis Task
```python
from backend.boinc import BOINCClient

# Submit a variant-calling work unit for a local FASTQ file
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
```

## Inspired By

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling

## License

MIT License

## Support

For issues or questions, please open an issue on Hugging Face or GitHub.