---
title: MOSAICapp
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---
# MOSAICapp
[DOI: 10.5281/zenodo.18394317](https://doi.org/10.5281/zenodo.18394317)
A web application for topic modelling of phenomenological reports using BERTopic and transformer embeddings.
**Web app:** [huggingface.co/spaces/romybeaute/MOSAICapp](https://huggingface.co/spaces/romybeaute/MOSAICapp)
## Statement of Need
Consciousness research increasingly relies on open-ended subjective reports to capture the richness of lived experience. Structured questionnaires like the Altered States of Consciousness scales or the Mystical Experience Questionnaire (MEQ) impose predefined categories that can miss unexpected experiential dimensions.
MOSAICapp provides an alternative: instead of forcing reports into predefined categories, it uses neural topic modelling to discover thematic structure directly from the text. This "wide-angle" approach lets researchers see what participants actually describe before committing to a categorical framework.
The tool is designed for consciousness researchers, phenomenologists, and qualitative researchers working with text data who want computational analysis without writing code.
## Features
- **No-code interface** — upload CSV, configure parameters, download results
- **Sentence-level analysis** — optional segmentation for finer-grained themes
- **Interactive visualisations** — 2D topic maps, hierarchical clustering, topic distributions
- **LLM topic labelling** — automatic generation of interpretable labels (full version)
- **Python API** — `mosaic_core` library for programmatic use and batch processing
---
## 1. Quick Start (No Installation)
The easiest way to use MOSAICapp is via the hosted web interface. No coding or installation is required.
**[Launch MOSAICapp on Hugging Face](https://huggingface.co/spaces/romybeaute/MOSAICapp)**
*Note: The hosted version runs on shared resources. For large datasets or privacy-sensitive data, we recommend the local installation below.*
---
## 2. Local Installation
Run the app on your own machine to use custom GPUs, process sensitive data locally, or modify the code.
### Prerequisites
- Python 3.9+
- Git
### Setup steps
```bash
git clone https://github.com/romybeaute/MOSAICapp.git
cd MOSAICapp
# Create virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies and the package
pip install -r requirements.txt
pip install .
# Download NLTK data (required for segmentation)
# Note: NLTK 3.9 and later may need 'punkt_tab' in place of 'punkt'
python -c "import nltk; nltk.download('punkt')"
```
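To confirm that the setup worked before launching the app, a short Python check along these lines can help. It is only a sketch: the package list is taken from the names mentioned in this README (`mosaic_core`, `bertopic`, `streamlit`, `nltk`) rather than from `requirements.txt`, and `missing_packages` is an illustrative helper, not part of MOSAICapp.

```python
import importlib.util

# Top-level packages named in this README (assumed list; adjust to match requirements.txt).
REQUIRED = ["mosaic_core", "bertopic", "streamlit", "nltk"]


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    # find_spec locates a top-level package without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it inside the activated virtual environment; any name it reports as missing points to a step in the setup that did not complete.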
---
## 3. Configuration & Running
### Run the app
```bash
streamlit run app.py
```
### LLM Setup (Optional)
To use the automated topic labelling feature (Llama-3), you must provide a Hugging Face access token. The app uses this token to access the inference API.
1. Get a token: Log in to Hugging Face and create a token with "Read" permissions.
2. Configure the local app:
   - Create a folder named `.streamlit` in the repository root.
   - Inside it, create a file named `secrets.toml`.
   - Add your token to that file:
     ```toml
     HF_TOKEN = "hf_..."
     ```
   - Note: This file is ignored by Git to protect your credentials.
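For scripts that run outside Streamlit, a token lookup might look like the sketch below. Inside the app, Streamlit exposes the contents of `secrets.toml` through `st.secrets`. The helper here (`load_hf_token`, a hypothetical name) instead checks an `HF_TOKEN` environment variable first, which is an assumption and not documented MOSAICapp behaviour, and then reads the file with `tomllib`, which is in the standard library only from Python 3.11.

```python
import os
from pathlib import Path


def load_hf_token(secrets_path=".streamlit/secrets.toml"):
    """Return the Hugging Face token from the environment or the secrets file, else None."""
    # Assumption: an HF_TOKEN environment variable takes priority if it is set.
    token = os.environ.get("HF_TOKEN")
    if token:
        return token

    path = Path(secrets_path)
    if not path.is_file():
        return None

    try:
        import tomllib  # standard library from Python 3.11 onwards
    except ModuleNotFoundError:
        return None  # older Python: rely on st.secrets inside the app instead

    with path.open("rb") as fh:
        return tomllib.load(fh).get("HF_TOKEN")


token = load_hf_token()
print("Token found." if token else "No token configured.")
```

The function never prints the token itself, so the check can be run without leaking credentials into logs.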
---
## 4. Running Tests
The repository includes a test suite that verifies the installation and core logic, which is a quick way to check that your environment is set up correctly.
**Run everything:**
```bash
pytest tests/ -v
```
**Run only fast tests:**
```bash
pytest tests/test_core_functions.py -v
```
The tests automatically load a dummy dataset included in the repo and verify:
- Data loading (CSV parsing)
- Embedding generation
- Topic modelling pipeline
- Visualisation outputs
---
## 5. Python API (Advanced Usage)
MOSAICapp is also a Python library. You can import `mosaic_core` in your own scripts or Jupyter Notebooks for batch processing or custom analysis pipelines.
### Library usage
```python
from mosaic_core.core_functions import preprocess_and_embed, run_topic_model

# 1. Load and Preprocess
docs, embeddings = preprocess_and_embed("data.csv", text_col="report")

# 2. Configure Parameters
config = {
    "umap_params": {"n_neighbors": 15, "n_components": 5},
    "hdbscan_params": {"min_cluster_size": 10},
    "bt_params": {"nr_topics": "auto"},
}

# 3. Run Model
model, reduced_embeddings, topics = run_topic_model(docs, embeddings, config)
```
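Once the model has run, a summary of topic sizes is often the first thing to inspect. The sketch below assumes `topics` is a list with one integer topic id per document, as BERTopic's `fit_transform` returns, where `-1` marks outliers. Whether `run_topic_model` returns exactly this shape is an assumption, and `topic_sizes` is an illustrative helper, not part of `mosaic_core`.

```python
from collections import Counter

OUTLIER_TOPIC = -1  # BERTopic's label for documents not assigned to any cluster


def topic_sizes(topics):
    """Return ([(topic_id, n_docs), ...] sorted by size, number_of_outliers)."""
    counts = Counter(topics)
    # Separate outliers so they do not appear as a "topic" in the ranking.
    n_outliers = counts.pop(OUTLIER_TOPIC, 0)
    return counts.most_common(), n_outliers


# Example with made-up assignments for eight documents.
sizes, n_outliers = topic_sizes([0, 1, 0, -1, 2, 0, 1, -1])
print(sizes)       # [(0, 3), (1, 2), (2, 1)]
print(n_outliers)  # 2
```

A large outlier count relative to the number of documents can be a hint to revisit `min_cluster_size` or to try sentence-level segmentation.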
## Input format
The input is a CSV file with a text column. The app auto-detects columns named `text`, `report`, `reflection_answer`, or `reflection_answer_english`. Any column can also be selected manually.
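The column detection described above can be approximated as follows, for example to check a file before uploading it. This is a sketch and not the app's own code: the priority order of the names, the case-insensitive matching, and the helper names `detect_text_column` and `read_header` are all assumptions.

```python
import csv
import io

# Column names listed in this README; the order in which they are tried is an assumption.
CANDIDATE_COLUMNS = ["text", "report", "reflection_answer", "reflection_answer_english"]


def detect_text_column(header):
    """Return the first header entry matching a known text-column name, or None."""
    # Assumption: match case-insensitively and ignore surrounding whitespace.
    normalised = {name.strip().lower(): name for name in header}
    for candidate in CANDIDATE_COLUMNS:
        if candidate in normalised:
            return normalised[candidate]
    return None


def read_header(csv_text):
    """Read only the header row from CSV content held in a string."""
    return next(csv.reader(io.StringIO(csv_text)), [])


sample = "participant,report,age\np01,I saw shifting colours,31\n"
print(detect_text_column(read_header(sample)))  # report
```

If the function returns `None`, the file has none of the recognised column names and the text column would need to be chosen manually.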
---
## How it works
MOSAICapp implements a BERTopic pipeline: texts are embedded using sentence transformers, reduced with UMAP, clustered with HDBSCAN, and labelled using c-TF-IDF (with optional LLM refinement). This approach captures semantic context better than older bag-of-words methods like LDA.
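To make the c-TF-IDF labelling step more concrete, here is a simplified pure-Python sketch. It treats all documents in a topic as one long document and scores each term as its within-topic frequency multiplied by log(1 + A / f), where A is the average number of words per topic and f is the term's frequency across all topics. The whitespace tokenisation and the function name `c_tf_idf` are assumptions made for illustration; BERTopic's own implementation works on a vectorised bag-of-words matrix.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Score terms per topic with class-based TF-IDF.

    docs_per_topic maps a topic id to the list of documents assigned to it.
    Returns {topic_id: {term: score}}.
    """
    # Count terms per topic, treating each topic as one concatenated document.
    term_counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_per_topic.items()
    }

    # Frequency of each term across all topics.
    total_freq = Counter()
    for counts in term_counts.values():
        total_freq.update(counts)

    # Average number of words per topic.
    avg_words = sum(total_freq.values()) / len(term_counts)

    scores = {}
    for topic, counts in term_counts.items():
        n_words = sum(counts.values())
        scores[topic] = {
            term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
    return scores


# Made-up example: two small topics that share the word "and".
example = {
    0: ["bright colours and patterns", "geometric patterns and colours"],
    1: ["deep calm and stillness", "calm and peaceful stillness"],
}
for topic, term_scores in c_tf_idf(example).items():
    top = sorted(term_scores, key=term_scores.get, reverse=True)[:2]
    print(topic, top)
```

In this example "and" and "patterns" both occur twice in topic 0, but "and" scores lower because it also appears in topic 1. That down-weighting of shared words is what makes the top-scoring terms usable as topic labels.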
For methodological details, see the [MOSAIC paper](https://arxiv.org/abs/2502.18318).
---
## Research applications
MOSAICapp has been used to analyse:
- Stroboscopic light experiences from the Dreamachine project
- Descriptions of "pure awareness" from the Minimal Phenomenal Experience study
- Psychedelic experience reports (DMT, 5-MeO-DMT micro-phenomenological interviews)
## Citation
```bibtex
@article{beaute2025mosaic,
title={Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC):
Topic Modelling and LLM Applied to Stroboscopic Phenomenology},
author={Beauté, Romy and Schwartzman, David J and Dumas, Guillaume and
Crook, Jennifer and Macpherson, Fiona and Barrett, Adam B and Seth, Anil K},
journal={arXiv preprint arXiv:2502.18318},
year={2025}
}
```
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on reporting bugs, suggesting features, and contributing code.
## License
MIT
## Acknowledgements
Built with [BERTopic](https://github.com/MaartenGr/BERTopic) by Maarten Grootendorst. Funded by the Be.AI Leverhulme doctoral scholarship at the Sussex Centre for Consciousness Science.