[Question] Captions as modality

#14
by erikagu - opened

Hi TerraMind team, I would like to experiment with the model's caption modality for generation. However, the documentation does not list "Captions" as an available input modality, except for the v01 models, which are stated not to be publicly available yet. Can you confirm whether the Captions modality can be used with the TerraMind 1.0 generative models? If so, how should I refer to the modality and preprocess the data? If not, are there any potential workarounds?
Thank you!
@jhnnsjkbk @blumenstiel

erikagu changed discussion title from [Question] Captions as input modality to [Question] Captions as modality
IBM ESA Geospatial org

Hi @erikagu, you are correct: the released 1.0 models do not include captions as a modality. We experimented with it internally and decided not to release the models with captions because the quality is not comparable to what one would expect from today's MLLMs (which TerraMind is not, but it would probably be compared against them). We have plans for an improved version, but don't expect a release in the coming weeks. Feel free to reach out via email if you would like more details.

I'm currently converting the model to ONNX and running inference within an existing multi-agent system, where the endpoints are well defined for tool use. It would be great to be able to caption the image outputs of a TerraMind analysis, to improve a large model's understanding of those outputs and EO interoperability.
