[Question] Captions as modality

#14

by erikagu - opened Jan 18

erikagu

Jan 18

edited Jan 18

Hi TerraMind team, I would like to experiment with the model's caption modality for generation. However, the documentation does not list "Captions" as an available input modality, except for the v01 models, which are stated to not be publicly available yet. Can you confirm whether the Captions modality can be used with the TerraMind 1.0 generative models? If so, how should I refer to the modality and preprocess the data? If not, are there any potential workarounds?
Thank you!
@jhnnsjkbk @blumenstiel

erikagu changed discussion title from [Question] Captions as input modality to [Question] Captions as modality Jan 18

blumenstiel

IBM ESA Geospatial org Jan 19

Hi @erikagu, you are correct: the released 1.0 models do not include captions as a modality. We experimented with it internally and decided not to release the models with captions because the quality is not comparable to what one would expect from today's MLLMs (which TerraMind is not, but it would probably be compared against them). We have some plans for an improved version, but don't expect a release in the coming weeks. Feel free to reach out via email if you would like more details.

engoiya

27 days ago

I'm currently converting the model to ONNX and running inference with an existing multi-agent system, where the endpoints would be well defined for tool use. It would be great to caption the image output after TerraMind analysis to improve a large model's understanding and EO interoperability.
