KeyError: 'geochat'
Hi, did you get any solution for this?
I tried updating the transformers version but it didn't help
I still don't have a solution for it.
Hi, did you later get a solution for this?
Sadly, no
I just gave up
This is happening because geochat is a custom model type with its own GeoChatConfig and GeoChatLlamaForCausalLM classes, and those Python files aren’t part of the standard Transformers package.
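For intuition, the lookup that fails can be sketched as a plain dictionary keyed by `model_type`. This mock registry is only an illustration, not the actual Transformers internals:

```python
# Toy stand-in for the model_type -> config class mapping that AutoConfig
# consults; the real mapping only knows the built-in model types.
CONFIG_MAPPING = {
    "llama": "LlamaConfig",
    "gpt2": "GPT2Config",
}

def config_class_for(model_type):
    # A custom type such as "geochat" is simply absent from the mapping,
    # which surfaces as KeyError: 'geochat'.
    return CONFIG_MAPPING[model_type]

try:
    config_class_for("geochat")
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'geochat'
```

Installing the GeoChat package (or passing the custom classes in directly) is what makes the missing entry resolvable.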
To get around it you can:

- Clone the official GeoChat repo locally
- Install it in editable mode so that Python can import `geochat`: `pip install --no-deps -e ~/GeoChat`
- Import the classes directly:

```python
from geochat.model import GeoChatConfig, GeoChatLlamaForCausalLM
```

- Load the model:

```python
base_model = GeoChatLlamaForCausalLM.from_pretrained(
    "MBZUAI/geochat-7B",
    config=config,
    trust_remote_code=True,
    ignore_mismatched_sizes=True,
)
```

After that, `from_pretrained` can find and instantiate `GeoChatLlamaForCausalLM` just like any other HF model.
Tried your method @halox7000. However, after downloading a few binaries, the following error occurs. To get this far, I had to remove `config` first.
```
File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3031, in PreTrainedModel._load_pretrained_model.<locals>._find_mismatched_keys(state_dict, model_state_dict, loaded_keys, add_prefix_to_model, remove_prefix_from_model, ignore_mismatched_sizes)
   3025 elif add_prefix_to_model:
   3026     # The model key doesn't start with `prefix` but `checkpoint_key` does so we remove it.
   3027     model_key = ".".join(checkpoint_key.split(".")[1:])
   3029 if (
   3030     model_key in model_state_dict
-> 3031     and state_dict[checkpoint_key].shape != model_state_dict[model_key].shape
   3032 ):
   3033     mismatched_keys.append(
   3034         (checkpoint_key, state_dict[checkpoint_key].shape, model_state_dict[model_key].shape)
   3035     )
   3036     del state_dict[checkpoint_key]

KeyError: 'lm_head.weight'
```
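The failing line is the `state_dict[checkpoint_key]` lookup: the mismatch check walks a list of checkpoint keys that includes `lm_head.weight`, but the shard's state dict doesn't actually contain that tensor (in many Llama-style checkpoints it is tied to the input embeddings rather than stored). A minimal sketch of that failure mode, with plain Python values standing in for tensors (the key names and shapes are illustrative):

```python
# Hypothetical reduction of the mismatch check: the key list claims
# lm_head.weight exists, but the loaded shard never stored it because
# the weight is tied to the embedding matrix.
loaded_keys = ["model.embed_tokens.weight", "lm_head.weight"]
state_dict = {"model.embed_tokens.weight": (32000, 4096)}  # shapes stand in for tensors

def shard_shapes(state_dict, loaded_keys):
    # Mirrors state_dict[checkpoint_key].shape in the traceback: indexing a
    # key that is absent from this shard raises KeyError.
    return [state_dict[key] for key in loaded_keys]

try:
    shard_shapes(state_dict, loaded_keys)
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'lm_head.weight'
```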
I did something like this to fix it:

```python
import glob
import os

import torch

state_dict = {}
if not os.path.isdir(args.model_source):
    # Merge every checkpoint shard into a single state dict.
    for shard in sorted(glob.glob(os.path.join(repo_dir, "pytorch_model-*.bin"))):
        state_dict.update(torch.load(shard, map_location="cpu"))

# lm_head is tied to the input embeddings, so the shards never store it;
# alias it explicitly so the key can be found during loading.
if "lm_head.weight" not in state_dict and "embed_tokens.weight" in state_dict:
    state_dict["lm_head.weight"] = state_dict["embed_tokens.weight"]
```
I opted to download the models locally from Hugging Face, then run `python geochat_demo.py --model-path ./geochat-7B` directly.
I can see the model being loaded. Given the conditionals, I can see line 101 of the script being executed to load the model.
In a few instances it is able to detect one or two things, but mostly the results are poor. Is it because the model is not being loaded properly? The first few lines of the logs are attached below.
```
Initializing Chat
------------------------------------------------
geochat-7B
------------------------------------------------
Loading GeoChat......
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:945: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2025: UserWarning: for vision_model.embeddings.class_embedding: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2025: UserWarning: for vision_model.embeddings.patch_embedding.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
```
Some logs from the middle:
```
Some weights of GeoChatLlamaForCausalLM were not initialized from the model checkpoint at ../geochat-7B and are newly initialized: ['model.vision_tower.vision_tower.vision_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at openai/clip-vit-large-patch14-336 were not used when initializing CLIPVisionModel:
```
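Those warnings are produced by diffing the checkpoint's key set against the model's expected keys; any name present in only one set is either newly initialized or silently dropped. A generic sketch of that check with plain sets (the key names below are illustrative, not taken from the GeoChat code):

```python
# Parameter names the model expects (illustrative subset).
model_keys = {
    "model.embed_tokens.weight",
    "lm_head.weight",
    "model.vision_tower.vision_tower.vision_model.embeddings.position_ids",
}
# Names the checkpoint actually provides.
checkpoint_keys = {
    "model.embed_tokens.weight",
    "lm_head.weight",
}

# Keys in the model but not the checkpoint are freshly initialized
# (the "newly initialized" warning); keys only in the checkpoint are
# ignored (the "were not used" warning).
missing = model_keys - checkpoint_keys
unexpected = checkpoint_keys - model_keys

assert "position_ids" in next(iter(missing))
assert unexpected == set()
```

A buffer like `position_ids` being "newly initialized" is usually harmless, but large blocks of missing weights would explain degraded outputs.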
🤔 In the comment above, what have you modified? More specifically, which file have you modified?

The total count should be 24. It says 11, which is very far from the actual count.
Do you mean like a Python file? The model itself is likely to perform poorly on most detection tasks. It was trained primarily through instruction tuning; that is, the researchers provided examples of airplanes along with captions describing their location. Also, VLMs are generally weak at counting, and I suspect the limitation here stems from the same cause: they struggle with precise localization and discrete object enumeration.
That said, some vision–language models specialized for geospatial imagery are explicitly geometry-aware. One recent example is RingMoGPT, published in early 2025, which is much newer than GeoChat. I am currently working with GeoChat because its architecture is relatively simple, and I only need it for captioning purposes rather than more complex reasoning.
