
bug: Unable to chat with image using Moondream2 Vision model #3705

Open · 1 of 3 tasks
louis-jan opened this issue Sep 19, 2024 · 1 comment

louis-jan commented Sep 19, 2024

Jan version

0.5.4

Describe the Bug

I can successfully load the model for chats, but as soon as I send an image, it crashes.
Context:

  • I created a model.json to download the text model and the CLIP projector (mmproj); see the model.json in my comment below.
  • I was able to load the model and chat with it successfully.
  • As soon as I attached an image and sent it, the app crashed.

https://huggingface.co/moondream/moondream2-gguf

Same glitch on Linux here.
https://discord.com/channels/1107178041848909847/1285784195125219338/1286348026973261835

Steps to Reproduce

  1. Create a model.json for Moondream2 (text model plus mmproj) and load the model.
  2. Attach an image and request a description; a request sketch follows these steps.
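
For reference, the inference request behind step 2 boils down to a payload like the sketch below. This is a minimal illustration only: the endpoint path and the OpenAI-style image_url message shape are assumptions about how Jan calls cortex-cpp (the logs below only confirm the 127.0.0.1:3928 address and that a base64 image is detected), and the base64 data is a truncated placeholder.

POST http://127.0.0.1:3928/inferences/llamacpp/chat_completion

{
  "model": "moondream2-f16.gguf",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image." },
        { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgo..." } }
      ]
    }
  ]
}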

Screenshots / Logs

[Screenshot: 2024-09-19 at 19:11:09]

2024-09-19T12:11:04.250Z [CORTEX]::Debug: Request to kill cortex
2024-09-19T12:11:04.254Z [CORTEX]::Debug: 20240919 12:10:46.430861 UTC 3549698 DEBUG [LoadModel] Multi Modal Mode Enabled - llama_server_context.cc:159
20240919 12:10:46.676668 UTC 3549698 DEBUG [LoadModel] Request 4096 for context length for llava-1.6 - llama_server_context.cc:170
20240919 12:10:47.890831 UTC 3549698 DEBUG [Initialize] Available slots: - llama_server_context.cc:225
20240919 12:10:47.890848 UTC 3549698 DEBUG [Initialize] -> Slot 0 - max context: 4096 - llama_server_context.cc:233
20240919 12:10:47.890947 UTC 3549698 INFO Started background task here! - llama_server_context.cc:252
20240919 12:10:47.891006 UTC 3549698 INFO Warm-up model: llava-7b - llama_engine.cc:819
20240919 12:10:47.891010 UTC 3549742 DEBUG [UpdateSlots] all slots are idle and system prompt is empty, clear the KV cache - llama_server_context.cc:1250
20240919 12:10:47.891017 UTC 3549742 DEBUG [KvCacheClear] Clear the entire KV cache - llama_server_context.cc:258
20240919 12:10:47.901986 UTC 3549742 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 0] - llama_server_context.cc:623
20240919 12:10:47.902059 UTC 3549742 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 0, p0: 0 - llama_server_context.cc:1544
20240919 12:10:48.166076 UTC 3549742 DEBUG [PrintTimings] PrintTimings: prompt eval time = 172.433ms / 2 tokens (86.2165 ms per token, 11.5987079039 tokens per second) - llama_client_slot.cc:79
20240919 12:10:48.166081 UTC 3549742 DEBUG [PrintTimings] PrintTimings: eval time = 91.653 ms / 4 runs (22.91325 ms per token, 43.6428703916 tokens per second) - llama_client_slot.cc:86
20240919 12:10:48.166082 UTC 3549742 DEBUG [PrintTimings] PrintTimings: total time = 264.086 ms - llama_client_slot.cc:92
20240919 12:10:48.166116 UTC 3549742 INFO slot released: id_slot: 0, id_task: 0, n_ctx: 4096, n_past: 6, n_system_tokens: 0, n_cache_tokens: 0, truncated: 0 - llama_server_context.cc:1304
20240919 12:10:48.166129 UTC 3549698 INFO {"content":",\nI recently bought","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/louis/Library/Application Support/Jan/jan/models/llava-7b/llava-v1.6-mistral-7b.Q4_K_M.gguf","n_ctx":4096,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":false,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.0,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"/Users/louis/Library/Application Support/Jan/jan/models/llava-7b/llava-v1.6-mistral-7b.Q4_K_M.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":91.653,"predicted_n":4,"predicted_per_second":43.64287039158565,"predicted_per_token_ms":22.91325,"prompt_ms":172.433,"prompt_n":2,"prompt_per_second":11.598707903939502,"prompt_per_token_ms":86.2165},"tokens_cached":6,"tokens_evaluated":2,"tokens_predicted":4,"truncated":false} - llama_engine.cc:827
20240919 12:10:48.166183 UTC 3549698 INFO Model loaded successfully: llava-7b - llama_engine.cc:216
20240919 12:10:48.171967 UTC 3549699 INFO Model status responded - llama_engine.cc:259
20240919 12:10:48.175867 UTC 3549700 INFO Request 1, model llava-7b: Generating response for inference request - llama_engine.cc:469
20240919 12:10:48.175871 UTC 3549700 INFO Request 1: Stop words:null - llama_engine.cc:486
20240919 12:10:48.175892 UTC 3549700 INFO Request 1: Base64 image detected - llama_engine.cc:549
20240919 12:10:48.179648 UTC 3549700 INFO Request 1: Streamed, waiting for respone - llama_engine.cc:608
20240919 12:10:48.179692 UTC 3549700 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
20240919 12:10:48.182143 UTC 3549742 DEBUG [LaunchSlotWithData] slot 0 - loaded image - llama_server_context.cc:562
20240919 12:10:48.182156 UTC 3549742 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 1] - llama_server_context.cc:623
20240919 12:10:48.182167 UTC 3549742 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 1, p0: 0 - llama_server_context.cc:1544
20240919 12:10:52.482469 UTC 3549701 INFO Request 2, model llava-7b: Generating response for inference request - llama_engine.cc:469
20240919 12:10:52.482483 UTC 3549701 INFO Request 2: Stop words:null - llama_engine.cc:486
20240919 12:11:04.251929 UTC 3549702 INFO Program is exitting, goodbye! - processManager.cc:8

2024-09-19T12:11:04.294Z [CORTEX]::Debug: cortex process is terminated
2024-09-19T12:11:04.294Z [CORTEX]::Debug: cortex exited with code: 0
2024-09-19T12:11:04.305Z [CORTEX]::CPU information - 10
2024-09-19T12:11:04.305Z [CORTEX]::Debug: Request to kill cortex
2024-09-19T12:11:04.306Z [CORTEX]::Debug: cortex process is terminated
2024-09-19T12:11:04.307Z [CORTEX]::Debug: Spawning cortex subprocess...
2024-09-19T12:11:04.307Z [CORTEX]::Debug: Spawn cortex at path: /Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64/cortex-cpp, and args: 1,127.0.0.1,3928
2024-09-19T12:11:04.307Z [CORTEX]::Debug: Cortex engine path: /Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-09-19T12:11:04.307Z [CORTEX] PATH: /usr/bin:/bin:/usr/sbin:/sbin::/Users/louis/Library/Application Support/Jan/jan/engines/@janhq/inference-cortex-extension/1.0.17:/Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64:/Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-09-19T12:11:04.410Z [CORTEX]::Debug: Loading model with params {"cpu_threads":10,"vision_model":true,"text_model":false,"ctx_len":2048,"prompt_template":"{system_message}\n### Instruction: {prompt}\n### Response:","llama_model_path":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","mmproj":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-mmproj-f16.gguf","system_prompt":"","user_prompt":"\n### Instruction: ","ai_prompt":"\n### Response:","model":"moondream2-f16.gguf","ngl":100}
2024-09-19T12:11:04.410Z [CORTEX]::Debug: cortex is ready
2024-09-19T12:11:04.419Z [CORTEX]::Debug: 20240919 12:11:04.315010 UTC 3550094 INFO cortex-cpp version: 0.5.0 - main.cc:73
20240919 12:11:04.315589 UTC 3550094 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:78
20240919 12:11:04.315590 UTC 3550094 INFO Please load your model - main.cc:79
20240919 12:11:04.315592 UTC 3550094 INFO Number of thread is:10 - main.cc:86
20240919 12:11:04.411469 UTC 3550098 INFO CPU instruction set: fpu = 0| mmx = 0| sse = 0| sse2 = 0| sse3 = 0| ssse3 = 0| sse4_1 = 0| sse4_2 = 0| pclmulqdq = 0| avx = 0| avx2 = 0| avx512_f = 0| avx512_dq = 0| avx512_ifma = 0| avx512_pf = 0| avx512_er = 0| avx512_cd = 0| avx512_bw = 0| has_avx512_vl = 0| has_avx512_vbmi = 0| has_avx512_vbmi2 = 0| avx512_vnni = 0| avx512_bitalg = 0| avx512_vpopcntdq = 0| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 0| f16c = 0| - server.cc:288
20240919 12:11:04.418604 UTC 3550098 INFO Loaded engine: cortex.llamacpp - server.cc:314
20240919 12:11:04.418615 UTC 3550098 INFO cortex.llamacpp version: 0.1.25 - llama_engine.cc:163
20240919 12:11:04.418638 UTC 3550098 INFO MMPROJ FILE detected, multi-model enabled! - llama_engine.cc:300
20240919 12:11:04.418667 UTC 3550098 INFO Number of parallel is set to 1 - llama_engine.cc:352
20240919 12:11:04.418670 UTC 3550098 DEBUG [LoadModelImpl] cache_type: f16 - llama_engine.cc:365
20240919 12:11:04.418672 UTC 3550098 DEBUG [LoadModelImpl] Enabled Flash Attention - llama_engine.cc:374
20240919 12:11:04.418679 UTC 3550098 DEBUG [LoadModelImpl] stop: null - llama_engine.cc:395
{"timestamp":1726747864,"level":"INFO","function":"LoadModelImpl","line":418,"message":"system info","n_threads":10,"total_threads":10,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "}

2024-09-19T12:11:04.420Z [CORTEX]::Error: ggml_metal_init: allocating

2024-09-19T12:11:04.431Z [CORTEX]::Error: ggml_metal_init: found device: Apple M2 Pro

2024-09-19T12:11:04.458Z [CORTEX]::Error: ggml_metal_init: picking default device: Apple M2 Pro

2024-09-19T12:11:04.459Z [CORTEX]::Error: ggml_metal_init: using embedded metal library

2024-09-19T12:11:04.462Z [CORTEX]::Error: ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB

2024-09-19T12:11:04.841Z [CORTEX]::Error: llama_model_loader: loaded meta data with 19 key-value pairs and 245 tensors from /Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: general.name str = moondream2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2048
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 8192
llama_model_loader: - kv 5: phi2.block_count u32 = 24
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2

2024-09-19T12:11:04.845Z [CORTEX]::Error: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", """, "#", "$", "%", "&", "'", ...

2024-09-19T12:11:04.846Z [CORTEX]::Error: llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...

2024-09-19T12:11:04.850Z [CORTEX]::Error: llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - type f32: 147 tensors
llama_model_loader: - type f16: 98 tensors

2024-09-19T12:11:04.874Z [CORTEX]::Error: llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:

2024-09-19T12:11:04.881Z [CORTEX]::Error: llm_load_vocab: special tokens cache size = 944

2024-09-19T12:11:04.889Z [CORTEX]::Error: llm_load_vocab: token to piece cache size = 0.3151 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 24

2024-09-19T12:11:04.889Z [CORTEX]::Error: llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 1.42 B
llm_load_print_meta: model size = 2.64 GiB (16.01 BPW)
llm_load_print_meta: general.name = moondream2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 50256 '<|endoftext|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.22 MiB

2024-09-19T12:11:04.890Z [CORTEX]::Error: ggml_backend_metal_log_allocated_size: allocated buffer, size = 2506.30 MiB, ( 3425.89 / 21845.34)
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors: CPU buffer size = 200.00 MiB
llm_load_tensors: Metal buffer size = 2506.29 MiB

2024-09-19T12:11:04.890Z [CORTEX]::Error: .....................................
2024-09-19T12:11:04.890Z [CORTEX]::Error: .....................
2024-09-19T12:11:04.890Z [CORTEX]::Error: ......................

2024-09-19T12:11:04.892Z [CORTEX]::Error: llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 2048
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating

2024-09-19T12:11:04.893Z [CORTEX]::Error: ggml_metal_init: found device: Apple M2 Pro

2024-09-19T12:11:04.893Z [CORTEX]::Error: ggml_metal_init: picking default device: Apple M2 Pro

2024-09-19T12:11:04.893Z [CORTEX]::Error: ggml_metal_init: using embedded metal library

2024-09-19T12:11:04.894Z [CORTEX]::Error: ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB

2024-09-19T12:11:04.928Z [CORTEX]::Error: llama_kv_cache_init: Metal KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.20 MiB

2024-09-19T12:11:04.929Z [CORTEX]::Error: llama_new_context_with_model: Metal compute buffer size = 416.00 MiB
llama_new_context_with_model: CPU compute buffer size = 32.02 MiB
llama_new_context_with_model: graph nodes = 826
llama_new_context_with_model: graph splits = 2

2024-09-19T12:11:06.399Z [CORTEX]::Debug: Load model success with response {}
2024-09-19T12:11:06.399Z [CORTEX]::Debug: Validating model moondream2-f16.gguf
2024-09-19T12:11:06.400Z [CORTEX]::Debug: Validate model state with response 200
2024-09-19T12:11:06.401Z [CORTEX]::Debug: Validate model state success with response {"model_data":"{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","n_ctx":2048,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":false,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.0,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false}","model_loaded":true}
2024-09-19T12:11:06.408Z [CORTEX]::Error: libc++abi: terminating due to uncaught exception of type std::length_error: vector

2024-09-19T12:11:06.408Z [CORTEX]::Debug: 20240919 12:11:04.419177 UTC 3550098 DEBUG [LoadModel] Multi Modal Mode Enabled - llama_server_context.cc:159
20240919 12:11:06.301128 UTC 3550098 DEBUG [Initialize] Available slots: - llama_server_context.cc:225
20240919 12:11:06.301136 UTC 3550098 DEBUG [Initialize] -> Slot 0 - max context: 2048 - llama_server_context.cc:233
20240919 12:11:06.301210 UTC 3550098 INFO Started background task here! - llama_server_context.cc:252
20240919 12:11:06.301254 UTC 3550098 INFO Warm-up model: moondream2-f16.gguf - llama_engine.cc:819
20240919 12:11:06.301257 UTC 3550146 DEBUG [UpdateSlots] all slots are idle and system prompt is empty, clear the KV cache - llama_server_context.cc:1250
20240919 12:11:06.301262 UTC 3550146 DEBUG [KvCacheClear] Clear the entire KV cache - llama_server_context.cc:258
20240919 12:11:06.304526 UTC 3550146 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 0] - llama_server_context.cc:623
20240919 12:11:06.304589 UTC 3550146 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 0, p0: 0 - llama_server_context.cc:1544
20240919 12:11:06.397659 UTC 3550146 DEBUG [PrintTimings] PrintTimings: prompt eval time = 38.775ms / 1 tokens (38.775 ms per token, 25.7898130239 tokens per second) - llama_client_slot.cc:79
20240919 12:11:06.397667 UTC 3550146 DEBUG [PrintTimings] PrintTimings: eval time = 54.356 ms / 4 runs (13.589 ms per token, 73.5889322246 tokens per second) - llama_client_slot.cc:86
20240919 12:11:06.397668 UTC 3550146 DEBUG [PrintTimings] PrintTimings: total time = 93.131 ms - llama_client_slot.cc:92
20240919 12:11:06.397727 UTC 3550146 INFO slot released: id_slot: 0, id_task: 0, n_ctx: 2048, n_past: 5, n_system_tokens: 0, n_cache_tokens: 0, truncated: 0 - llama_server_context.cc:1304
20240919 12:11:06.397739 UTC 3550098 INFO {"content":", Alien friend! Today","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","n_ctx":2048,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":false,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.0,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":54.356,"predicted_n":4,"predicted_per_second":73.58893222459342,"predicted_per_token_ms":13.589,"prompt_ms":38.775,"prompt_n":1,"prompt_per_second":25.78981302385558,"prompt_per_token_ms":38.775},"tokens_cached":5,"tokens_evaluated":1,"tokens_predicted":4,"truncated":false} - llama_engine.cc:827
20240919 12:11:06.397784 UTC 3550098 INFO Model loaded successfully: moondream2-f16.gguf - llama_engine.cc:216
20240919 12:11:06.400552 UTC 3550099 INFO Model status responded - llama_engine.cc:259
20240919 12:11:06.402786 UTC 3550100 INFO Request 1, model moondream2-f16.gguf: Generating response for inference request - llama_engine.cc:469
20240919 12:11:06.402791 UTC 3550100 INFO Request 1: Stop words:[
    "<|END_OF_TURN_TOKEN|>",
    "<end_of_turn>",
    "[/INST]",
    "<|end_of_text|>",
    "<|eot_id|>",
    "<|im_end|>",
    "<|end|>"
] - llama_engine.cc:486
20240919 12:11:06.402820 UTC 3550100 INFO Request 1: Base64 image detected - llama_engine.cc:549
20240919 12:11:06.406590 UTC 3550100 INFO Request 1: Streamed, waiting for respone - llama_engine.cc:608
20240919 12:11:06.406633 UTC 3550100 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
20240919 12:11:06.408420 UTC 3550146 DEBUG [LaunchSlotWithData] slot 0 - loaded image - llama_server_context.cc:562
20240919 12:11:06.408434 UTC 3550146 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 1] - llama_server_context.cc:623
20240919 12:11:06.408442 UTC 3550146 DEBUG [UpdateSlots] slot 0 : we have to evaluate at least 1 token to generate logits - llama_server_context.cc:1496

2024-09-19T12:11:06.409Z [CORTEX]::Debug: cortex exited with code: null
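
The fatal line above is the uncaught std::length_error: vector thrown right after slot 0 loads the image. libc++ raises std::length_error("vector") when a std::vector is asked to resize or reserve past its max_size(), which in practice usually means the requested element count was computed from a negative or overflowed integer. A minimal sketch of that failure mode (illustrative only; this is not the actual cortex.llamacpp code, and the variable names are hypothetical):

#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
  // Hypothetical: an image-token count goes negative (e.g. a bad
  // mmproj/CLIP lookup for this model architecture), then the cast
  // to size_t wraps it to an enormous value.
  int n_image_tokens = -1;
  size_t n = static_cast<size_t>(n_image_tokens) * 2048;

  std::vector<float> embeddings;
  try {
    embeddings.resize(n);  // libc++ throws std::length_error("vector")
  } catch (const std::length_error &e) {
    std::printf("caught: %s\n", e.what());
  }
  // cortex-cpp has no such catch on this path, so the exception
  // escapes and the process terminates, matching the log above.
  return 0;
}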

What is your OS?

  • [x] MacOS
  • [ ] Windows
  • [ ] Linux
louis-jan added the type: bug label Sep 19, 2024

louis-jan commented Sep 19, 2024

model.json

{
  "object": "model",
  "version": "1.0",
  "format": "gguf",
  "sources": [
    {
      "url": "https://huggingface.co/moondream/moondream2-gguf/resolve/main/moondream2-text-model-f16.gguf",
      "filename": "moondream2-f16.gguf"
    },
    {
      "url": "https://huggingface.co/moondream/moondream2-gguf/resolve/main/moondream2-mmproj-f16.gguf",
      "filename": "moondream2-mmproj-f16.gguf"
    }
  ],
  "id": "moondream2-f16.gguf",
  "name": "Moondream 2",
  "created": 1726572950042,
  "description": "User self import model",
  "settings": {
    "vision_model": true,
    "text_model": false,
    "ctx_len": 2048,
    "prompt_template": "{system_message}\n### Instruction: {prompt}\n### Response:",
    "llama_model_path": "moondream2-f16.gguf",
    "mmproj": "moondream2-mmproj-f16.gguf"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [
      "<|END_OF_TURN_TOKEN|>",
      "<end_of_turn>",
      "[/INST]",
      "<|end_of_text|>",
      "<|eot_id|>",
      "<|im_end|>",
      "<|end|>"
    ],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "author": "User",
    "tags": ["gguf", "region:us"],
    "size": "909777984"
  },
  "engine": "nitro"
}
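
For context on the settings above: prompt_template, together with the derived user_prompt and ai_prompt visible in the load params earlier in the log, means a single turn renders as the plain text below (with system_prompt empty, leaving a leading blank line; "Describe this image." is just a placeholder question):

### Instruction: Describe this image.
### Response: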

louis-jan changed the title from "bug: Unable to send image to the Moondream2 Vision model" to "bug: Unable to chat with image using Moondream2 Vision model" Sep 19, 2024