
bug: Unable to chat with image using Moondream2 Vision model #3705

Open · 1 of 3 tasks
louis-jan opened this issue Sep 19, 2024 · 1 comment

louis-jan commented Sep 19, 2024

Jan version

0.5.4

Describe the Bug

I can successfully load the model for chats, but as soon as I send an image, it crashes.
Context:

  • I created a model.json to download the text model and the CLIP projector (mmproj); see the model.json in my comment below.
  • I was able to load the model and chat with it successfully.
  • As soon as I attached an image and sent it, the app crashed.

https://huggingface.co/moondream/moondream2-gguf

Same glitch on Linux here.
https://discord.com/channels/1107178041848909847/1285784195125219338/1286348026973261835

Steps to Reproduce

  1. Create a model.json for Moondream2 (text model plus mmproj) and load the model.
  2. Attach an image and request a description; a request sketch follows these steps.
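
For reference, the inference request behind step 2 boils down to a payload like the sketch below. This is a minimal illustration only: the endpoint path and the OpenAI-style image_url message shape are assumptions about how Jan calls cortex-cpp (the logs below only confirm the 127.0.0.1:3928 address and that a base64 image is detected), and the base64 data is a truncated placeholder.

POST http://127.0.0.1:3928/inferences/llamacpp/chat_completion

{
  "model": "moondream2-f16.gguf",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image." },
        { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgo..." } }
      ]
    }
  ]
}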

Screenshots / Logs

[Screenshot: 2024-09-19 at 19:11:09]

2024-09-19T12:11:04.250Z [CORTEX]::Debug: Request to kill cortex
2024-09-19T12:11:04.254Z [CORTEX]::Debug: 20240919 12:10:46.430861 UTC 3549698 DEBUG [LoadModel] Multi Modal Mode Enabled - llama_server_context.cc:159
20240919 12:10:46.676668 UTC 3549698 DEBUG [LoadModel] Request 4096 for context length for llava-1.6 - llama_server_context.cc:170
20240919 12:10:47.890831 UTC 3549698 DEBUG [Initialize] Available slots: - llama_server_context.cc:225
20240919 12:10:47.890848 UTC 3549698 DEBUG [Initialize] -> Slot 0 - max context: 4096 - llama_server_context.cc:233
20240919 12:10:47.890947 UTC 3549698 INFO Started background task here! - llama_server_context.cc:252
20240919 12:10:47.891006 UTC 3549698 INFO Warm-up model: llava-7b - llama_engine.cc:819
20240919 12:10:47.891010 UTC 3549742 DEBUG [UpdateSlots] all slots are idle and system prompt is empty, clear the KV cache - llama_server_context.cc:1250
20240919 12:10:47.891017 UTC 3549742 DEBUG [KvCacheClear] Clear the entire KV cache - llama_server_context.cc:258
20240919 12:10:47.901986 UTC 3549742 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 0] - llama_server_context.cc:623
20240919 12:10:47.902059 UTC 3549742 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 0, p0: 0 - llama_server_context.cc:1544
20240919 12:10:48.166076 UTC 3549742 DEBUG [PrintTimings] PrintTimings: prompt eval time = 172.433ms / 2 tokens (86.2165 ms per token, 11.5987079039 tokens per second) - llama_client_slot.cc:79
20240919 12:10:48.166081 UTC 3549742 DEBUG [PrintTimings] PrintTimings: eval time = 91.653 ms / 4 runs (22.91325 ms per token, 43.6428703916 tokens per second) - llama_client_slot.cc:86
20240919 12:10:48.166082 UTC 3549742 DEBUG [PrintTimings] PrintTimings: total time = 264.086 ms - llama_client_slot.cc:92
20240919 12:10:48.166116 UTC 3549742 INFO slot released: id_slot: 0, id_task: 0, n_ctx: 4096, n_past: 6, n_system_tokens: 0, n_cache_tokens: 0, truncated: 0 - llama_server_context.cc:1304
20240919 12:10:48.166129 UTC 3549698 INFO {"content":",\nI recently bought","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/louis/Library/Application Support/Jan/jan/models/llava-7b/llava-v1.6-mistral-7b.Q4_K_M.gguf","n_ctx":4096,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":false,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.0,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"/Users/louis/Library/Application Support/Jan/jan/models/llava-7b/llava-v1.6-mistral-7b.Q4_K_M.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":91.653,"predicted_n":4,"predicted_per_second":43.64287039158565,"predicted_per_token_ms":22.91325,"prompt_ms":172.433,"prompt_n":2,"prompt_per_second":11.598707903939502,"prompt_per_token_ms":86.2165},"tokens_cached":6,"tokens_evaluated":2,"tokens_predicted":4,"truncated":false} - llama_engine.cc:827
20240919 12:10:48.166183 UTC 3549698 INFO Model loaded successfully: llava-7b - llama_engine.cc:216
20240919 12:10:48.171967 UTC 3549699 INFO Model status responded - llama_engine.cc:259
20240919 12:10:48.175867 UTC 3549700 INFO Request 1, model llava-7b: Generating response for inference request - llama_engine.cc:469
20240919 12:10:48.175871 UTC 3549700 INFO Request 1: Stop words:null - llama_engine.cc:486
20240919 12:10:48.175892 UTC 3549700 INFO Request 1: Base64 image detected - llama_engine.cc:549
20240919 12:10:48.179648 UTC 3549700 INFO Request 1: Streamed, waiting for respone - llama_engine.cc:608
20240919 12:10:48.179692 UTC 3549700 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
20240919 12:10:48.182143 UTC 3549742 DEBUG [LaunchSlotWithData] slot 0 - loaded image - llama_server_context.cc:562
20240919 12:10:48.182156 UTC 3549742 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 1] - llama_server_context.cc:623
20240919 12:10:48.182167 UTC 3549742 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 1, p0: 0 - llama_server_context.cc:1544
20240919 12:10:52.482469 UTC 3549701 INFO Request 2, model llava-7b: Generating response for inference request - llama_engine.cc:469
20240919 12:10:52.482483 UTC 3549701 INFO Request 2: Stop words:null - llama_engine.cc:486
20240919 12:11:04.251929 UTC 3549702 INFO Program is exitting, goodbye! - processManager.cc:8

2024-09-19T12:11:04.294Z [CORTEX]::Debug: cortex process is terminated
2024-09-19T12:11:04.294Z [CORTEX]::Debug: cortex exited with code: 0
2024-09-19T12:11:04.305Z [CORTEX]::CPU information - 10
2024-09-19T12:11:04.305Z [CORTEX]::Debug: Request to kill cortex
2024-09-19T12:11:04.306Z [CORTEX]::Debug: cortex process is terminated
2024-09-19T12:11:04.307Z [CORTEX]::Debug: Spawning cortex subprocess...
2024-09-19T12:11:04.307Z [CORTEX]::Debug: Spawn cortex at path: /Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64/cortex-cpp, and args: 1,127.0.0.1,3928
2024-09-19T12:11:04.307Z [CORTEX]::Debug: Cortex engine path: /Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-09-19T12:11:04.307Z [CORTEX] PATH: /usr/bin:/bin:/usr/sbin:/sbin::/Users/louis/Library/Application Support/Jan/jan/engines/@janhq/inference-cortex-extension/1.0.17:/Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64:/Users/louis/Library/Application Support/Jan/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-09-19T12:11:04.410Z [CORTEX]::Debug: Loading model with params {"cpu_threads":10,"vision_model":true,"text_model":false,"ctx_len":2048,"prompt_template":"{system_message}\n### Instruction: {prompt}\n### Response:","llama_model_path":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","mmproj":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-mmproj-f16.gguf","system_prompt":"","user_prompt":"\n### Instruction: ","ai_prompt":"\n### Response:","model":"moondream2-f16.gguf","ngl":100}
2024-09-19T12:11:04.410Z [CORTEX]::Debug: cortex is ready
2024-09-19T12:11:04.419Z [CORTEX]::Debug: 20240919 12:11:04.315010 UTC 3550094 INFO cortex-cpp version: 0.5.0 - main.cc:73
20240919 12:11:04.315589 UTC 3550094 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:78
20240919 12:11:04.315590 UTC 3550094 INFO Please load your model - main.cc:79
20240919 12:11:04.315592 UTC 3550094 INFO Number of thread is:10 - main.cc:86
20240919 12:11:04.411469 UTC 3550098 INFO CPU instruction set: fpu = 0| mmx = 0| sse = 0| sse2 = 0| sse3 = 0| ssse3 = 0| sse4_1 = 0| sse4_2 = 0| pclmulqdq = 0| avx = 0| avx2 = 0| avx512_f = 0| avx512_dq = 0| avx512_ifma = 0| avx512_pf = 0| avx512_er = 0| avx512_cd = 0| avx512_bw = 0| has_avx512_vl = 0| has_avx512_vbmi = 0| has_avx512_vbmi2 = 0| avx512_vnni = 0| avx512_bitalg = 0| avx512_vpopcntdq = 0| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 0| f16c = 0| - server.cc:288
20240919 12:11:04.418604 UTC 3550098 INFO Loaded engine: cortex.llamacpp - server.cc:314
20240919 12:11:04.418615 UTC 3550098 INFO cortex.llamacpp version: 0.1.25 - llama_engine.cc:163
20240919 12:11:04.418638 UTC 3550098 INFO MMPROJ FILE detected, multi-model enabled! - llama_engine.cc:300
20240919 12:11:04.418667 UTC 3550098 INFO Number of parallel is set to 1 - llama_engine.cc:352
20240919 12:11:04.418670 UTC 3550098 DEBUG [LoadModelImpl] cache_type: f16 - llama_engine.cc:365
20240919 12:11:04.418672 UTC 3550098 DEBUG [LoadModelImpl] Enabled Flash Attention - llama_engine.cc:374
20240919 12:11:04.418679 UTC 3550098 DEBUG [LoadModelImpl] stop: null - llama_engine.cc:395
{"timestamp":1726747864,"level":"INFO","function":"LoadModelImpl","line":418,"message":"system info","n_threads":10,"total_threads":10,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "}

2024-09-19T12:11:04.420Z [CORTEX]::Error: ggml_metal_init: allocating

2024-09-19T12:11:04.431Z [CORTEX]::Error: ggml_metal_init: found device: Apple M2 Pro

2024-09-19T12:11:04.458Z [CORTEX]::Error: ggml_metal_init: picking default device: Apple M2 Pro

2024-09-19T12:11:04.459Z [CORTEX]::Error: ggml_metal_init: using embedded metal library

2024-09-19T12:11:04.462Z [CORTEX]::Error: ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB

2024-09-19T12:11:04.841Z [CORTEX]::Error: llama_model_loader: loaded meta data with 19 key-value pairs and 245 tensors from /Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: general.name str = moondream2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2048
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 8192
llama_model_loader: - kv 5: phi2.block_count u32 = 24
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2

2024-09-19T12:11:04.845Z [CORTEX]::Error: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", """, "#", "$", "%", "&", "'", ...

2024-09-19T12:11:04.846Z [CORTEX]::Error: llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...

2024-09-19T12:11:04.850Z [CORTEX]::Error: llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - type f32: 147 tensors
llama_model_loader: - type f16: 98 tensors

2024-09-19T12:11:04.874Z [CORTEX]::Error: llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:

2024-09-19T12:11:04.881Z [CORTEX]::Error: llm_load_vocab: special tokens cache size = 944

2024-09-19T12:11:04.889Z [CORTEX]::Error: llm_load_vocab: token to piece cache size = 0.3151 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 24

2024-09-19T12:11:04.889Z [CORTEX]::Error: llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 1.42 B
llm_load_print_meta: model size = 2.64 GiB (16.01 BPW)
llm_load_print_meta: general.name = moondream2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 50256 '<|endoftext|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.22 MiB

2024-09-19T12:11:04.890Z [CORTEX]::Error: ggml_backend_metal_log_allocated_size: allocated buffer, size = 2506.30 MiB, ( 3425.89 / 21845.34)
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors: CPU buffer size = 200.00 MiB
llm_load_tensors: Metal buffer size = 2506.29 MiB

2024-09-19T12:11:04.890Z [CORTEX]::Error: .....................................
2024-09-19T12:11:04.890Z [CORTEX]::Error: .....................
2024-09-19T12:11:04.890Z [CORTEX]::Error: ......................

2024-09-19T12:11:04.892Z [CORTEX]::Error: llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 2048
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating

2024-09-19T12:11:04.893Z [CORTEX]::Error: ggml_metal_init: found device: Apple M2 Pro

2024-09-19T12:11:04.893Z [CORTEX]::Error: ggml_metal_init: picking default device: Apple M2 Pro

2024-09-19T12:11:04.893Z [CORTEX]::Error: ggml_metal_init: using embedded metal library

2024-09-19T12:11:04.894Z [CORTEX]::Error: ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB

2024-09-19T12:11:04.928Z [CORTEX]::Error: llama_kv_cache_init: Metal KV buffer size = 384.00 MiB
llama_new_context_with_model: KV self size = 384.00 MiB, K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.20 MiB

2024-09-19T12:11:04.929Z [CORTEX]::Error: llama_new_context_with_model: Metal compute buffer size = 416.00 MiB
llama_new_context_with_model: CPU compute buffer size = 32.02 MiB
llama_new_context_with_model: graph nodes = 826
llama_new_context_with_model: graph splits = 2

2024-09-19T12:11:06.399Z [CORTEX]::Debug: Load model success with response {}
2024-09-19T12:11:06.399Z [CORTEX]::Debug: Validating model moondream2-f16.gguf
2024-09-19T12:11:06.400Z [CORTEX]::Debug: Validate model state with response 200
2024-09-19T12:11:06.401Z [CORTEX]::Debug: Validate model state success with response {"model_data":"{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","n_ctx":2048,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":false,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.0,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false}","model_loaded":true}
2024-09-19T12:11:06.408Z [CORTEX]::Error: libc++abi: terminating due to uncaught exception of type std::length_error: vector

2024-09-19T12:11:06.408Z [CORTEX]::Debug: 20240919 12:11:04.419177 UTC 3550098 DEBUG [LoadModel] Multi Modal Mode Enabled - llama_server_context.cc:159
20240919 12:11:06.301128 UTC 3550098 DEBUG [Initialize] Available slots: - llama_server_context.cc:225
20240919 12:11:06.301136 UTC 3550098 DEBUG [Initialize] -> Slot 0 - max context: 2048 - llama_server_context.cc:233
20240919 12:11:06.301210 UTC 3550098 INFO Started background task here! - llama_server_context.cc:252
20240919 12:11:06.301254 UTC 3550098 INFO Warm-up model: moondream2-f16.gguf - llama_engine.cc:819
20240919 12:11:06.301257 UTC 3550146 DEBUG [UpdateSlots] all slots are idle and system prompt is empty, clear the KV cache - llama_server_context.cc:1250
20240919 12:11:06.301262 UTC 3550146 DEBUG [KvCacheClear] Clear the entire KV cache - llama_server_context.cc:258
20240919 12:11:06.304526 UTC 3550146 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 0] - llama_server_context.cc:623
20240919 12:11:06.304589 UTC 3550146 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 0, p0: 0 - llama_server_context.cc:1544
20240919 12:11:06.397659 UTC 3550146 DEBUG [PrintTimings] PrintTimings: prompt eval time = 38.775ms / 1 tokens (38.775 ms per token, 25.7898130239 tokens per second) - llama_client_slot.cc:79
20240919 12:11:06.397667 UTC 3550146 DEBUG [PrintTimings] PrintTimings: eval time = 54.356 ms / 4 runs (13.589 ms per token, 73.5889322246 tokens per second) - llama_client_slot.cc:86
20240919 12:11:06.397668 UTC 3550146 DEBUG [PrintTimings] PrintTimings: total time = 93.131 ms - llama_client_slot.cc:92
20240919 12:11:06.397727 UTC 3550146 INFO slot released: id_slot: 0, id_task: 0, n_ctx: 2048, n_past: 5, n_system_tokens: 0, n_cache_tokens: 0, truncated: 0 - llama_server_context.cc:1304
20240919 12:11:06.397739 UTC 3550098 INFO {"content":", Alien friend! Today","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","n_ctx":2048,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":false,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.0,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"/Users/louis/Library/Application Support/Jan/jan/models/moondream2-f16.gguf/moondream2-f16.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":54.356,"predicted_n":4,"predicted_per_second":73.58893222459342,"predicted_per_token_ms":13.589,"prompt_ms":38.775,"prompt_n":1,"prompt_per_second":25.78981302385558,"prompt_per_token_ms":38.775},"tokens_cached":5,"tokens_evaluated":1,"tokens_predicted":4,"truncated":false} - llama_engine.cc:827
20240919 12:11:06.397784 UTC 3550098 INFO Model loaded successfully: moondream2-f16.gguf - llama_engine.cc:216
20240919 12:11:06.400552 UTC 3550099 INFO Model status responded - llama_engine.cc:259
20240919 12:11:06.402786 UTC 3550100 INFO Request 1, model moondream2-f16.gguf: Generating response for inference request - llama_engine.cc:469
20240919 12:11:06.402791 UTC 3550100 INFO Request 1: Stop words:[
    "<|END_OF_TURN_TOKEN|>",
    "<end_of_turn>",
    "[/INST]",
    "<|end_of_text|>",
    "<|eot_id|>",
    "<|im_end|>",
    "<|end|>"
] - llama_engine.cc:486
20240919 12:11:06.402820 UTC 3550100 INFO Request 1: Base64 image detected - llama_engine.cc:549
20240919 12:11:06.406590 UTC 3550100 INFO Request 1: Streamed, waiting for respone - llama_engine.cc:608
20240919 12:11:06.406633 UTC 3550100 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
20240919 12:11:06.408420 UTC 3550146 DEBUG [LaunchSlotWithData] slot 0 - loaded image - llama_server_context.cc:562
20240919 12:11:06.408434 UTC 3550146 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 1] - llama_server_context.cc:623
20240919 12:11:06.408442 UTC 3550146 DEBUG [UpdateSlots] slot 0 : we have to evaluate at least 1 token to generate logits - llama_server_context.cc:1496

2024-09-19T12:11:06.409Z [CORTEX]::Debug: cortex exited with code: null
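
The fatal line above is the uncaught std::length_error: vector thrown right after slot 0 loads the image. libc++ raises std::length_error("vector") when a std::vector is asked to resize or reserve past its max_size(), which in practice usually means the requested element count was computed from a negative or overflowed integer. A minimal sketch of that failure mode (illustrative only; this is not the actual cortex.llamacpp code, and the variable names are hypothetical):

#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
  // Hypothetical: an image-token count goes negative (e.g. a bad
  // mmproj/CLIP lookup for this model architecture), then the cast
  // to size_t wraps it to an enormous value.
  int n_image_tokens = -1;
  size_t n = static_cast<size_t>(n_image_tokens) * 2048;

  std::vector<float> embeddings;
  try {
    embeddings.resize(n);  // libc++ throws std::length_error("vector")
  } catch (const std::length_error &e) {
    std::printf("caught: %s\n", e.what());
  }
  // cortex-cpp has no such catch on this path, so the exception
  // escapes and the process terminates, matching the log above.
  return 0;
}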

What is your OS?

  • [x] MacOS
  • [ ] Windows
  • [ ] Linux
louis-jan added the type: bug label Sep 19, 2024

louis-jan commented Sep 19, 2024

model.json

{
  "object": "model",
  "version": "1.0",
  "format": "gguf",
  "sources": [
    {
      "url": "https://huggingface.co/moondream/moondream2-gguf/resolve/main/moondream2-text-model-f16.gguf",
      "filename": "moondream2-f16.gguf"
    },
    {
      "url": "https://huggingface.co/moondream/moondream2-gguf/resolve/main/moondream2-mmproj-f16.gguf",
      "filename": "moondream2-mmproj-f16.gguf"
    }
  ],
  "id": "moondream2-f16.gguf",
  "name": "Moondream 2",
  "created": 1726572950042,
  "description": "User self import model",
  "settings": {
    "vision_model": true,
    "text_model": false,
    "ctx_len": 2048,
    "prompt_template": "{system_message}\n### Instruction: {prompt}\n### Response:",
    "llama_model_path": "moondream2-f16.gguf",
    "mmproj": "moondream2-mmproj-f16.gguf"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [
      "<|END_OF_TURN_TOKEN|>",
      "<end_of_turn>",
      "[/INST]",
      "<|end_of_text|>",
      "<|eot_id|>",
      "<|im_end|>",
      "<|end|>"
    ],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "author": "User",
    "tags": ["gguf", "region:us"],
    "size": "909777984"
  },
  "engine": "nitro"
}
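
For context on the settings above: prompt_template, together with the derived user_prompt and ai_prompt visible in the load params earlier in the log, means a single turn renders as the plain text below (with system_prompt empty, leaving a leading blank line; "Describe this image." is just a placeholder question):

### Instruction: Describe this image.
### Response: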

louis-jan changed the title from "bug: Unable to send image to the Moondream2 Vision model" to "bug: Unable to chat with image using Moondream2 Vision model" Sep 19, 2024