
bug with tokenizer and gibberish output #9

Open
evilsocket opened this issue Jul 16, 2024 · 4 comments
Labels: bug (Something isn't working)

Comments
@evilsocket
Owner

the tokenizer has issues resolving a few tokens, including special ones (they show up empty in the output), which is causing all sorts of gibberish output ... it's probably a matter of parsing the model's tokenizer.json properly
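To illustrate the suspected failure mode: in a tokenizer.json file, special tokens such as Llama 3's `<|begin_of_text|>` live in the top-level `added_tokens` array, not in `model.vocab`, so a parser that only reads the base vocab cannot resolve their ids. A minimal self-contained sketch (the ids and tokens here are illustrative, not taken from the actual model files):

```rust
use std::collections::HashMap;

fn main() {
    // Base vocab, as a parser would build it from model.vocab (illustrative id).
    let mut id_to_token: HashMap<u32, String> = HashMap::new();
    id_to_token.insert(100, "hello".to_string());

    // Special tokens live in the separate top-level "added_tokens" array of
    // tokenizer.json; a parser that only reads model.vocab never sees them.
    let added_tokens = [(128000u32, "<|begin_of_text|>"), (128009u32, "<|eot_id|>")];

    // Decode with whatever table we have; unknown ids resolve to an empty
    // string, which is how the blank / gibberish output manifests.
    let decode = |ids: &[u32], table: &HashMap<u32, String>| -> String {
        ids.iter()
            .map(|id| table.get(id).cloned().unwrap_or_default())
            .collect()
    };

    let ids: [u32; 3] = [128000, 100, 128009];
    // Base vocab only: the special tokens silently vanish.
    assert_eq!(decode(&ids, &id_to_token), "hello");

    // Merging added_tokens into the lookup table restores them.
    for (id, tok) in added_tokens {
        id_to_token.insert(id, tok.to_string());
    }
    assert_eq!(decode(&ids, &id_to_token), "<|begin_of_text|>hello<|eot_id|>");
    println!("ok");
}
```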

@evilsocket evilsocket self-assigned this Jul 16, 2024
@evilsocket evilsocket added the bug Something isn't working label Jul 16, 2024
@evilsocket
Owner Author

the model responds well when the prompt template is not used (https://github.com/evilsocket/cake/blob/main/cake-core/src/models/llama3/llama.rs#L266)
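For context, the Llama 3 instruct prompt template (as publicly documented by Meta) wraps every turn in exactly the kind of special tokens the tokenizer fails to resolve, which would explain why the untemplated path works. A sketch of the templated prompt:

```rust
fn main() {
    // Llama 3 instruct chat template (public format). If the tokenizer
    // cannot resolve these special tokens, the templated prompt is what
    // degrades into gibberish, while a raw prompt still works.
    let user_message = "Hello!";
    let prompt = format!(
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n\
         {user_message}<|eot_id|>\
         <|start_header_id|>assistant<|end_header_id|>\n\n"
    );
    assert!(prompt.starts_with("<|begin_of_text|>"));
    assert!(prompt.contains("<|eot_id|>"));
    println!("{prompt}");
}
```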

@evilsocket
Owner Author

this is also happening in the candle llama example code -> huggingface/candle#2341

evilsocket added a commit that referenced this issue Jul 18, 2024
@caohuilong

Has this been fixed? I'm running into the same problem.

@evilsocket
Owner Author

@caohuilong the issue is open and the bug has not been fixed; we're waiting for the candle team to respond


2 participants