How can I set parameters in llama (load model, create context, create batch, etc.) for best performance on Android? #9349
Unanswered
FranzKafkaYu asked this question in Q&A
Recently I updated the Android example from b3130 to b3400, and there are a lot of changes in the Android example.
I noticed that model loading changed: the original example (b3130) sets up `gpt_params`, while the new one doesn't, and performance dropped dramatically. With the same model and the same input, the original one takes about 1.5 s for inference, while the new one takes about 2.5 s.
So what are the right parameters to set for good performance when using the CPU backend on Android armv8?
Is there any detailed guidance for this?