@simon I wonder what to make of an `llm chat` (local model) answer that abruptly stops. Following it with "continue" fairly consistently resumes the answer, but it soon stops again and a ping-pong match ensues.
Do you know what's happening? Maybe it's related to the token limit?
(In this case it's the Llama 3 8B Instruct.)
@docbibi It might be that there's a `max_tokens` setting that defaults to a short value. Which plugin are you using?
@simon I'm afraid it's just `llm-gpt4all` (v0.4); `llm` itself is also up to date (v0.13.1).
@docbibi here's the problem: I have that defaulting to 200 max tokens! That's a bad default https://github.com/simonw/llm-gpt4all/blob/363559a3accd49c5c0757a1bc843e0376c902bf2/llm_gpt4all.py#L76
@simon ah, thanks for figuring it out! Indeed, adding e.g. `-o max_tokens 2000` does help.
I think another part of the problem is discoverability of these options.
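For reference, a rough sketch of the workaround (the model ID below is a placeholder; check `llm models` for the exact name your `llm-gpt4all` install registers, and if I remember right, recent `llm` versions can also list per-model options):

```sh
# Raise max_tokens explicitly to work around the low 200-token default.
# "Meta-Llama-3-8B-Instruct" is a placeholder – run `llm models` to see
# the exact model ID on your system.
llm chat -m Meta-Llama-3-8B-Instruct -o max_tokens 2000

# For discoverability: list models together with their options and
# defaults (flag present in recent llm releases, as far as I know).
llm models --options
```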