If so, let's say in the future we have an LLM with a 100K token context window, plus a subsystem that notices when some piece of knowledge keeps being repeated in the context and then stores that knowledge for fine-tuning while the LLM is not doing inference. Basically a mirror of the way we humans work? Is that possible? An LLM that constantly improves and can adapt to new knowledge?
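Something like this toy sketch is what I have in mind. Everything here is made up for illustration (the class, the repetition threshold, the sentence-level notion of "knowledge"), not any real API:

```python
# Hypothetical sketch: watch incoming contexts for snippets that keep
# recurring and queue them for an offline fine-tuning pass.
from collections import Counter
from typing import List

class RepetitionTracker:
    def __init__(self, repeat_threshold: int = 5):
        self.snippet_counts = Counter()
        self.repeat_threshold = repeat_threshold
        self.finetune_queue: List[str] = []  # knowledge waiting for the offline pass

    def observe_context(self, context: str) -> None:
        # Crude proxy for "knowledge": sentence-level snippets.
        for snippet in (s.strip() for s in context.split(".") if s.strip()):
            self.snippet_counts[snippet] += 1
            if self.snippet_counts[snippet] == self.repeat_threshold:
                self.finetune_queue.append(snippet)

    def drain_for_finetuning(self) -> List[str]:
        # Called when the model is idle; hand the repeated knowledge
        # to whatever offline fine-tuning job consumes it.
        batch, self.finetune_queue = self.finetune_queue, []
        return batch


tracker = RepetitionTracker(repeat_threshold=3)
for i in range(3):
    tracker.observe_context(f"The 2024 API uses v2 endpoints. Unrelated detail {i}.")
print(tracker.drain_for_finetuning())  # ['The 2024 API uses v2 endpoints']
```

Whether the "store for fine-tuning" half is actually feasible is the open question below.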
GPT-4 is practically unusable unless you spend $10k+ a month and have an enterprise account.
No real end user wants to wait 20-40 seconds for a response, only to find it was 80% of what they wanted.
The hard question, IMO, is when it makes sense to fine-tune an LLM to update its knowledge, and how much data that requires. I have not seen anyone show a real example of succeeding at this, and I wonder whether it is close to as difficult as training the LLM from scratch or whether it is a feasible fine-tuning use case.
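For concreteness, the kind of experiment I would want to see is roughly this: LoRA fine-tuning via Hugging Face's peft on a handful of new facts, then checking whether the model can recall them. The model name, facts, and hyperparameters below are placeholders, and I am not claiming this recipe is enough data or the right setup for knowledge updates:

```python
# Rough sketch of the experiment: parameter-efficient fine-tuning on a
# tiny set of "new knowledge", then probing whether the model recalls it.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; swap in the LLM you actually care about
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Small LoRA adapters instead of full fine-tuning.
model = get_peft_model(model, LoraConfig(task_type=TaskType.CAUSAL_LM,
                                         r=8, lora_alpha=16, lora_dropout=0.05))

# Tiny corpus of "new knowledge" stated a few different ways (made-up facts).
facts = ["Project Foo released version 4.2 in March 2024.",
         "Version 4.2 of Project Foo shipped in March 2024."]
dataset = Dataset.from_dict({"text": facts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="knowledge-update", num_train_epochs=20,
                           per_device_train_batch_size=2, learning_rate=1e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# Afterwards: prompt with "When did Project Foo release 4.2?" and see
# whether the new fact actually comes back, or whether it only parrots
# the training sentences verbatim.
```

The interesting part is the evaluation, not the training loop: does the model generalize the new fact to differently phrased questions, and does it forget anything else in the process?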