I'm actually quite excited about this video because I tried my hardest to pack all the key info I could think of into a 90 minute talk -- the goal is to be the one place I point coders at when they ask "hey tell me everything I need to know about LLMs".
Having said that, I'm sure I missed things or there are bits that are unclear -- this is my first attempt at doing this, and I plan to expand this out into a full course at some point. So please tell me any questions you still have after watching the video, or let me know of any concepts you think I should have covered but didn't.
I'm actually heading to bed shortly (it's getting late here in Australia!) so not sure I'll be able to answer many questions until morning, sorry. But I'll definitely take a look at this page when I get up. I'll also add links to relevant papers and stuff in the YouTube description tomorrow.
(Oh I should mention -- I didn't cover any ethical or policy issues; not because they're not important, but because I decided to focus entirely on technical issues for this talk.)
I thought the selection of projects was great - some OpenAI API hacking including a Code Interpreter imitation created using OpenAI functions, then some Hugging Face model local LLM execution, and then a fine-tuning example to build a text-to-SQL model somehow crammed into just 10 minutes at the end!
At https://youtu.be/jkrNMKz9pWU?si=Dvz-Hs4InJXNozhi&t=3278, when talking about valid use cases for a local model vs GPT4, Jeremy says: "You might want to create your own model that's particularly good at solving the kinds of problems that you need to solve using fine tuning, and these are all things that you absolutely can get better than GPT4 performance".
In regards to this, there's an idea I've been thinking about for some time: Imagine a chatbot that is backed by multiple "small" models (such as 7B parameters), where each model is fine tuned for a specific task. Could such a system outperform GPT4?
Here's a high-level overview of how I imagine this working:
- Context/prompt is sent to a "router model", which is trained to determine what kind of expert model can best answer/complete the prompt.
- The system then passes the context/prompt to the expert model and returns that answer.
- If no expert model is found, just use a generic instruct-tuned general-purpose LLM to answer.
If you can theoretically get better than GPT4 performance from a small model fine tuned for a given task, maybe a cluster of such small models could collectively outperform GPT4.
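To make the routing idea concrete, here is a minimal sketch in Python. Everything in it is a placeholder I made up for illustration: the "experts" are stub functions rather than real fine-tuned 7B models, and the router is plain keyword matching standing in for a trained classifier.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for fine-tuned expert models. Each takes a prompt
# and returns a completion; real ones would wrap an actual model.
def sql_expert(prompt: str) -> str:
    return f"[sql-expert] {prompt}"

def code_expert(prompt: str) -> str:
    return f"[code-expert] {prompt}"

def general_model(prompt: str) -> str:
    return f"[general] {prompt}"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "sql": sql_expert,
    "code": code_expert,
}

# Assumption: keyword overlap stands in for the trained "router model".
ROUTER_KEYWORDS = {
    "sql": {"sql", "select", "table"},
    "code": {"python", "function", "bug"},
}

def route(prompt: str) -> Optional[str]:
    """Return the name of the best-matching expert, or None if nothing matches."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {name: len(words & kws) for name, kws in ROUTER_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def answer(prompt: str) -> str:
    """Send the prompt to the chosen expert, falling back to the general model."""
    expert = EXPERTS.get(route(prompt), general_model)
    return expert(prompt)

print(answer("Write a SQL query to select all users"))
print(answer("What is the capital of France?"))
```

The interesting open question is the router itself: if it misroutes often, the experts never get a chance to shine, so its accuracy probably sets the ceiling for the whole system.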
Does that make sense?
I find just by trying something I discover a new use.
A good example: the other day I needed to convert a spreadsheet of addresses into GeoJSON to use as a map layer. Being in a particularly lazy mood, I decided to see how well ChatGPT would handle it.
As a first step I gave it one pair of lat/long and asked it to convert the deg/min to decimal. No problem, showed all the workings.
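For reference, the conversion itself is just degrees plus minutes divided by 60, negated for the southern and western hemispheres. A minimal sketch, assuming the input is whole degrees, decimal minutes and a hemisphere letter (the function name and the sample coordinates are made up for illustration):

```python
import math

def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees + decimal minutes to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value

# Illustrative values only, not taken from the spreadsheet in question.
lat = dm_to_decimal(33, 52.0, "S")
lon = dm_to_decimal(151, 12.0, "E")
print(round(lat, 4), round(lon, 4))
```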
I then gave it the whole lat/long column and said not to show workings, and it output that fine.
I then created a sample JSON structure with placeholders and said I will provide a data set to populate the structure and to use the column names for replacing the placeholders.
Dropped in the data and it generated the JSON perfectly.
What was interesting is that it redid the lat/long conversion and also incremented an id property I didn't mention without prompting. Was quite impressed with that.
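If you would rather do the same thing in code, a rough equivalent is below. It is a sketch under assumptions: the column names (`address`, `lat`, `lon`) and the sample rows are hypothetical, and the coordinates are assumed to be in decimal degrees already. One thing worth knowing is that GeoJSON puts longitude before latitude in `coordinates`.

```python
import json

def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from a list of row dicts."""
    features = []
    # Auto-incrementing id, starting at 1, like the one ChatGPT added unprompted.
    for i, row in enumerate(rows, start=1):
        features.append({
            "type": "Feature",
            "id": i,
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [row["lon"], row["lat"]],
            },
            "properties": {"address": row["address"]},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical sample rows standing in for the spreadsheet.
rows = [
    {"address": "1 Example St", "lat": -33.8667, "lon": 151.2},
    {"address": "2 Sample Rd", "lat": -37.8136, "lon": 144.9631},
]
print(json.dumps(rows_to_geojson(rows), indent=2))
```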
What an explanation. He clearly breaks down concepts, making them easier to understand.
That’s why I love HN: always discovering something new.
(Caveat, of course there’s many such people in all domains, Jeremy is simply one of the people I both know of, and admire deeply.)
This is a video tutorial on language models by Jeremy Howard from fast.ai. In the tutorial, Howard explains the basics of language models and how to use them in practice. He starts by defining a language model as something that can predict the next word of a sentence or fill in missing words. He demonstrates this using an OpenAI language model called text-davinci-003.
Howard explains that language models work by predicting the probability of various possible next words based on the given context. He shows how to use language models for creative brainstorming and playing with different word predictions.
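As a toy illustration of what "predicting the probability of possible next words" means, here is a small sketch. The scores are invented for the context "The cat sat on the", and real models work over tokens from a large vocabulary rather than a handful of whole words, but the softmax-then-sample step has the same shape.

```python
import math
import random

def softmax(logits: dict) -> dict:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sample_next_word(probs: dict, rng: random.Random) -> str:
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Made-up scores for candidate next words after "The cat sat on the".
logits = {"mat": 3.0, "sofa": 2.0, "roof": 1.0, "moon": -1.0}
probs = softmax(logits)

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:5s} {p:.3f}")
print("sampled:", sample_next_word(probs, random.Random(0)))
```

Always taking the top word gives predictable text; sampling from the distribution is what makes the "creative brainstorming" use possible.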
He then discusses language model training and fine-tuning processes, using the ULMFiT approach as an example. He explains the three steps of language model training: pre-training, language model fine-tuning, and classifier fine-tuning. He mentions the importance of fine-tuning language models for specific tasks to make them more useful.
Howard also demonstrates how to use the OpenAI API to access language models programmatically. He shows examples of using the API to generate text, ask questions, perform code interpretation, and even extract text from images using OCR.
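To give a flavour of the function-calling side, here is a sketch of the local half only, with no network call. The assumption is the shape used by the Chat Completions `functions` feature around the time of the talk: you describe a function with a JSON schema, and the model replies with a `function_call` holding a name and a JSON string of arguments. The `sums` function, the schema and the fake reply are hypothetical examples, not code from the video.

```python
import json

def sums(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

# Description you would send alongside the chat messages (assumed shape).
SUMS_SCHEMA = {
    "name": "sums",
    "description": "Add two numbers",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"},
        },
        "required": ["a", "b"],
    },
}

# Only functions listed here may be run on the model's behalf.
ALLOWED = {"sums": sums}

def call_function(function_call: dict):
    """Run the function the model asked for, if it is on the allow-list."""
    name = function_call["name"]
    if name not in ALLOWED:
        raise ValueError(f"function not allowed: {name}")
    kwargs = json.loads(function_call["arguments"])  # arguments arrive as a JSON string
    return ALLOWED[name](**kwargs)

# Hand-written stand-in for what a model reply might contain.
fake_call = {"name": "sums", "arguments": '{"a": 6, "b": 3}'}
print(call_function(fake_call))
```

The allow-list matters: the model chooses the function name, so you only want it to be able to trigger code you have explicitly exposed.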
Additionally, he discusses the options for running language models on your own computer, such as using GPUs, renting GPU servers, or utilizing cloud platforms like Kaggle and Colab.
He mentions the Transformers library from Hugging Face, which provides pre-trained models and data sets for language processing tasks. He highlights the benefits of fine-tuning models and using retrieval augmented generation to combine document retrieval with language generation.
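Retrieval augmented generation boils down to "find the most relevant document, then paste it into the prompt". A minimal sketch of that idea follows. It is deliberately simplified: word-count cosine similarity stands in for the embedding model a real system would use, and the documents, question and prompt template are all made up for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, documents: list, k: int = 1) -> str:
    """Put the retrieved text in front of the question for the language model."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical document store.
docs = [
    "llama.cpp runs quantized models on a laptop CPU.",
    "GeoJSON stores coordinates in longitude, latitude order.",
    "Fine-tuning adapts a pre-trained model to a specific task.",
]
print(build_prompt("What order does GeoJSON use for coordinates?", docs))
```

The resulting prompt is what you would then pass to whichever model you are using, local or hosted.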
The tutorial concludes with a discussion on other options for running language models, including using private GPT models, Mac-based solutions like H2O GPT and llama.cpp, and the possibility of fine-tuning models with custom data sets.
Overall, the tutorial provides a comprehensive overview of language models, their applications, and different ways to use them, both with OpenAI models and on your own computer.