Oh wow, I only just uploaded this and it's on HN already!

I'm actually quite excited about this video because I tried my hardest to pack all the key info I could think of into a 90-minute talk -- the goal is to be the one place I point coders at when they ask "hey, tell me everything I need to know about LLMs".

Having said that, I'm sure I missed things or there are bits that are unclear -- this is my first attempt at doing this, and I plan to expand this out into a full course at some point. So please tell me any questions you still have after watching the video, or let me know of any concepts you think I should have covered but didn't.

I'm actually heading to bed shortly (it's getting late here in Australia!) so not sure I'll be able to answer many questions until morning, sorry. But I'll definitely take a look at this page when I get up. I'll also add links to relevant papers and stuff in the YouTube description tomorrow.

(Oh I should mention -- I didn't cover any ethical or policy issues; not because they're not important, but because I decided to focus entirely on technical issues for this talk.)

This was excellent. Here's the notebook that accompanies the video:

I thought the selection of projects was great: some OpenAI API hacking, including a Code Interpreter imitation built with OpenAI function calling, then running LLMs locally with Hugging Face models, and finally a fine-tuning example that builds a text-to-SQL model, somehow crammed into just 10 minutes at the end!

Thanks a lot for this video, best LLM usage tutorial I've seen so far.

At the point where he's talking about valid use cases for a local model vs GPT-4, he says: "You might want to create your own model that's particularly good at solving the kinds of problems that you need to solve using fine-tuning, and these are all things that you absolutely can get better than GPT-4 performance".

Regarding this, there's an idea I've been thinking about for some time: imagine a chatbot backed by multiple "small" models (e.g. 7B parameters), each fine-tuned for a specific task. Could such a system outperform GPT-4?

Here's a high-level overview of how I imagine this working:

- Context/prompt is sent to a "router model", which is trained to determine what kind of expert model can best answer/complete the prompt.

- The system then passes the context/prompt to the expert model and returns that answer.

- If no suitable expert model is found, just use a generic, instruction-tuned, general-purpose LLM to answer.
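To make the proposal concrete, here's a minimal Python sketch of that routing flow. It's only an illustration under loud assumptions: `route` is a hypothetical keyword classifier standing in for a trained router model, and `sql_expert`, `code_expert` and `general_model` are placeholder functions, not real fine-tuned 7B models.

```python
from typing import Callable, Dict, Optional

# Hypothetical experts: placeholders for small fine-tuned models (e.g. 7B).
def sql_expert(prompt: str) -> str:
    return f"[sql-expert] {prompt}"

def code_expert(prompt: str) -> str:
    return f"[code-expert] {prompt}"

def general_model(prompt: str) -> str:
    # Fallback: stands in for a generic instruction-tuned LLM.
    return f"[general] {prompt}"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "sql": sql_expert,
    "code": code_expert,
}

def route(prompt: str) -> Optional[str]:
    """Stand-in for the trained router model: picks an expert by keyword,
    or returns None when no expert fits."""
    text = prompt.lower()
    if "sql" in text or "query" in text:
        return "sql"
    if "python" in text or "function" in text:
        return "code"
    return None

def answer(prompt: str) -> str:
    name = route(prompt)
    # No matching expert means we fall back to the general model.
    expert = EXPERTS.get(name, general_model) if name is not None else general_model
    return expert(prompt)

print(answer("Write a SQL query that counts users"))
```

The fallback branch is what keeps the system usable on out-of-domain prompts; in practice the router itself could be a small classifier model, and its accuracy would cap how well the whole ensemble can do.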

If you can theoretically get better-than-GPT-4 performance from a small model fine-tuned for a specific task, maybe a cluster of such small models could collectively outperform GPT-4.

Does that make sense?

Gems like this make all the time I spend on HN worthwhile. Thank you so much Jeremy Howard, you are a legend!!
Excellent video. I shared it at my workplace. It's probably the most comprehensive practical introduction to the topic that I'm aware of. In particular, I loved the section on how those viral "GPT can't do X" articles don't reproduce. I'm hoping it helps folks I know think critically about the tech.
…from the guy whose paper started this whole thing.
Excellent video! Learnt a few new tricks that I'll use in future.

I find that just by trying things, I discover new uses.

A good example was the other day, when I needed to convert a spreadsheet of addresses into GeoJSON to use as a map layer. Being in a particularly lazy mood, I decided to see how well ChatGPT would handle it.

As a first step, I gave it one lat/long pair and asked it to convert it from degrees/minutes to decimal degrees. No problem; it showed all its workings.

I then gave it the whole lat/long column and told it not to show its workings, and it output that fine.
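For anyone who'd rather do that conversion step deterministically, here's a minimal Python sketch. It assumes values look like "27 28.2 S" (degrees, decimal minutes, hemisphere separated by spaces); real spreadsheets vary, so `parse_dm` and that format are assumptions, not what the commenter's data actually looked like.

```python
def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Degrees + minutes/60; South and West become negative."""
    decimal = degrees + minutes / 60
    return -decimal if hemisphere.upper() in ("S", "W") else decimal

def parse_dm(text: str) -> float:
    # Assumed input format: "DEG MIN HEMI", e.g. "27 28.2 S".
    deg, minutes, hemi = text.split()
    return dm_to_decimal(int(deg), float(minutes), hemi)

column = ["27 28.2 S", "153 1.8 E"]
print([round(parse_dm(v), 4) for v in column])
```

The sign convention matters here: decimal degrees encode the hemisphere as a sign, so forgetting it puts a southern-hemisphere point in the northern one.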

I then created a sample JSON structure with placeholders and said I would provide a dataset to populate the structure, and that it should use the column names to replace the placeholders.

I dropped in the data and it generated the JSON perfectly.

What was interesting is that it redid the lat/long conversion and, without prompting, also incremented an id property I hadn't mentioned. I was quite impressed with that.
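The template-filling step is also easy to do in a few lines of Python if you want a reproducible version. This is a sketch assuming the rows already have decimal `lat`/`lon` and an `address` column; those column names and the placement of `id` inside `properties` are hypothetical choices, since the original template isn't shown.

```python
import json
from typing import Dict, List

def rows_to_geojson(rows: List[Dict[str, object]]) -> dict:
    features = []
    # enumerate(..., start=1) gives the auto-incremented id.
    for i, row in enumerate(rows, start=1):
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {"id": i, "address": row["address"]},
        })
    return {"type": "FeatureCollection", "features": features}

rows = [{"address": "1 Example St", "lat": -27.47, "lon": 153.03}]
print(json.dumps(rows_to_geojson(rows), indent=2))
```

The longitude-first ordering is the usual trap when hand-writing GeoJSON, because most people say "lat/long".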

You're a legend, thank you. You're admired all around the world.
This is amazing.

What an explanation. He clearly breaks down concepts, making them easier to understand.

That’s also why I love HN: discovering something new.

Jeremy is an idol of mine, and as someone born and living in Queensland, I find him a reminder that global talent really does exist all around us.

(Caveat: of course there are many such people in all domains; Jeremy is simply one of the people I both know of and admire deeply.)

What a gem! I've been waiting so long for an LLM course by Jeremy. As one of the people who helped start all of this with ULMFiT, his takes and tips are as good as I expected. Looking forward to more detailed, lower-level courses when open source catches up with the proprietary world.
Not a lot of love was given to the RAG method, considering that I think for most applications a fine-tuned model in the truest sense won't be the best or most efficient solution to the problem.
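For readers unfamiliar with RAG (retrieval-augmented generation), the core idea is to fetch relevant documents and put them in the prompt rather than baking knowledge into the weights. Here's a toy Python sketch of the retrieval and prompt-building steps. It swaps in bag-of-words cosine similarity where a real system would typically use embedding vectors, and the documents and prompt wording are made up for illustration; the final LLM call is left out.

```python
import math
import re
from collections import Counter
from typing import List

def _vector(text: str) -> Counter:
    # Toy "embedding": lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = _vector(query)
    return sorted(docs, key=lambda d: cosine(q, _vector(d)), reverse=True)[:k]

def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Our office is open 9am to 5pm on weekdays.",
    "Refunds are processed within 14 days.",
    "The cafeteria serves lunch at noon.",
]
print(build_prompt("How long do refunds take?", docs, k=1))
```

The appeal over fine-tuning is that updating the knowledge base means editing `docs`, not retraining a model.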
To the original author: can you rate this AI-generated summary of your video?

Video tutorial on language models by Jeremy Howard. In the tutorial, Howard explains the basics of language models and how to use them in practice. He starts by defining a language model as something that can predict the next word of a sentence or fill in missing words. He demonstrates this using an OpenAI language model called text-davinci-003.

Howard explains that language models work by predicting the probability of various possible next words based on the given context. He shows how to use language models for creative brainstorming and playing with different word predictions.

He then discusses language model training and fine-tuning processes, using the ULMFiT approach as an example. He explains the three steps of language model training: pre-training, language model fine-tuning, and classifier fine-tuning. He mentions the importance of fine-tuning language models for specific tasks to make them more useful.

Howard also demonstrates how to use the OpenAI API to access language models programmatically. He shows examples of using the API to generate text, ask questions, perform code interpretation, and even extract text from images using OCR.

Additionally, he discusses the options for running language models on your own computer, such as using GPUs, renting GPU servers, or utilizing cloud platforms like Kaggle and Colab.

He mentions the Transformers library from Hugging Face, which provides pre-trained models and datasets for language processing tasks. He highlights the benefits of fine-tuning models and of using retrieval-augmented generation to combine document retrieval with language generation.

The tutorial concludes with a discussion of other options for running language models, including privateGPT, Mac-based solutions like h2oGPT and llama.cpp, and the possibility of fine-tuning models with custom datasets.

Overall, the tutorial provides a comprehensive overview of language models, their applications, and different ways to use them, both with OpenAI models and on your own computer.
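The summary's point that a language model predicts "the probability of various possible next words" can be illustrated with a softmax over raw scores (logits). This is a minimal Python sketch; the candidate words and scores are invented for illustration and don't come from any real model.

```python
import math
from typing import Dict

def next_word_probs(logits: Dict[str, float]) -> Dict[str, float]:
    # Softmax: subtract the max score for numerical stability, exponentiate, normalise.
    m = max(logits.values())
    exps = {w: math.exp(s - m) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Hypothetical scores for the word after "The capital of France is".
probs = next_word_probs({"Paris": 5.0, "London": 2.0, "pizza": 0.5})
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.3f}")
```

Sampling from (or picking the maximum of) this distribution, appending the chosen word, and repeating is what turns next-word prediction into text generation.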

The most exciting thing about LLMs is how they get easier for intermediate programmers to use every day. It really makes your imagination run wild when you can grasp the concepts.
This is by far the best LLM user tutorial I've seen.
I have never favorited a Hacker News post as quickly as I did this one.
Pretty cool!
For those who want to read more about this: