Instead of PDF, use MarkDownload (on iOS, use a Safari extension that converts web content to a Markdown file).
Save each page in a journaled folder as "YYYY-MM-DD - Page Title.md", with YAML frontmatter holding all available metadata.
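For anyone scripting this rather than relying on the extension, here is a minimal sketch of that naming scheme and frontmatter. The function names and the `title`/`source`/`saved` keys are my own assumptions, not anything the extension prescribes; add whatever metadata you actually have via `extra`.

```python
import re
from datetime import date


def note_filename(title: str, saved_on: date) -> str:
    """Build a 'YYYY-MM-DD - Page Title.md' name, dropping characters unsafe in filenames."""
    safe_title = re.sub(r'[\\/:*?"<>|]', "", title)
    safe_title = re.sub(r"\s+", " ", safe_title).strip()
    return f"{saved_on.isoformat()} - {safe_title}.md"


def yaml_quote(value: str) -> str:
    """Double-quote a scalar so colons or '#' in titles and URLs stay valid YAML."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_note(title: str, url: str, saved_on: date, body_markdown: str, extra=None) -> str:
    """Return the full note text: YAML frontmatter followed by the Markdown body."""
    # Hypothetical key names; rename them to match your own vault conventions.
    meta = {"title": title, "source": url, "saved": saved_on.isoformat()}
    meta.update(extra or {})
    lines = ["---"]
    lines += [f"{key}: {yaml_quote(str(value))}" for key, value in meta.items()]
    lines += ["---", "", body_markdown]
    return "\n".join(lines) + "\n"
```

Writing the result to disk is then just `Path(folder, note_filename(...)).write_text(render_note(...), encoding="utf-8")`.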
Have this as a folder in your PKM of choice (Obsidian, Foam, whatever).
These days, you can point a text-embedding model at it and build your own LLM-searchable brain.
But you can also static-site-generate that back into your own web knowledge site or base.
If you don't need it locally, and depending on the capture you want, consider pinboard.in or historio.us.
Sites die off. Online web archives aren't reliable or completely trustworthy. And censorship of many forms seems to be occurring more and more.
With storage so cheap, there's little downside to saving what you like.
AI search would be the only reason to. If it saves everything automatically and can query references / make inferences seamlessly, then great. Anything less, and my life eats itself like a snake.
The only service I still haven't localized is email, specifically Gmail, which I believe Google is about to monetize/AI-ize in the very near future.
I just got myself an Nvidia 4090, and I'm looking into feeding my data into local LLMs (I think this is called retrieval-augmented generation?) for various assistant-type use cases.
I'm particularly excited to potentially be able to go through my saved Kindle highlights for multi-novel sci-fi and fantasy series in order to refresh my memory by clarifying key story beats before continuing with the next book.
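For what it's worth, the retrieval half of that idea can be sketched in a few lines. This is a toy under loud assumptions: `embed` here is just word counts standing in for a real embedding model, and `build_prompt` only assembles the text you would hand to whatever local LLM you run. All names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, highlights: list[str], k: int = 2) -> list[str]:
    """Return the k highlights most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(highlights, key=lambda h: cosine(query_vec, embed(h)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, highlights: list[str], k: int = 2) -> str:
    """Assemble the context-plus-question prompt to send to a local model."""
    context = "\n".join(f"- {h}" for h in retrieve(question, highlights, k))
    return f"Answer using only these highlights:\n{context}\n\nQuestion: {question}"
```

Swapping `embed` for a real embedding model and caching the vectors is where the GPU would come in; the ranking and prompt assembly stay the same shape.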
You're right, of course. I'd like to see a local proxy that caches everything for at least a month, then automatically keeps stuff referenced or revisited.
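The retention rule itself is simple enough to sketch. This assumes a hypothetical proxy that records when each page was first cached, how often it was loaded, and whether anything (a note, a bookmark) points at it; none of these names come from an existing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CachedPage:
    url: str
    first_cached: datetime
    visit_count: int = 1      # times the page was loaded through the proxy
    referenced: bool = False  # e.g. linked from a note or bookmark


def should_keep(page: CachedPage, now: datetime,
                grace: timedelta = timedelta(days=30)) -> bool:
    """Keep everything during the grace period, then only referenced or revisited pages."""
    if now - page.first_cached <= grace:
        return True
    return page.referenced or page.visit_count > 1
```

A nightly job would run `should_keep` over the cache index and evict whatever returns `False`.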
If you find something worth saving, save it! And don't forget to back up your stuff!
Anyone can save the current content of any http:/https: URL in Wayback Machine, so the question is simply whether IA will be around for the time that you care about.
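If you want to script that, a rough sketch follows. It relies on the public "Save Page Now" address form `https://web.archive.org/save/<url>`; the function names and the User-Agent string are my own, and I'm assuming a plain unauthenticated GET is still accepted, which may be rate-limited or change.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url(page_url: str) -> str:
    """Build the Save Page Now address for an http(s) page."""
    scheme = urlsplit(page_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"only http/https URLs can be captured, got {page_url!r}")
    return SAVE_ENDPOINT + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture the page; returns the HTTP status code."""
    # Hypothetical User-Agent; use something that identifies your own script.
    req = Request(save_url(page_url), headers={"User-Agent": "personal-archiver-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

This only covers the "save it to IA" half; it does nothing about whether IA will still be there later, so keep a local copy too.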
> It may not make economic sense for the websites to stay up
So, no more WWW?
The only way to own things is to have copies offline. In three or more geographically distant locations.