I have had the habit of saving my favorite stuff locally for a very long time. If it's locally stored, it's always available. If it's online, there's a chance that you'll lose access either temporarily or permanently.

Yes and no.

Instead of PDF, use MarkDownload (on iOS, use a Safari extension that converts web content to a Markdown file):

Then save each capture in a dated folder like "YYYY-MM-DD - Page", with YAML frontmatter holding all available metadata.
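
A minimal sketch of that layout, assuming one folder per capture with the Markdown stored as `index.md`. The function name `save_capture`, the file name, and the metadata keys (`title`, `source`, `captured`) are placeholders of mine, not something MarkDownload produces. Values are written as JSON-style quoted strings, which YAML parsers accept, so no YAML library is needed.

```python
import json
import re
from datetime import date
from pathlib import Path
from typing import Optional


def save_capture(root: Path, title: str, url: str, body_md: str,
                 captured: Optional[date] = None, **extra: str) -> Path:
    """Write one capture to '<root>/YYYY-MM-DD - Title/index.md' (hypothetical layout)."""
    captured = captured or date.today()
    # Replace characters that are awkward in folder names on common filesystems.
    safe_title = re.sub(r'[\\/:*?"<>|]+', " ", title).strip() or "Untitled"
    folder = root / f"{captured.isoformat()} - {safe_title}"
    folder.mkdir(parents=True, exist_ok=True)

    # Metadata keys are illustrative; add whatever the capture tool gives you.
    meta = {"title": title, "source": url, "captured": captured.isoformat(), **extra}
    # json.dumps handles quoting and escaping of each value.
    lines = [f"{key}: {json.dumps(value, ensure_ascii=False)}" for key, value in meta.items()]
    text = "---\n" + "\n".join(lines) + "\n---\n\n" + body_md.rstrip() + "\n"

    note = folder / "index.md"
    note.write_text(text, encoding="utf-8")
    return note
```

Calling `save_capture(Path("captures"), "Page", "https://example.com/", markdown_text)` would create a folder named after today's date followed by " - Page".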

Have this as a folder in your PKM of choice (Obsidian, Foam, whatever).

These days, point some text embedding at it, and let it generate your own LLM brain.
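
To make the retrieval step concrete, here is a rough sketch of similarity search over a folder of Markdown notes. Note the swap: instead of a learned text-embedding model it uses plain word-count vectors with cosine similarity, so it runs with the standard library alone. In a real setup you would replace `vectorize` with a call to whatever embedding model you use. All function names here are my own.

```python
import math
import re
from collections import Counter
from pathlib import Path


def vectorize(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def load_notes(root: Path) -> dict[str, str]:
    """Read every Markdown file under root, keyed by relative path."""
    return {str(p.relative_to(root)): p.read_text(encoding="utf-8")
            for p in root.rglob("*.md")}


def search(notes: dict[str, str], query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k notes most similar to the query, best first."""
    query_vec = vectorize(query)
    scored = [(name, cosine(vectorize(text), query_vec)) for name, text in notes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The top results are what you would paste into (or automatically feed to) the LLM as context for a question.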

But you can also static-site-generate that back into your own web knowledge site or base.

If you don't need it locally, and depending on the capture you want, consider or

I'm not certain what kind of impact AI will have on the marketplace, but it seems like a good idea to store what you value locally regardless of what happens with AI.

Sites die off. Online web archives aren't reliable or completely trustworthy. And censorship of many forms seems to be occurring more and more.

With storage so cheap, there's little downside to saving what you like.

Only if you're a hoarder, or it's career-related and you're meticulous. I find that if I start saving stuff, the drive to collect starts outweighing the value of the collection rapidly.

AI search would be the only reason to. If it saves everything automatically and can query references / make inferences seamlessly, then great. Anything less, and my life eats itself like a snake.

Yes. Even before the AI risks, I have moved everything locally – photos, files, nothing syncs to iCloud/Dropbox/etc. Every once in a while I prepare and order printed photo albums.

The only service I have not yet localized is email, specifically Gmail, which I believe Google will monetize/AI-ize in the very near future.

Yes, had the same realization some months ago. I started building a CLI-based tool, smaller in scope, offline-first and occasionally online.

I have a slightly different take on this: I save the text that I care about, and have some automation set up to archive the source URL of the text to [1]. That works well enough for me, even if it's not 100% perfect, because I'm only archiving it for the greater context of the highlighted text, which I rarely go back to.

I just got myself an Nvidia 4090, and I'm looking into feeding my data into local LLMs (I think this is called retrieval-augmented generation?) for various assistant-type use cases.

I'm particularly excited to potentially be able to go through my saved Kindle highlights for multi-novel sci-fi and fantasy series in order to refresh my memory by clarifying key story beats before continuing with the next book.


This is an old idea... when it was first proposed by Bush[1], the recording medium was microfilm, not PDF files. It's never been implemented, and forces that believe in "intellectual property" are aligned against doing so. (One of the main features was the ability to dump a selected "trail" through documents to microfilm for others.)

You're right, of course. I'd like to see a local proxy that caches everything for at least a month, then automatically keeps stuff referenced or revisited.
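
The retention rule could look something like this. It is only a sketch of the policy, not a proxy: `CacheEntry`, its fields, and the "more than one visit counts as revisited" threshold are all assumptions of mine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # "at least a month"


@dataclass
class CacheEntry:
    """Hypothetical record a caching proxy might keep per URL."""
    url: str
    first_seen: datetime
    visits: int = 1           # how many times the page was fetched
    referenced: bool = False  # e.g. linked from a note or bookmark


def should_keep(entry: CacheEntry, now: datetime) -> bool:
    """Keep everything inside the window; afterwards only referenced or revisited pages."""
    if now - entry.first_seen < RETENTION:
        return True
    return entry.referenced or entry.visits > 1


def prune(entries: list[CacheEntry], now: datetime) -> list[CacheEntry]:
    """Return the entries that survive the retention rule."""
    return [entry for entry in entries if should_keep(entry, now)]
```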


Absolutely. The internet has proven itself to be ephemeral. The only part of it that is guaranteed is now. Content can be silently changed. Posts get deleted. Links break and 404. Images get lost. Sites put up paywalls, or go down entirely if the owners go bankrupt.

If you find something worth saving, save it! And don't forget to back up your stuff!

> It might be saved in the internet archive, or it might not

Anyone can save the current content of any http:/https: URL in the Wayback Machine, so the question is simply whether IA will be around for the time that you care about.
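
For anyone who wants to script that, a minimal sketch using the public `https://web.archive.org/save/` URL prefix. The function names and User-Agent string are made up, and I'm assuming a plain GET is enough; rate limits and the exact response behavior are not covered here.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(page_url: str) -> str:
    """Build the Save Page Now URL for an http: or https: page."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("only http: and https: URLs can be submitted")
    return SAVE_ENDPOINT + page_url


def request_snapshot(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture page_url and return the HTTP status code."""
    request = urllib.request.Request(
        save_request_url(page_url),
        headers={"User-Agent": "personal-archiver-sketch"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```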

> It may not make economic sense for the websites to stay up

So, no more WWW?

The answer is a resounding yes. The corporate cloud is fake. A mirage. A timebomb whose chance of going off nears 100% over time. When it goes off your digital belongings are gone.

The only way to own things is to have copies offline. In three or more geographically distant locations.
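
A rough sketch of the copy step, assuming the three locations are reachable as directories (mounted drives, for example). It copies a file to each destination and checks the SHA-256 of every copy against the original. `replicate` and `sha256_of` are names I picked for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination directory and verify each copy by checksum."""
    expected = sha256_of(source)
    results: dict[Path, bool] = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy = dest_dir / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps where possible
        results[copy] = sha256_of(copy) == expected
    return results
```

Re-running `sha256_of` on the copies every so often is a cheap way to notice silent corruption before the other copies are needed.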

I agree. Instapaper (phone app) is a good tool for doing this. But PDFs are probably more "open" in that you know the format and can choose where to put the files. The Internet Archive sometimes saves dead links though.

I keep my stuff on Dropbox and GitHub and it has been working for a while.