A very nice description of Erlang/Elixir resilience.

> Fundamentally, if there is nothing useful your code can do on a failure, no mitigation, no meaningful fallback, then you might as well have it blow up and rely on the underlying tree to make whatever is most useful out of it.

I would add that this is possible only because process heaps are isolated. Other systems with traditional threads, coroutines or green threads can mimic supervision trees, but unless they have isolated heaps and can safely crash at any time without affecting any other process, it would be hard to achieve the same safety properties.
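
To make the isolation point concrete, here is a minimal sketch (plain Elixir script, nothing from the article; the counter and the "boom" crash are made up for illustration). One process holds state, an unrelated process crashes, and the state is still there afterwards:

```elixir
# Long-lived process holding state on its own heap (started unlinked).
{:ok, counter} = Agent.start(fn -> 41 end)

# A second, unrelated process that crashes. spawn_monitor/1 does not link,
# so the caller only receives a :DOWN message instead of crashing too.
{pid, ref} = spawn_monitor(fn -> raise "boom" end)

crash_seen =
  receive do
    {:DOWN, ^ref, :process, ^pid, _reason} -> true
  after
    1_000 -> false
  end

# The counter's state is untouched by the crash next door.
:ok = Agent.update(counter, &(&1 + 1))
value = Agent.get(counter, & &1)
IO.inspect({crash_seen, value})
```

The crashed process gets logged, but nothing it owned can leave the counter in a half-updated state, because it never had access to the counter's memory in the first place.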

> Scheduling and preempting was intended to produce consistently low latencies but it also protects you from heavy workloads bogging things down, it prevents infinitely looping bugs from slowing the system to a crawl and in general makes things resilient to things not being ideal all the time.

Well put. Reliability was one of the initial requirements of Erlang, and soft real-time behavior was another. This is hard to do, but the BEAM VM does a great job there. To get this right, the VM has to be able to preempt processes even if they run a tight CPU-bound loop. A misbehaving process may not crash at all; it could just recurse endlessly, burning CPU cycles, and it still shouldn't bring the system down.
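
A rough way to see the preemption for yourself, as a sketch only: `Spin` is a made-up module, and the 5 second timeout is an arbitrary choice. It starts twice as many busy-looping processes as there are schedulers and then checks that another process still gets a turn:

```elixir
defmodule Spin do
  # Tail-recursive busy loop that never yields voluntarily.
  def loop(n), do: loop(n + 1)
end

# Oversubscribe every scheduler with CPU-bound processes.
spinners =
  for _ <- 1..(System.schedulers_online() * 2) do
    spawn(fn -> Spin.loop(0) end)
  end

# A fresh process still has to be scheduled to send this message.
parent = self()
spawn(fn -> send(parent, :pong) end)

result =
  receive do
    :pong -> :responsive
  after
    5_000 -> :starved
  end

# Clean up the busy loops.
Enum.each(spinners, &Process.exit(&1, :kill))
IO.inspect(result)
```

On a normal BEAM install I would expect this to report `:responsive`, since each process is preempted once it uses up its reduction budget.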

> The strategies are normally one_for_one, one_for_all and one_for_rest

Minor correction: the last one is rest_for_one. https://www.erlang.org/doc/man/supervisor.html#supervision-p...
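
For anyone who wants to see what rest_for_one does, a small sketch (the `Demo.Worker` module and the `:a`, `:b`, `:c` names are hypothetical, not from the article). Killing the middle child should restart it and every child started after it, while the first child is left alone:

```elixir
defmodule Demo.Worker do
  use Agent

  # Hypothetical worker: an Agent registered under the given name.
  def start_link(name) do
    Agent.start_link(fn -> 0 end, name: name)
  end
end

# Children are started in this order: :a, then :b, then :c.
children = [
  Supervisor.child_spec({Demo.Worker, :a}, id: :a),
  Supervisor.child_spec({Demo.Worker, :b}, id: :b),
  Supervisor.child_spec({Demo.Worker, :c}, id: :c)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

# Snapshot of the pid currently registered under each name.
pids = fn ->
  Map.new([:a, :b, :c], fn name -> {name, Process.whereis(name)} end)
end

before = pids.()

# Kill the middle child, then give the supervisor a moment to restart things.
Process.exit(before.b, :kill)
Process.sleep(100)

later = pids.()

for name <- [:a, :b, :c] do
  IO.puts("#{name} restarted? #{later[name] != before[name]}")
end
```

Switching the strategy to `:one_for_one` should restart only `:b`, and `:one_for_all` should restart all three.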

One of the great things about the Elixir ecosystem is that, in so many cases, the libraries you’re using make the appropriate use of supervision trees so you don’t have to.

Obviously it depends on what you’re building and you can craft your own supervision structures if you want to. But most of the time you can get the benefits of BEAM resiliency without having to drop down into the details.

The thing about Elixir resilience is that it takes experience to fully appreciate what it does, and to realize that in a web app the cases where it actually matters are not that common, to be totally honest.

I can give an example where this could be useful. I once worked on a Node app that couldn't use clustering mode because reasons, and we only had 1 server running this particular piece of code. It turned out that we had a bug that made the Node app crash whenever a user posted some invalid data. The problem was that the app itself was a SPA, so the users could just keep on posting even when it failed, and in frustration they did.

So the app crashed and took a few seconds to reload, then crashed again. This meant that the entire API went down while the user was posting and could not respond to any other requests. This would never happen in Elixir, and the load would continue being OK even if 100 users kept posting bad data at the same time.
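
Roughly what that looks like in Elixir, as a sketch and not how any particular web framework does it: the `handle` function and the payloads are invented for illustration. Each "request" runs in its own supervised task, so the one with bad data fails on its own:

```elixir
{:ok, sup} = Task.Supervisor.start_link()

# Hypothetical request handler: parses the payload in a separate,
# unlinked task so a crash there cannot take the caller down.
handle = fn payload ->
  task =
    Task.Supervisor.async_nolink(sup, fn ->
      String.to_integer(payload) * 2
    end)

  case Task.yield(task, 1_000) || Task.shutdown(task) do
    {:ok, value} -> {:ok, value}
    {:exit, _reason} -> {:error, :bad_request}
    nil -> {:error, :timeout}
  end
end

# The second payload raises inside its task; the others are unaffected.
results = Enum.map(["21", "oops", "4"], handle)
IO.inspect(results)
```

The bad payload shows up as `{:error, :bad_request}` (plus a crash report in the log), while the requests before and after it are served normally.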

The bad thing about Elixir resilience is that it only applies to application logic. Everything else can go wrong just the same as in any other app, since most Elixir projects use the same kind of tooling (Postgres, some web server in front, etc.). Not many seem to use the built-in Mnesia database, no-downtime deployment and so on. The BEAM comes with many cool features in theory, but very few projects actually use them, so this 99.99999% uptime rarely comes into effect. I've very rarely had downtime because of things in the application logic, like in the first story I mentioned. Most of the time it's something else entirely, and Elixir does not really help with that.

Sure, you could use all the cool features of the BEAM, but it seems like in the vast majority of cases the amount of work is simply too great to be worth the time investment.

Nice article! FTA: "One for one indicates that if a child crashes it, it and only it should be considered when restarting." What is "it" in this context?

What I also like about Elixir is that it is a nice language. What I mean by that is that it has convenient language features like tail-call optimization, data structures that are immutable by default, modules (seemingly proper ones, but I am still learning, so I still need to check things like "Can I limit what a module exports?"), and great pattern matching. Then of course there is the heritage from building on top of the BEAM VM and Erlang: lightweight processes. I also prefer having `do` and `end` markers for things, so that one can write expressions inline, like `fn args -> foobar end`. Having such markers enables structural code editing, such as putting the point at the `do` and then selecting everything up to the matching `end` with one keyboard shortcut.
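
On the "Can I limit what a module exports?" question: as far as I know, yes. Functions defined with `def` are exported and functions defined with `defp` are private to the module. A tiny sketch (the `Greeter` module is made up):

```elixir
defmodule Greeter do
  # Public: callable from outside as Greeter.hello/1.
  def hello(name), do: "Hello, " <> normalize(name)

  # Private: only callable from inside this module.
  defp normalize(name) do
    name |> String.trim() |> String.capitalize()
  end
end

greeting = Greeter.hello("  joe ")
public? = function_exported?(Greeter, :hello, 1)
private_visible? = function_exported?(Greeter, :normalize, 1)

IO.inspect({greeting, public?, private_visible?})
```

Calling `Greeter.normalize/1` from outside the module fails with an `UndefinedFunctionError`.
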

Partially related question:

There was an article about the implementation of supervisors which said that you can’t use the “let it crash” philosophy across the whole Erlang/BEAM stack: there is a small (in LOC) implementation in C which has to be proven safe in order for “let it crash” to work.

I don’t know whether that C code was the definition and implementation of a supervisor or of a process itself.

Does anybody know what I mean and which article I am referring to? If so, I would be glad if you could post a link to the article.

Elixir seems to be picking up insane steam right now. Every day or two there is a fascinating Elixir post here, and its promise seems too good to resist. Has anyone else hitched their cart to this horse?