https://www.pgcasts.com/episodes/the-skip-locked-feature-in-...
It’s not “web scale,” but in my experience it easily extends to several thousand background jobs.
But I think a lot of it is also about knowledge and documentation. If I want to copy FAANG or another startup and set up an infinitely scalable queue-based architecture, I can find dozens of high-quality guides, tutorials, white papers, etc., showing me exactly how to do it. Yes, maintenance is higher, but I can get set up with Redis, SQS, or any of the 'scalable' solutions within a few hours of copy-pasting commands, code, and configuration from a reputable source.
If I want to use NOTIFY in Postgres? I googled "SQLAlchemy notify listen postgres" and found a few unanswered Stack Overflow questions and a GitHub gist that has some code but no context.
I would honestly love to use this approach for a side project, but I don't have 2-3 days to figure it out on my own. At first glance, the choice for me seems to be
* simple, but not scalable (i.e. just use Postgres)
* complex, but scalable (i.e. Redis, SQS, whatever)
and then it's a tradeoff, and the argument goes that I am blinded by cool tech and FAANG and I'm choosing complex but scalable, even though I don't need scalable.
But taking into account guides and other resources, the choice for me is actually
* complex and not scalable (this, because I don't know how to implement it and I can't predict what pitfalls I might face if I try)
* simple and scalable (what everyone actually does)
and that makes the engineer's choice to follow FAANG look a lot more reasonable.
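For what it's worth, the LISTEN loop that's so hard to find documented is fairly small. Here is a minimal sketch with psycopg2; the channel name, connection details, and the polling fallback are my own assumptions, not from any canonical guide:

import select
import psycopg2

conn = psycopg2.connect('dbname=app user=app')
conn.autocommit = True  # LISTEN takes effect immediately, outside a transaction
cur = conn.cursor()
cur.execute('LISTEN new_job;')  # producers would run: NOTIFY new_job, 'some-payload';

while True:
    # wait until the connection's socket is readable, with a 5s timeout
    if select.select([conn], [], [], 5) == ([], [], []):
        continue  # timed out; a sensible place to poll the queue table as a fallback
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print('got notification:', notify.channel, notify.payload)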
1. The main downside to using PostgreSQL as a pub/sub bus with LISTEN/NOTIFY is that LISTEN is a session feature, making it incompatible with statement-level connection pooling.
2. If you are going to do this use advisory locks [0]. Other forms of explicit locking put more pressure on the database while advisory locks are deliberately very lightweight.
My favorite example implementation is que [1] which is ported to several languages.
[0] https://www.postgresql.org/docs/current/explicit-locking.htm...
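To illustrate the difference, a rough sketch of claiming jobs with session-level advisory locks via psycopg2 (the jobs table and per-job lock keys are hypothetical; que's real implementation is considerably more careful):

import psycopg2

conn = psycopg2.connect('dbname=app user=app')
conn.autocommit = True
cur = conn.cursor()

cur.execute("SELECT id FROM jobs WHERE status = 'ready' ORDER BY id LIMIT 10")
for (job_id,) in cur.fetchall():
    # pg_try_advisory_lock returns immediately: true if we got the lock,
    # false if another worker holds it -- no row lock, no disk write
    cur.execute('SELECT pg_try_advisory_lock(%s)', (job_id,))
    if cur.fetchone()[0]:
        try:
            print('claimed job', job_id)  # run the job here
        finally:
            cur.execute('SELECT pg_advisory_unlock(%s)', (job_id,))
        break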
Having transactions is quite handy.
https://wiki.postgresql.org/wiki/SkyTools
I did a few talks on this at Sydpy as I used it at work quite a bit. It's handy when you already have postgresql running well and supported.
This said, I'd use a dedicated queue these days. Anything but RabbitMQ.
That is, say you have a relatively complex backend mutation that needs to schedule some async work (e.g. sending an email after signup). With a Postgres queue, if you insert the job to send the email and then a later part of the transaction fails and the transaction rolls back, the email is never queued to be sent.
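A sketch of what that buys you, assuming an illustrative users/jobs schema: the job row only exists if the surrounding transaction commits.

import psycopg2
import psycopg2.extras

conn = psycopg2.connect('dbname=app user=app')
with conn:  # one transaction wraps the signup and the enqueue
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO users (email) VALUES (%s) RETURNING id",
            ('alice@example.com',),
        )
        user_id = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO jobs (kind, payload) VALUES (%s, %s)",
            ('send_welcome_email', psycopg2.extras.Json({'user_id': user_id})),
        )
        # ... later steps; if anything here raises, the job row rolls back too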
Unfortunately, Kafka carries with it enough complexity (due to the distributed nature) that it ends up not being worth it for most use-cases.
Personally I'd love something similar that's easier to operate. You'd probably be able to handle hundreds (if not thousands) of events per second on a single node, and without distributed complexity it'd be really nice.
And yes, in theory you could still use postgres for this (and just never delete rows). And maybe that's the answer.
It seems foolish. I am a big fan of “use the dumbest tool”, but sometimes engineers take it too far and you’re left with the dumbest tool with caveats that don’t seem worth it given the mainstream alternative is relatively cheap and simple.
Having transactional semantics around background jobs is incredibly convenient for things like scheduling email only if the transaction is successful, and so on.
You do need to do a little bit of autovacuum tuning, but once sorted it’s been great for us.
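For anyone wondering what that tuning looks like, here is one plausible shape (table name and numbers are illustrative, not a recommendation):

import psycopg2

conn = psycopg2.connect('dbname=app user=app')
with conn, conn.cursor() as cur:
    # a hot queue table churns dead tuples quickly: vacuum on a fixed row
    # threshold rather than a fraction of the table, and don't throttle it
    cur.execute("""
        ALTER TABLE jobs SET (
            autovacuum_vacuum_scale_factor = 0.0,
            autovacuum_vacuum_threshold    = 1000,
            autovacuum_vacuum_cost_delay   = 0
        )
    """)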
The industry standard for real-time was anything under 30s end-to-end. We hit many scaling limits as our platform grew. We worked on improving the system instead of treating PostgreSQL as the problem. We measured and measured, and PostgreSQL was never the bottleneck when it came to scaling. Most of the problems that existed were due to design decisions in the system or technical debt that needed to be paid down. Once we solved those problems, the system flew.
There was one incident where our PostgreSQL instance's CPU usage went up to 100% during peak hours and would cause problems in the system. We wondered why, and there was a lot of debate around the topic: PostgreSQL doesn't scale, we need a larger instance, we should move to a dedicated queue system, etc.
I didn't believe the hypothesis that PostgreSQL didn't scale, so I conducted a performance audit of the system myself. I discovered that one of the engineers had forgotten to add an index on one of the columns of a major table with > 60m records. There was a lot of back and forth between engineers about whether we should add the index. I simply ran a small experiment, and EXPLAIN ANALYZE clearly showed the missing index on one of the key tables. After we added that single line of code and deployed the index, DB CPU usage wouldn't exceed 20% even during peak hours, and we ended up scaling down our DB to save some $$.
PostgreSQL backed queue system does work.
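The fix in a story like that comes down to a single statement; a hypothetical version (table and index names invented):

import psycopg2

conn = psycopg2.connect('dbname=app user=app')
conn.autocommit = True  # CREATE INDEX CONCURRENTLY can't run in a transaction
conn.cursor().execute(
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS events_account_id_idx '
    'ON events (account_id)'  # CONCURRENTLY avoids locking writes on a big table
)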
I've been building distributed systems at large and small scales for over a decade. Before building systems with Elixir / Phoenix, I also worked with Rails, where the default, as mentioned in this article, is to just adopt Redis. While Redis does add overhead in managing the infrastructure, it also worked well.
Having been through both, I'd say a simple PostgreSQL-backed solution is the clear winner when it comes to not having to manage a third dependency in an already complex system. There is no third or fourth piece of the puzzle to theorize, hypothesize, or experiment with. You have just two pieces: either your application or your database is the problem. That simplicity saved us a lot of time on debates, and running experiments squashed any dogmatic opinions that just didn't hold water.
My guess is that many people are implementing queuing mechanisms just for sending email.
You can see how this works in the Arnie SMTP buffer server: a super simple queue just for emails, no database at all, just the file system.
Do not use SKIP LOCKED unless it is a toy/low-throughput system.
Row locks require transactions and disk writes.
Advisory locks require neither. (However, you do have to stay inside the configurable memory budget.)
At some point you may want to refactor things to reduce tech debt, but it really is an "and" rather than an "or" decision.
I used it for a web automation system for an accounting client (automatically read files from a network share, look up the clients in a database, submit the documents to government websites using headless browsers, and put the resulting files back in the directory). It allows for completely effortless deterministic programs that call workers running the non-deterministic code, with built-in configurable retries (react to certain exception types, exponential backoff), so you can write code that works almost as if there were no issues with API connections, the filesystem, etc.
This code has been running for 5 or more years, with barely any maintenance, with 0 issues so far. It keeps everything in postgres, so even full reboots and crashes have no impact, it will just move the work back to the queue and it will run when there's an available worker.
QC (and equivs) use the same db, and same connection, so same transaction. Saves quite a bit of cruft.
I don't think that there's anything wrong with using a database as a queue, however, I think that there probably could have been better ways to get across the idea, rather than just dismissing an honest opinion as BS. I don't necessarily agree with all of what was said there, but at the same time I can see why those arguments would be reasonable: https://news.ycombinator.com/item?id=20022572
For example:
> Because it is hacky from the perspective of a distributed system architecture. It's coupling 2 components that probably ought not be coupled because it's perceived as "convenient" to do so. The idea that your system's control and data planes are tightly coupled is a dangerous one if your system grows quickly.
To me, this makes perfect sense, if you're using the same database instance for the typical RDBMS use case AND also for the queue. Then again, that could be avoided by having separate database instances/clusters and treating those as separate services: prod-app-database and prod-queue-database.
That said, using something like RabbitMQ or another specialized queue solution might also have the additional benefit of bunches of tutorials and libraries, as well as other resources available, which is pretty much the case whenever you have a well known and a more niche technology, even when the latter might be in some ways better! After all, there is a reason why many would use Sidekiq, resque, rq, Hangfire, asynq and other libraries that were mentioned and already have lots of content around them.
Though whether the inherent complexity of the system or the complexity of your code that's needed to integrate with it is more important, is probably highly situational.
I thought it was an interesting article, and I'd love to hear more from people using PG for queues in production (my intuition would say you'd get a lot of table bloat and/or vacuum latency, but I haven't tested it myself), but when it comes to the conclusion - "choosing boring technology should be one’s default choice" - I can't think of anything more boring (in a good sense, mostly) than Sidekiq + Redis for a Rails app.
Oban's been great, especially if you pay for Web UI and Pro for the extra features [3].
The main issue we've noticed, though, is that with its simple lock-based fetching mechanism, jobs aren't distributed evenly across your workers, because of the greedy `SELECT...LIMIT X` [2].
If you have long-running and/or resource-intensive jobs, this can be problematic. Let's say you have 3 workers with a local limit of 10 per node. If there are only 10 jobs in the queue, the first node to fetch available jobs will grab and lock all 10, with the other 2 nodes sitting idle (rough sketch below).
[1] https://github.com/sorentwo/oban [2] https://github.com/sorentwo/oban/blob/main/lib/oban/engines/... [3] https://getoban.pro/#feature-comparison
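Roughly, each node claims work with something like the following (a simplification with invented names, not Oban's exact schema); whichever node's transaction runs first takes everything up to its local limit:

import psycopg2

conn = psycopg2.connect('dbname=app user=app')
with conn, conn.cursor() as cur:
    cur.execute(
        """
        UPDATE jobs SET state = 'executing'
        WHERE id IN (
            SELECT id FROM jobs
            WHERE state = 'available'
            ORDER BY priority, scheduled_at, id
            LIMIT %s
            FOR UPDATE SKIP LOCKED
        )
        RETURNING id
        """,
        (10,),  # the local limit: with only 10 jobs queued, one node takes them all
    )
    claimed = [row[0] for row in cur.fetchall()]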
The code that does this is maybe 100 lines at most. It's very effective, especially if you deploy your app in Kubernetes, where you can expect instances to be ephemeral. It's one of the components of my apps that has never needed any updates since I first wrote it circa 2017.
Python has Celery, but maybe the author is looking for more choice between brokers. https://docs.celeryq.dev/en/stable/index.html
I wish the industry was even half as concerned with efficiency as it was with scale. Bitcoin? Electron? 5MB web pages? 5/10/25GB downloads to run 20yr old CD-sized games on modern software?
On the other hand, I have done a few experiments with Postgres LISTEN/NOTIFY. While the feature seems nice at first glance, I concluded that it wasn't worth it for our use cases. Maybe it is different in other languages, but on the JVM you have to dedicate a thread to polling for notifications, which also keeps a connection busy.
What I ended up doing is leveraging akka-stream to stream the queue data directly from the db, which makes it simple to define throttling rules, this is super simple and effective.
Transactionally Staged Job Drains in Postgres - https://brandur.org/job-drain
It's about the challenge of matching up transactions with queues - where you want a queue to be populated reliably if a transaction completes, and also reliably NOT be populated if it doesn't.
Brandur's pattern is to have an outgoing queue in a database table that gets updated as part of that transaction, and can then be separately drained to whatever queue system you like.
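A bare-bones sketch of the drain side of that pattern (the staged_jobs table and the publish function are placeholders; delivery is at-least-once, so consumers must tolerate replays):

import psycopg2

conn = psycopg2.connect('dbname=app user=app')

def drain_staged_jobs(publish):
    # only rows from committed transactions are visible here
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, payload FROM staged_jobs ORDER BY id "
            "LIMIT 100 FOR UPDATE SKIP LOCKED"
        )
        rows = cur.fetchall()
        for _, payload in rows:
            publish(payload)  # e.g. push to SQS/RabbitMQ/etc.
        if rows:
            cur.execute(
                "DELETE FROM staged_jobs WHERE id = ANY(%s)",
                ([job_id for job_id, _ in rows],),
            )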
A lot of frameworks already have queue/job libraries with adapters (so you're not really locked in) and cloud providers are highly scalable and fault tolerant.
It seems silly to try to build into Postgres something that is already cheap and readily available, unless you find yourself in a situation where standing up additional infra is hard (embedded, certain on-premise deployments).
https://github.com/wakatime/wakaq
We currently process ~20 million tasks per day, and I don't have to worry about running VACUUM on my queue ;)
* Metrics, monitoring, alarming on depth, message age
* Autoscaling on your custom metrics
* Managing retries, dead lettering, backoff
* Managing the DB workload: it's update-heavy and may easily be more intensive than the rest of your app. You may need to repeatedly scale your tiny, startup-scale DB, causing customer disruptions, because of your homemade queue.
The arguments for it are either avoidance of lock-in, or the hand-wavy one in this article/many comments: "we're a small startup, we need to minimize complexity, and make things as simple as possible, maintenance will kill us!".
Lock-in makes sense (though other queue alternatives should still be considered), but the latter argument gets used to justify all kinds of harebrained, superficially-simple but actually-complicated engineering schemes. I generally put the PG SKIP LOCKED approach in that bucket, particularly when the alternative on hand is Pub/Sub or SQS. If it's between a Postgres table and ActiveMQ I might feel more conflicted.
Especially given the emphasis on YAGNI, you don’t need a UUID primary key, with all the problems it brings for B+trees (that thing an RDBMS is built on), nor do you need the collision resistance of SHA-256; the odds of you creating a dupe job hash with MD5 are vanishingly small.
As to the actual topic, it’s fine IFF you carefully monitor for accumulating dead tuples and adjust autovacuum for that table as necessary. While not something you’d run into at the start, at a modest scale you may start to see issues. May. You may also opt to switch to Redis or something else before that point anyway.
EDIT: if you choose ULID, UUIDv7, or some other k-sortable key, the problem isn’t nearly as bad, but you still don’t need it in this situation. Save yourself 8 bytes per key.
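A minimal version of that suggestion (schema invented for illustration): a bigserial key instead of a UUID, and an MD5 dedup hash instead of SHA-256.

import psycopg2

conn = psycopg2.connect('dbname=app user=app')
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            id       bigserial PRIMARY KEY,  -- sequential, so B+tree friendly
            job_hash text UNIQUE,            -- md5 is plenty for dedup here
            payload  jsonb NOT NULL,
            status   text NOT NULL DEFAULT 'new'
        )
    """)
    cur.execute(
        "INSERT INTO jobs (job_hash, payload) VALUES (md5(%s), %s::jsonb) "
        "ON CONFLICT (job_hash) DO NOTHING",
        ('{"send_to": "alice"}', '{"send_to": "alice"}'),
    )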
UPDATE ... SET status = 'locked' ... RETURNING message_id
Or you can just use an IMMEDIATE transaction, SELECT the next message ID to retrieve, and UPDATE the row. On top of that, if you want to be extra safe, you can do:
UPDATE Queue SET status = 'locked' WHERE status = 'ready' AND message_id = '....'
to make sure that the message you are trying to retrieve hasn't already been locked by another worker.
[0]: https://github.com/litements/litequeue/
[1]: https://github.com/litements/litequeue/blob/3fece7aa9e9a31e4...
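In Python's sqlite3, that belt-and-braces claim might look roughly like this (table layout follows the snippet above, not necessarily litequeue's exact schema):

import sqlite3

conn = sqlite3.connect('queue.db', isolation_level=None)  # manual transactions
conn.execute('BEGIN IMMEDIATE')  # take the write lock up front
row = conn.execute(
    "SELECT message_id FROM Queue WHERE status = 'ready' LIMIT 1"
).fetchone()
claimed = None
if row:
    cur = conn.execute(
        "UPDATE Queue SET status = 'locked' "
        "WHERE status = 'ready' AND message_id = ?",
        (row[0],),
    )
    if cur.rowcount == 1:  # 0 would mean another worker beat us to it
        claimed = row[0]
conn.execute('COMMIT')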
Could it be because microservices and so-called "server-less" have been sold as cost-saving measures that increase the business' flexibility and decrease capital investment?
When... in reality a single deployed Docker container is way more manageable than a distributed system constructed with "lambdas" and requires fewer engineers in the long run?
What I'm trying to say is that FAANG cargo-culting is only part of why developers choose to build solutions that scale larger than they need. Another large part is the cloud development ecosystem writ large and the consulting culture that has built up around it.
This is the stats output from our oldest instance (a legacy system running Ubuntu 12):
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
OK 952
---
current-jobs-urgent: 0
current-jobs-ready: 765
current-jobs-reserved: 2
current-jobs-delayed: 946
current-jobs-buried: 0
cmd-put: 1188640739
cmd-peek: 2718986
cmd-peek-ready: 5052
cmd-peek-delayed: 797
cmd-peek-buried: 797
cmd-reserve: 0
cmd-reserve-with-timeout: 3245006799
cmd-delete: 1188639093
cmd-release: 43276760
cmd-use: 1137988211
cmd-watch: 166122
cmd-ignore: 166077
cmd-bury: 0
cmd-kick: 1
cmd-touch: 0
cmd-stats: 1260
cmd-stats-job: 43134828
cmd-stats-tube: 53942209
cmd-list-tubes: 3251625
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 988
job-timeouts: 14084
total-jobs: 1188640739
max-job-size: 1048576
current-tubes: 44
current-connections: 63
current-producers: 20
current-workers: 47
current-waiting: 41
total-connections: 14996583
pid: 3959
version: 1.4.6
rusage-utime: 170303.331293
rusage-stime: 399435.543161
uptime: 321658179
binlog-oldest-index: 90539
binlog-current-index: 90983
binlog-max-size: 10485760
I built a complete implementation in Python designed to work the same as SQS but be simpler:
https://github.com/starqueue/starqueue
Alternatively if you just want to quickly hack something into your application, here is a complete solution in one Python function with retries (ask ChatGPT to tell you what the table structure is):
import psycopg2
import psycopg2.extras
import random

db_params = {
    'database': 'jobs',
    'user': 'jobsuser',
    'password': 'superSecret',
    'host': '127.0.0.1',
    'port': '5432',
}

conn = psycopg2.connect(**db_params)
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

def do_some_work(job_data):
    # stand-in for real work: fails half the time to exercise the retry path
    if random.choice([True, False]):
        print('do_some_work FAILED')
        raise Exception
    else:
        print('do_some_work SUCCESS')

def process_job():
    # atomically claim and dequeue the oldest message; SKIP LOCKED lets
    # concurrent workers pass over rows another worker has already locked
    sql = """DELETE FROM message_queue
    WHERE id = (
        SELECT id
        FROM message_queue
        WHERE status = 'new'
        ORDER BY created ASC
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING *;
    """
    cur.execute(sql)
    queue_item = cur.fetchone()
    if queue_item is None:
        return  # queue is empty
    print('message_queue says to process job id: ', queue_item['target_id'])
    sql = """SELECT * FROM jobs WHERE id = %s AND status = 'new_waiting' AND attempts <= 3 FOR UPDATE;"""
    cur.execute(sql, (queue_item['target_id'],))
    job_data = cur.fetchone()
    if job_data:
        try:
            do_some_work(job_data)
            sql = """UPDATE jobs SET status = 'complete' WHERE id = %s;"""
            cur.execute(sql, (queue_item['target_id'],))
        except Exception:
            sql = """UPDATE jobs SET status = 'failed', attempts = attempts + 1 WHERE id = %s;"""
            # if we want the job to run again, insert a new item to the message queue with this job id
            cur.execute(sql, (queue_item['target_id'],))
    else:
        print('no job found, did not get job id: ', queue_item['target_id'])
    conn.commit()

process_job()
cur.close()
conn.close()
I’ve experimented with making this easier via libraries that provide high-level APIs for using Postgres as a queue and manage the schemas, listen/notify, etc for you: https://github.com/adriangb/pgjobq
Taking on problems you don't (and will never) have because some vanishingly small minority has experienced them is nuts. Over-engineering is as incorrect as under-engineering. The correctly sized, correctly complicated answer is what we're after.
A small nitpick. FOR UPDATE SKIP LOCKED was mainstream (in a sense) way before it was copied from Oracle to Postgres. It's used under the hood in most non-trivial Oracle deployments. Like for replication or refreshing materialized views in parallel.
Been meaning to build an Orleans stream provider for Postgres.. I believe that's the main missing component that would allow everything to "JustWork" with Postgres until you outgrow it.
Anyone know if there are Celery or Celery-like tools that support Postgres as a broker?
As a side note, if you want a simple no-frills task scheduler, APScheduler is a dead simple option. It's even more limited than the solution described in OP (you can only run one worker, so it's not distributed at all) but often it is all you need, especially for toy projects.
https://github.com/bensheldon/good_job
Had it in production for about a quarter and it’s worked well.
I offer this correction to the author: it is "disabuse" not "disimbue".
are you queuing jobs, or are you queuing messages?
that's a fuzzy distinction, so somewhat equivalently, what's the expected time it takes for a worker to process a given queue item?
at one end, an item on the queue may take several seconds to a minute or longer to process. at the other end, an item might take only a few milliseconds to process. in that latter case, it's often useful to do micro-batching, where a single worker pulls 100 or 1000 items off the queue at once, and processes them as a batch (such as by writing them to a separate datastore)
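for illustration, a micro-batched drain might look something like this (hypothetical events table and downstream writer):

import psycopg2

conn = psycopg2.connect('dbname=app user=app')

def drain_batch(write_to_datastore, batch_size=1000):
    # claim up to batch_size items in one transaction, hand them off as one batch
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            DELETE FROM events WHERE id IN (
                SELECT id FROM events ORDER BY id
                LIMIT %s FOR UPDATE SKIP LOCKED
            )
            RETURNING payload
            """,
            (batch_size,),
        )
        rows = cur.fetchall()
        if rows:
            write_to_datastore([payload for (payload,) in rows])
        return len(rows)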
the "larger" the items are (in terms of wall-clock processing time, not necessarily in terms of size in bytes of the serialized item payload) the more effective the database-as-a-queue solution is, in my experience.
as queue items get smaller / shorter to process, and start to feel more like "messages" rather than discrete "jobs", that's when I tend to reach for a queue system over a database.
for example, there's a RabbitMQ blog post [0] on cluster sizing where their recommendations start at 1000 messages/second. that same message volume on a database-as-a-queue would require, generally speaking, 3000 write transactions per second (if we assume one transaction to enqueue the message, one for a worker to claim it, and one for a worker to mark it as complete / delete it).
can Postgres and other relational databases be scaled & tuned to handle that write volume? yes, absolutely. however, how much write volume are you expecting from your queue workload, compared to the write volume from its "normal database" workload? [1]
I think that ends up being a useful heuristic when deciding whether or not to use a database-as-a-queue - will you have a relational database with a "side gig" of acting like a queue, or will you have a relational database that in terms of data volume is primarily acting like a queue, with "normal database" work relegated to "side gig" status?
0: https://blog.rabbitmq.com/posts/2020/06/cluster-sizing-and-o...
1: there's also a Postgres-specific consideration here where a lot of very short-lived "queue item" database rows can put excessive pressure on the autovacuum system.
One endpoint accepts work for a named queue and writes it to a file in an XFS directory. Another locks a mutex, moves the file to an in-progress directory, and unlocks the mutex before passing the content to the reader. A third and final endpoint deletes the in-progress job file. There is a configurable timeout, after which jobs end up in a dead letter box. I am simplifying only a little bit. It's a couple hundred lines of Go.
The way this is set up means a message will only ever be handed to one worker. That simplifies things a lot. The workers ask for work when they want it, rather than being constantly listening.
It took a little tuning but we process a couple billion events a day this way and it's been basically zero maintenance for almost 10 years. The wizards in devops even figured out a way to autoscale it.
Use the right tool for the job.
I'm certain you can work around those issues, but why if you can use a proper queue?
If you need to handle 70 requests/second, design for 700. If you need to handle 20 servers running batch jobs, design for 200 servers. If you're in a startup that grows 100% per year, you'll be at 8x scale in 3 years. So you'll have time to rewrite as you grow!
Out of three job systems I built, the first one tried to avoid SQL for "scalability." Then we hit a bunch of edge cases that needed transactional integrity, and we were in a world of pain.
My two more recent distributed job systems use PostgreSQL as a coordinator. They're literally built around SELECT FOR UPDATE SKIP LOCKED. One of them routinely controls 350 workers and the other does elaborate prioritization for thousands of jobs. Both of them will continue to run just fine until they're earning millions of dollars a year—for example, the one controlling 350 workers should scale to about 2,000 CPUs with a little work.
Hyperscale technologies are fairly cheap right up until you discover you need transactions. At that point, faking transactional semantics on top of an eventually consistent data store becomes an engineering nightmare.
So sit down, and do the math. If your company was earning $100 million/year, how big would your distributed system need to be? Can you easily get a PostgreSQL instance big enough to handle that load? (Or could you just shard per client?) If so, strongly consider using PostgreSQL. It makes a hundred things trivial.