Switch to dark mode

Distribute the rows of a SQL table between several multi-instantiated applications

Distribute the rows of a SQL table between several multi-instantiated applications

SQL databases nowadays offer many tools to allow us to manipulate our data in the best way.

This is especially the case when we want to lock rows in our tables. Moreover, it is much better when the database engine manages this for you.

The mechanics that I will present in this article are available on the MySQL, MariaDB and PostgreSQL engines.

Use cases

For the purpose of this article, we will imagine a multi-instantiated application where each application will regularly poll jobs in a SQL table and update their status.

Let's take the following SQL table named jobs:

id status created_at
1 created 2022-07-13T18:56:00Z
2 created 2022-07-13T18:57:00Z
3 created 2022-07-13T18:59:00Z
4 created 2022-07-13T19:00:00Z
.. .. ..

The problem is that when application A is updating the status of job 1, this same job 1 must not be updated by an application B in the meantime.

Example use case

To solve this problem, I used a SQL mechanism that allows to lock rows in the database as long as a transaction has not been modified yet thanks to the syntax SELECT ... FOR ....

Using transactions

To solve this problem, we will also need transactions, so a little reminder about transactions if you don't know yet.

Transactions allow to avoid writing and reading errors linked to concurrent requests.

Indeed, queries are called "atomic", which means that the state in your database will be modified only when you send a COMMIT (or ROLLBACK) query.

The mechanism is very simple to use, you just have to specify START TRANSACTION (or BEGIN) and to follow your SQL queries, as follows:

BEGIN;

UPDATE jobs SET status = 'pending' WHERE id = 1;
UPDATE jobs SET status = 'error' WHERE id = 2;
-- Add the queries you want ...

COMMIT;

Using SELECT ... FOR ...

The SELECT ... FOR ... is available in at least the following three databases:

The syntax brings the following keywords:

  • FOR UPDATE: Allows to lock the lines on read, update and delete,
  • FOR SHARE: Locks the update and delete only, but the value can still be read.

When these options are specified, the next SQL query that tries to interact with one of these rows will have to wait for the transaction to complete. Beware of deadlocks.

However, options can be added after this syntax, including:

  • NOWAIT: Allows you not to wait for the transaction that locked the line to finish and to return an error immediately,
  • SKIP LOCKED: Allows you to not wait for the transaction that locked the row to finish but not return the locked rows in the result.

In our case, it is this last keyword that will interest us for the rest of this article: SKIP LOCKED.

Line distribution management

Thus, coupled with transactions, the mechanism to implement is the following:

Distribute rows lock mechanism

Each application having its own SQL session, the first application will perform the following queries:

START TRANSACTION;

-- Selection of lines (not locked by another application) to be processed
SELECT * FROM jobs WHERE status = 'created' ORDER BY created_at ASC LIMIT 2 FOR UPDATE SKIP LOCKED;

-- Later, application 1 will update jobs
UPDATE jobs SET status = 'pending' WHERE id = 1;
UPDATE jobs SET status = 'error' WHERE id = 2;

COMMIT;

So, in this case, application 2 will never process the jobs with the ID 1 and 2 until application 1 has finished processing them (COMMIT).

Conclusion

Our problem is solved and we can now scalal our applications with peace of mind, they won't step on each other's toes when they process the data.

Moreover, there is no need to implement complicated logic in our application, the database engine manages this much better than we do!