Distribute the rows of a SQL table between several multi-instantiated applications
SQL databases nowadays offer many tools to allow us to manipulate our data in the best way.
This is especially the case when we want to lock rows in our tables. Moreover, it is much better when the database engine manages this for you.
The mechanics that I will present in this article are available on the MySQL, MariaDB and PostgreSQL engines.
For the purpose of this article, we will imagine a multi-instantiated application where each application will regularly poll jobs in a SQL table and update their status.
Let's take the following SQL table named
The problem is that when application A is updating the status of job 1, this same job 1 must not be updated by an application B in the meantime.
To solve this problem, I used a SQL mechanism that allows to lock rows in the database as long as a transaction has not been modified yet thanks to the syntax
SELECT ... FOR ....
To solve this problem, we will also need transactions, so a little reminder about transactions if you don't know yet.
Transactions allow to avoid writing and reading errors linked to concurrent requests.
Indeed, queries are called "atomic", which means that the state in your database will be modified only when you send a
The mechanism is very simple to use, you just have to specify
START TRANSACTION (or
BEGIN) and to follow your SQL queries, as follows:
BEGIN; UPDATE jobs SET status = 'pending' WHERE id = 1; UPDATE jobs SET status = 'error' WHERE id = 2; -- Add the queries you want ... COMMIT;
Using SELECT ... FOR ...
SELECT ... FOR ... is available in at least the following three databases:
- MySQL: https://dev.mysql.com/doc/refman/5.7/en/select.html
- MariaDB: https://mariadb.com/kb/en/select/
- PostgreSQL: https://www.postgresql.org/docs/9.0/sql-select.html
The syntax brings the following keywords:
FOR UPDATE: Allows to lock the lines on read, update and delete,
FOR SHARE: Locks the update and delete only, but the value can still be read.
When these options are specified, the next SQL query that tries to interact with one of these rows will have to wait for the transaction to complete. Beware of deadlocks.
However, options can be added after this syntax, including:
NOWAIT: Allows you not to wait for the transaction that locked the line to finish and to return an error immediately,
SKIP LOCKED: Allows you to not wait for the transaction that locked the row to finish but not return the locked rows in the result.
In our case, it is this last keyword that will interest us for the rest of this article:
Line distribution management
Thus, coupled with transactions, the mechanism to implement is the following:
Each application having its own SQL session, the first application will perform the following queries:
START TRANSACTION; -- Selection of lines (not locked by another application) to be processed SELECT * FROM jobs WHERE status = 'created' ORDER BY created_at ASC LIMIT 2 FOR UPDATE SKIP LOCKED; -- Later, application 1 will update jobs UPDATE jobs SET status = 'pending' WHERE id = 1; UPDATE jobs SET status = 'error' WHERE id = 2; COMMIT;
So, in this case, application 2 will never process the jobs with the ID
2 until application 1 has finished processing them (COMMIT).
Our problem is solved and we can now scalal our applications with peace of mind, they won't step on each other's toes when they process the data.
Moreover, there is no need to implement complicated logic in our application, the database engine manages this much better than we do!