Friday 18 March 2016

PostgreSQL 9.6 - Part 1.1 - Horizontal Scalability revisited

In my previous blog post about horizontal scalability changes in PostrgreSQL 9.6, I covered pushing sorts and joins to the remote server, but Robert Haas (EnterpriseDB) has just committed a change that allows DML (that's UPDATEs and DELETEs) to be pushed to the remote server too. This is thanks to the work of Etsuro Fujita (NTT) with advice and testing from Rushabh Lathia (EnterpriseDB), Shigeru Hanada, Robert Haas (EnterpriseDB), Kyotaro Horiguchi (NTT), Albe Laurenz and myself (EnterpriseDB). So this calls for an appendix to my original post!
DML pushdown
The problem previously was that an update or a delete would mean fetching a batch of rows from the remote server, and sending individual UPDATE commands to the remote server.

So if we ran this on the local server (where remote.big_table is a remote table which resides on the remote server):
UPDATE remote.big_table SET content = content || '.';
The remote server would receive the following request from the local server:
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
DECLARE c1 CURSOR FOR
  SELECT id, content, ctid FROM public.big_table FOR UPDATE;
FETCH 100 FROM c1;
UPDATE public.big_table SET content = $2 WHERE ctid = $1
So it starts a transaction, opens a cursor to iterate over the whole table, then fetches a batch of 100 rows. For each row the local server gets, it then issues a prepared UPDATE statement and executes it for every row. So if you have 1 million rows, there will be 1 million UPDATE statements.

This is pretty inefficient, as not only does the local server have to keep fetching batches, and issuing individual UPDATE statements for each row, but the remote server has to keep sending off batches, and execute every one of the UPDATE statements it receives. There's also the potential effect on network traffic this would have.

This will no longer be the case in 9.6. Instead, the local server will ask the remote server to do the whole UPDATE on its side:
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE public.big_table SET content = (content || '.'::text);
Very straightforward activity here. The local server told the remote server to handle the whole update itself, and the remote server just does it. Obviously, there will be cases where individual UPDATEs need to be sent if joining the remote table to a local one, but this is an enormous win over the old behaviour in simple cases like this. Note that, at the moment, if an update targets a remote table and also joins to another remote table, it will push the join down, but only to fetch batches of rows to then issue individual UPDATES for. So those cases will still be at similar to the old behaviour.

Also, the examples I have given are just of UPDATE statements, but this also applies to DELETE too. So again, rather than fetching a batch of tuples from the remote server, and issuing individual DELETE statements for each one, it will send the whole statement across.

There is a downside to this new commit, which I haven't yet checked if there's a plan to rectify it. Previously, if you ran an UPDATE or DELETE statement, then cancelled it on the local side, the remote side would cancel and rollback, as instead of continuing to send individual UPDATE or DELETE statements, it would issue an abort. But with the current changes, the abort doesn't reach the remote server, or at least not in time.

No comments: