Friday 19 February 2016

PostgreSQL 9.6 - Part 1 - Horizontal Scalability

PostgreSQL 9.6 is shaping up to be an impressive release, so I thought I'd try to summarise some of the big features and improvements that it will bring. This certainly won't be exhaustive, but it will touch upon areas I've looked at closely.

I'm looking at breaking this down into the following areas:
  • Part 1 - Horizontal Scalability
  • Part 2 - Monitoring
  • Part 3 - Parallelism
  • Part 4 - Vertical Scalability
So without further ado, here's Part 1 - Horizontal Scalability

I guess I should start with the question some might be asking: "What is 'Horizontal Scalability'?" This is just a fancy way of describing the ability to distribute workload among several clusters/servers/locations etc. So rather than putting all the burden on one server, you can delegate at least some of the work to other servers to speed things up.

For 9.6, I'll specifically be covering foreign tables (i.e. tables on remote servers which are being queried as if they were local tables). The changes are currently only usable for postgres_fdw (the core extension that allows connections to tables in other PostgreSQL databases), but other foreign data wrappers will be able to make similar changes in future. All the following features were the result of lots of hard work by my colleague at EnterpriseDB, Ashutosh Bapat, with help from Robert Haas.
Join pushdown
At the moment (pre-9.6), any queries you run which involve foreign tables require that the data from those tables be fetched in their entirety prior to any sorting or joining, and then anything you want to do to them (join, sort etc.) have to be done locally on the returned data set. As you can imagine, this is probably not what you want to do, particularly if you had a query like:
SELECT *
FROM remote.contacts c
INNER JOIN remote.countries n ON c.country = n.country
WHERE c.continent = 'Australasia';
(Note: In these examples, the tables in the schema named "remote" are foreign tables (tables which reside on a remote server), so here, remote.contacts and remote.countries are both foreign tables. It just helps us distinguish between foreign tables and local ones.)

This would fetch the whole of the table named "countries", and then, once the remote server has read all the rows of the table from disk, and pushed it across the network, the local server will then need fetch whichever rows of the foreign table named "contacts" match the WHERE clause that targets it, and then join both of the results locally.

To illustrate this behaviour, we'd probably end up running these 2 queries on the remote server:
SELECT country, continent
FROM countries
WHERE continent = 'Australasia';
SELECT id, first_name, last_name, age, country
FROM contacts;
(Note: The "remote." prefix isn't shown here, because these are examples of the queries we'd actually be running on the remote server.)

Now, we did get *some* pushdown here, namely the bit saying "WHERE continent = 'Australasia'". That's called a qual pushdown, and it means we didn't need to read in all of the remote "countries" table and then filter locally. Not a big deal in this case as we know we're dealing with a small table, but you can imagine how useful that is if you're fetching from a huge table that's indexed on the remote server. Without qual pushdown, the local server would have to fetch entire tables from the remote server when it only wants a single row.

But that only gets us so far. We still don't want to fetch both of these data sets separately. Wouldn't it be great if we could just tell the remote server to do the join for us? It could then take advantage of any indexes it has available to optimise the join, and we wouldn't need to transfer so much data between the remote server and the local server.

This is where PostgreSQL 9.6 can use its new join pushdown powers to do just that. In the above example, we would end up sending pretty much the same query we're running on the local server to the remote server, telling it to perform the join itself, and give us the results. This means less disk reads on the remote side, less network utilisation between remote and local server, less work for the local server (as the remote side has already done it all, and faster than the local one could have done it), and ultimately, a much shorter query time. The requirement for this to happen is that the foreign tables involved in the join have to reside on the same remote server and have the same user mapping in order to be considered for join pushdown.

But, I hear you ask, what if we're joining a mixture of local tables and foreign tables? What then? Surely PostgreSQL will break down in tears not knowing what to do? As it happens, PostgreSQL has a diploma in working out how to construct joins just between the foreign tables, and sending them off to the remote server separately. (the usual terms and conditions apply, being that they need to reside on the same remote server and use the same user mappings).

So imagine we have a local table called "contacts_notes", and this contains an "id" column, a "contactid" column (which should match remote.contacts.id) and a "notes" column, where we'd keep notes about the customer. We want to join this to the "contactid" column on the remote.contacts table, and have the following query we will run locally:
SELECT *
FROM remote.contacts c
INNER JOIN remote.countries n ON c.country = n.country
INNER JOIN contacts_notes o ON c.id = o.contactid
WHERE continent = 'Australasia';
Prior to 9.6, it would do as mentioned before, fetch the filtered result from countries, then the whole of contacts, join them locally, and then join that result with the local contacts_notes table. 9.6 will now be able to look at the query and say "Okay, so I can see that 2 of these tables being joined are both on the same remote server, so I'll rewrite the join to only join those 2 tables, send that off to the remote server, and join the result with the local table when it comes back."

So here's an example of the query plan for pre-9.6:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49.30..35665.45 rows=16 width=186) (actual time=12533.153..13449.441 rows=187226 loops=1)
   Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent, o.id, o.contactid, o.note
   Hash Cond: (o.contactid = c.id)
   ->  Seq Scan on public.contacts_notes o  (cost=0.00..27282.66 rows=1666666 width=18) (actual time=0.015..218.781 rows=1666666 loops=1)
         Output: o.id, o.contactid, o.note
   ->  Hash  (cost=49.10..49.10 rows=16 width=168) (actual time=12533.068..12533.068 rows=563143 loops=1)
         Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
         Buckets: 65536 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3675kB
         ->  Hash Join  (cost=23.74..49.10 rows=16 width=168) (actual time=1.191..12235.555 rows=563143 loops=1)
               Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
               Hash Cond: (c.country = n.country)
               ->  Foreign Scan on remote.contacts c  (cost=1.00..23.80 rows=640 width=104) (actual time=0.702..11184.456 rows=5000000 loops=1)
                     Output: c.id, c.first_name, c.last_name, c.age, c.country
                     Remote SQL: SELECT id, first_name, last_name, age, country FROM public.contacts
               ->  Hash  (cost=22.68..22.68 rows=5 width=64) (actual time=0.472..0.472 rows=28 loops=1)
                     Output: n.country, n.continent
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  Foreign Scan on remote.countries n  (cost=1.00..22.68 rows=5 width=64) (actual time=0.454..0.455 rows=28 loops=1)
                           Output: n.country, n.continent
                           Remote SQL: SELECT country, continent FROM public.countries WHERE ((continent = 'Australasia'::text))
 Planning time: 0.393 ms
 Execution time: 13458.285 ms
(22 rows)
And now with 9.6:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=154.38..35770.54 rows=16 width=186) (actual time=3095.418..4016.063 rows=187226 loops=1)
   Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent, o.id, o.contactid, o.note
   Hash Cond: (o.contactid = c.id)
   ->  Seq Scan on public.contacts_notes o  (cost=0.00..27282.66 rows=1666666 width=18) (actual time=0.023..221.201 rows=1666666 loops=1)
         Output: o.id, o.contactid, o.note
   ->  Hash  (cost=154.19..154.19 rows=16 width=168) (actual time=3095.318..3095.318 rows=563143 loops=1)
         Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
         Buckets: 65536 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3675kB
         ->  Foreign Scan  (cost=100.00..154.19 rows=16 width=168) (actual time=1.962..2932.641 rows=563143 loops=1)
               Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
               Relations: (remote.contacts c) INNER JOIN (remote.countries n)
               Remote SQL: SELECT r1.id, r1.first_name, r1.last_name, r1.age, r1.country, r2.country, r2.continent FROM (public.contacts r1 INNER JOIN public.countries r2 ON (TRUE)) WHERE ((r1.country = r2.country)) AND ((r2.continent = 'Australasia'::text))
 Planning time: 0.400 ms
 Execution time: 4024.687 ms
(14 rows)
This isn't the whole story though, as it's not always possible to push a join down. Consider if we had written the query like:
SELECT *
FROM remote.contacts c
INNER JOIN contacts_notes o ON c.id = o.contactid
WHERE EXISTS (SELECT 1 FROM remote.countries n WHERE n.country = c.country AND n.continent = 'Australasia');
In order words, a semi-join. This can't currently be pushed down. Neither can an anti-join, which is the same thing, except we use NOT EXISTS instead. No doubt those will be pushed down in future, but not in 9.6.
Sort pushdown
Now this is all well and good, but what if we are instead sorting foreign table data rather than joining it. Prior to 9.6, we'd again have to fetch all the rows back from the remote server, then sort the data locally before being able to return the final data set.

PostgreSQL 9.6 will tell the remote server to do the sorting itself, which can be a huge win, because the remote server can have indexes on sorted columns. To demonstrate this, we'll use the same tables mentioned in our previous example (except I should also mention that we have an index on contacts.country).

Let's run the following query:
SELECT c.first_name, c.last_name, c.country, n.continent
FROM remote.contacts c
JOIN remote.countries n ON n.country = c.country
ORDER BY c.country;
Now let's look at the query plan with EXPLAIN (ANALYSE, VERBOSE), so that we can see what it did:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=200.00..324.85 rows=3171 width=128) (actual time=14239.133..25005.195 rows=5000000 loops=1)
   Output: c.first_name, c.last_name, c.country, n.continent
   Merge Cond: (c.country = n.country)
   ->  Foreign Scan on remote.contacts c  (cost=100.00..133.82 rows=682 width=96) (actual time=14238.741..22807.627 rows=5000000 loops=1)
         Output: c.id, c.first_name, c.last_name, c.age, c.country
         Remote SQL: SELECT first_name, last_name, country FROM public.contacts ORDER BY country ASC
   ->  Materialize  (cost=100.00..144.09 rows=930 width=64) (actual time=0.386..331.147 rows=4979830 loops=1)
         Output: n.country, n.continent
         ->  Foreign Scan on remote.countries n  (cost=100.00..141.76 rows=930 width=64) (actual time=0.384..0.712 rows=249 loops=1)
               Output: n.country, n.continent
               Remote SQL: SELECT country, continent FROM public.countries ORDER BY country ASC
 Planning time: 0.319 ms
 Execution time: 25247.358 ms
(13 rows)
As you can see, the query plan shows that the ORDER BY was pushed down for both foreign scans.

We can also get an ORDER BY added to the remote queries when performing merge joins without explicitly specifying ORDER BY on our own query. PostgreSQL will add this if it deems it optimal. Note you can give PostgreSQL more information about the foreign tables by telling it to fetch planner information from the remote server prior to planning its own query. This can be done by adding the use_remote_estimate option to the server definition:
ALTER SERVER remote OPTIONS (ADD use_remote_estimate 'true')
Alternatively, you could set it on the individual foreign tables:
ALTER TABLE remote.contacts OPTIONS (ADD use_remote_estimate 'on');
ALTER TABLE remote.countries OPTIONS (ADD use_remote_estimate 'on');
This isn't required, and it can add a little overhead, but it can be a big benefit in cases where the query will be particularly complex.
"Walk and chew gum at the same time"
At the time of writing this, only joins will be pushed down, or only sorts. Robert Haas described this to me as "We can walk, and we can chew gum; we just can't do them at the same time at the moment."

But Ashutosh Bapat has come to the rescue, and submitted a patch to do just this. In demonstrating the effect of this change, we'll take the following example query we used previously:
SELECT c.first_name, c.last_name, c.country, n.continent
FROM remote.contacts c
JOIN remote.countries n ON n.country = c.country
ORDER BY c.country;
This could then generate the following plan:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=852221.63..1039720.04 rows=5000057 width=32) (actual time=11699.639..25104.462 rows=5000000 loops=1)
   Output: c.first_name, c.last_name, c.country, n.continent
   Relations: (remote.contacts c) INNER JOIN (remote.countries n)
   Remote SQL: SELECT r1.first_name, r1.last_name, r1.country, r2.continent FROM (public.contacts r1 INNER JOIN public.countries r2 ON (TRUE)) WHERE ((r1.country = r2.country)) ORDER BY r2.country ASC
 Planning time: 4.220 ms
 Execution time: 25326.628 ms
(6 rows)
As you can see, all the joining and sorting is done on the remote server side.

Join + sort pushdown can also be of use in cases where, say 2 foreign tables need to be joined to a local table, and you'd want the joined and sorted results back to optimise the join to the local table. So we should see that in 9.6 too. Here's a preview of what that would look like. Here's a query where we join 2 foreign tables, and we also want to join that with a local table:
SELECT c.country, n.continent
FROM remote.contacts c
JOIN remote.countries n ON n.country = c.country
JOIN contacts_notes o on c.id = o.contactid;
This gives us the following query plan:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=836225.89..1057925.20 rows=5000063 width=64)
   Output: c.country, n.continent
   Merge Cond: (o.contactid = c.id)
   ->  Index Only Scan using contacts_notes_contactid_idx on public.contacts_notes o  (cost=0.43..30030.72 rows=1666666 width=4)
         Output: o.contactid
   ->  Materialize  (cost=836225.46..961227.03 rows=5000063 width=23)
         Output: c.country, c.id, n.continent
         ->  Foreign Scan  (cost=836225.46..948726.87 rows=5000063 width=23)
               Output: c.country, c.id, n.continent
               Relations: (remote.contacts c) INNER JOIN (remote.countries n)
               Remote SQL: SELECT r1.country, r1.id, r2.continent FROM (public.contacts r1 INNER JOIN public.countries r2 ON (TRUE)) WHERE ((r1.country = r2.country)) ORDER BY r1.id ASC
(11 rows)
You can see that the remote query has pushed down the join we explicitly requested, but also optimised the query further with a sort pushdown too.

There will be a lot more we'll have pushdown capabilities for in future, including, but not limited to, aggregate and limit pushdown.