Thursday 25 July 2019

jsquery vs SQL/JSON

SQL/JSON is coming to PostgreSQL 12 and provides a native way to query JSON data (although, to be specific, JSON data stored as a JSONB data type).  This also introduces the jsonpath data type which is used for SQL/JSON query expressions.  I'll not be going into its usage in detail, or covering performance characteristics (at least not in this post), but I will compare the syntax and functionality with jsquery.  Note that this may potentially change prior to final release.

jsquery was introduced as a non-core extension for PostgreSQL 9.4 and higher by Teodor Sigaev, Alexander Korotkov and Oleg Bartunov.  Like jsonpath, it uses its own data type, jsquery.

We have some functions to which we can pass jsonpath expressions, 2 of which have operators which can be used as shorthand (albeit without additional parameter control, as they exist primarily for indexing purposes):
Function                 Operator  Description
jsonb_path_exists        @?        Returns true if the expression matched something, false if not, or null if it resulted in an operation on a missing value.
jsonb_path_match         @@        The same as jsonb_path_exists, but only the first result item is tested.
jsonb_path_query         None      Returns the JSON data resulting from the jsonpath expression.
jsonb_path_query_array   None      The same as jsonb_path_query, but puts the result in a JSON array.
jsonb_path_query_first   None      The same as jsonb_path_query, but only selects the first value.

These functions suppress errors caused by a missing array element or object field, an unexpected JSON type, or numeric errors.
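To illustrate the function forms and their operator shorthands, here's a quick sketch against PostgreSQL 12 (the operator forms should behave the same as their function counterparts):

```sql
-- Function form and its operator shorthand:
SELECT jsonb_path_exists('{"a": [1,2,3]}', '$.a[*] ? (@ > 2)');  -- true
SELECT '{"a": [1,2,3]}'::jsonb @? '$.a[*] ? (@ > 2)';            -- true

-- jsonb_path_match / @@ expects the expression itself to be a predicate:
SELECT '{"a": [1,2,3]}'::jsonb @@ '$.a[0] == 1';                 -- true

-- The query functions return the matching JSON values themselves:
SELECT jsonb_path_query('{"a": [1,2,3]}', '$.a[*] ? (@ > 1)');        -- 2 and 3, as separate rows
SELECT jsonb_path_query_array('{"a": [1,2,3]}', '$.a[*] ? (@ > 1)');  -- [2, 3]
SELECT jsonb_path_query_first('{"a": [1,2,3]}', '$.a[*] ? (@ > 1)');  -- 2
```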

Here are some examples of how jsquery and SQL/JSON differ in practice. (Note that jsquery usage requires installation of the jsquery extension):

We will select all elements from an array which are equal to 1.  Note that, here, jsquery returns an array containing 1, whereas jsonpath returns a scalar value of 1.

jsquery
SELECT '[1,2,3]'::jsonb ~~ '#. ?($ = 1).$'::jsquery;

jsonpath
SELECT jsonb_path_query('[1,2,3]'::jsonb, '$[*] ? (@ == 1)');


Now we'll check that all elements are greater than 1 and less than 5.

jsquery
SELECT '[2,3,4]' @@ '#: ($ > 1 and $ < 5)'::jsquery;

jsonpath
SELECT '[2,3,4]' @? '$[*] ? (@ > 1 && @ < 5)';


And here we have some jsonb data as follows:
CREATE TABLE books (data jsonb);

INSERT INTO books (data) VALUES ('[{"author": "Charles Dickens", "book": "A Tale of Two Cities"},
   {"author": "William Shakespeare", "book": "Hamlet"}]');

We want books by William Shakespeare.

jsquery
SELECT data ~~ '#. ? ($.author = "William Shakespeare")' FROM books;

jsonpath
SELECT jsonb_path_query(data,'$[*] ? (@.author == "William Shakespeare")') FROM books;


You can see that, while they share some characteristics, they are not the same.  It's also possible to control whether a jsonpath expression uses lax or strict rules, which determine whether to throw an error when referencing a non-existent object member or hitting some other structural mismatch: "lax" suppresses such errors, "strict" doesn't.  The mode is placed at the beginning of the jsonpath expression.

So we end up with the following syntax for jsonpath:

[lax|strict] <path expression> ? <filter expression>
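For example (a sketch; the exact error wording may differ):

```sql
-- lax (the default) suppresses the structural error:
SELECT jsonb_path_query('{"a": 1}'::jsonb, 'lax $.b');     -- no rows

-- strict raises an error for the missing object member:
SELECT jsonb_path_query('{"a": 1}'::jsonb, 'strict $.b');  -- ERROR
```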

I've put together some comparisons between jsquery and jsonpath expressions.

Key
- Achievable using other jsonpath operators.
* No jsonpath equivalent, but usage available at the SQL level.
x No equivalent.

Comparison of variables
jsquery   jsonpath       Description
$         $              The whole document
.         .              Accessor
*         *              All values at the current level
x         **             All values at all levels
#N        $[N]           Nth value of an array, starting at 0
x         $[start,end]   Slice of an array
#         -              All array elements
%         -              All object keys
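A few of the jsonpath accessors in action (a sketch against PostgreSQL 12; note that in its implementation an array slice is written with the "to" keyword):

```sql
SELECT jsonb_path_query('{"a": {"b": [10, 20, 30]}}', '$.a.b[1]');  -- 20
SELECT jsonb_path_query('{"a": {"b": 1}}', '$.**');                 -- every value at every level
SELECT jsonb_path_query('[0, 1, 2, 3, 4]', '$[1 to 3]');            -- 1, 2 and 3
SELECT jsonb_path_query('[1, 2, 3]', '$[*]');                       -- all array elements
```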


Comparison of methods
jsquery     jsonpath     Description
x           + (unary)    Plus operation on a sequence
x           - (unary)    Minus operation on a sequence
x           + (binary)   Addition
x           - (binary)   Subtraction
x           *            Multiplication
x           /            Division
IS <type>   type()       Checks the type (jsquery) or returns the type name (jsonpath)
@#          size()       Size (length) of an array
x           double()     Numeric value from a string
x           ceiling()    Nearest integer greater than or equal to the value
x           floor()      Nearest integer less than or equal to the value
x           abs()        Absolute value of a number
x           keyvalue()   Object represented as a sequence of key, value and id fields
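Some of the jsonpath methods in action (a sketch against PostgreSQL 12):

```sql
SELECT jsonb_path_query('[1, 2, 3]', '$.size()');        -- 3
SELECT jsonb_path_query('"1.5"', '$.double()');          -- 1.5
SELECT jsonb_path_query('1.3', '$.ceiling()');           -- 2
SELECT jsonb_path_query('-1.3', '$.abs()');              -- 1.3
SELECT jsonb_path_query('{"x": 1}', '$.type()');         -- "object"
SELECT jsonb_path_query('{"x": 1}', '$.keyvalue()');     -- {"id": 0, "key": "x", "value": 1}
```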


Comparison of operators
jsquery   jsonpath      Description
=         ==            Equality
<         <             Less than
<=        <=            Less than or equal to
>         >             Greater than
>=        >=            Greater than or equal to
@>        *             Contains
<@        *             Contained by
IN        -             Search within a list of scalar values
&&        *             Overlap
AND       &&            Boolean AND
OR        ||            Boolean OR
NOT       !             Boolean NOT
=*        exists        Expression contains 1 or more items
x         starts with   Value begins with specified value
x         like_regex    Test string against a regex pattern
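And some of the jsonpath operators in filters (a sketch against PostgreSQL 12):

```sql
-- Boolean operators inside a filter expression:
SELECT jsonb_path_exists('[1, 5, 9]', '$[*] ? (@ > 1 && @ < 9)');                -- true

-- starts with and like_regex operate on strings:
SELECT jsonb_path_exists('["apple", "pear"]', '$[*] ? (@ starts with "app")');   -- true
SELECT jsonb_path_exists('["apple", "pear"]', '$[*] ? (@ like_regex "^p.*r$")'); -- true
```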


Special values
jsquery   jsonpath
true      true
false     false
null      null
x         is unknown

Note that, unlike in SQL expressions, you can use the equality operator with "null", whereas in SQL you would have to write IS NULL.  This is because JSON's null is not directly equivalent to SQL's NULL.

jsquery features not present in SQL/JSON

jsquery supports index hints, but these were only necessary because the optimiser has no knowledge of the contents of jsquery strings, or statistics related to individual values.  This feature is therefore effectively redundant when it comes to SQL/JSON.

Limitations

Only text, numeric and boolean types are supported at present.  Datetime support is still a work in progress, and is intended to arrive in JSONB in future.

For more information on the jsonpath and SQL/JSON, see the PostgreSQL documentation:

SQL/JSON Path Expressions
jsonpath Type

Friday 18 March 2016

PostgreSQL 9.6 - Part 1.1 - Horizontal Scalability revisited

In my previous blog post about horizontal scalability changes in PostgreSQL 9.6, I covered pushing sorts and joins to the remote server, but Robert Haas (EnterpriseDB) has just committed a change that allows DML (that's UPDATEs and DELETEs) to be pushed to the remote server too. This is thanks to the work of Etsuro Fujita (NTT) with advice and testing from Rushabh Lathia (EnterpriseDB), Shigeru Hanada, Robert Haas (EnterpriseDB), Kyotaro Horiguchi (NTT), Albe Laurenz and myself (EnterpriseDB). So this calls for an appendix to my original post!
DML pushdown
The problem previously was that an update or a delete would mean fetching a batch of rows from the remote server, and sending individual UPDATE or DELETE commands back to the remote server.

So if we ran this on the local server (where remote.big_table is a remote table which resides on the remote server):
UPDATE remote.big_table SET content = content || '.';
The remote server would receive the following request from the local server:
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
DECLARE c1 CURSOR FOR
  SELECT id, content, ctid FROM public.big_table FOR UPDATE;
FETCH 100 FROM c1;
UPDATE public.big_table SET content = $2 WHERE ctid = $1
So it starts a transaction, opens a cursor to iterate over the whole table, then fetches a batch of 100 rows. For each row the local server gets back, it executes the prepared UPDATE statement. So if you have 1 million rows, there will be 1 million UPDATE statements.

This is pretty inefficient, as not only does the local server have to keep fetching batches, and issuing individual UPDATE statements for each row, but the remote server has to keep sending off batches, and execute every one of the UPDATE statements it receives. There's also the potential effect on network traffic this would have.

This will no longer be the case in 9.6. Instead, the local server will ask the remote server to do the whole UPDATE on its side:
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE public.big_table SET content = (content || '.'::text);
Very straightforward activity here. The local server told the remote server to handle the whole update itself, and the remote server just does it. Obviously, there will be cases where individual UPDATEs need to be sent if joining the remote table to a local one, but this is an enormous win over the old behaviour in simple cases like this. Note that, at the moment, if an update targets a remote table and also joins to another remote table, it will push the join down, but only to fetch batches of rows to then issue individual UPDATEs for. So those cases will still be similar to the old behaviour.

Also, the examples I have given are just of UPDATE statements, but this applies to DELETE too. So again, rather than fetching a batch of tuples from the remote server, and issuing individual DELETE statements for each one, it will send the whole statement across.
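For example, a DELETE run locally should now be shipped wholesale (a sketch; the exact SQL the remote server receives may differ slightly):

```sql
-- Run on the local server:
DELETE FROM remote.big_table WHERE content LIKE '%.';

-- Instead of a cursor plus per-row DELETEs, the remote server
-- should now receive something along the lines of:
--   START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
--   DELETE FROM public.big_table WHERE (content LIKE '%.');
```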

There is a downside to this new commit, and I haven't yet checked whether there's a plan to rectify it. Previously, if you ran an UPDATE or DELETE statement, then cancelled it on the local side, the remote side would cancel and roll back: instead of continuing to send individual UPDATE or DELETE statements, it would issue an abort. But with the current changes, the abort doesn't reach the remote server, or at least not in time.

Thursday 17 March 2016

PostgreSQL 9.6 - Part 2 - Monitoring

This is part 2 in a 4 part blog series on improvements coming in PostgreSQL 9.6.
  • Part 1 - Horizontal Scalability
    • Join Pushdown
    • Sort Pushdown
    • "Walk and chew gum at the same time"
  • Part 2 - Monitoring
  • Part 3 - Parallelism
  • Part 4 - Vertical Scalability
VACUUM progress monitoring
When you run VACUUM on a large table, do you know how much it's done after 5 minutes? Do you know how much more there is to go? The answer is, no. You can't know, at least not accurately or easily. In PostgreSQL 9.6, you'll be able to monitor just this (although not VACUUM FULL), thanks to the work of Amit Langote (NTT) and Robert Haas (EnterpriseDB), which was built upon the work done by Rahila Syed (EnterpriseDB) and Vinayak Pokale (NTT). We now have a system view called pg_stat_progress_vacuum. This is the first in, hopefully, several progress monitoring views to come in future. Here's an example of what it shows you (shown in extended output format so that it fits on this blog page):
-[ RECORD 1 ]------+--------------
pid                | 13612
datid              | 16384
datname            | test
relid              | 16385
phase              | scanning heap
heap_blks_total    | 335141
heap_blks_scanned  | 186055
heap_blks_vacuumed | 0
index_vacuum_count | 0
max_dead_tuples    | 44739242
num_dead_tuples    | 3849285
The PostgreSQL docs explain what each of these columns contain: http://www.postgresql.org/docs/devel/static/progress-reporting.html#PG-STAT-PROGRESS-VACUUM

In the example above, we can see we have 335141 heap blocks in the table being scanned. Each block is 8kB in size, so this represents a 2618MB table. Next, we see that 186055 blocks have been scanned, so 1454MB. That means we have approximately 1164MB left to scan, and we're 55.5% into the scan phase of the VACUUM. You can see that we haven't actually VACUUMed any of the table yet (heap_blks_vacuumed), and the current phase reports "scanning heap". There are 7 phases for VACUUMs that will be visible in this view:
  • initializing
  • scanning heap
  • vacuuming indexes
  • vacuuming heap
  • cleaning up indexes
  • truncating heap
  • performing final cleanup

These are all described in more detail in the documentation: http://www.postgresql.org/docs/devel/static/progress-reporting.html#VACUUM-PHASES

What we have here is a very useful tool in monitoring VACUUM progress, which was previously not possible. DBAs used to have to wait until a VACUUM finished. Note that this new feature doesn't give us a time estimate of how much VACUUM time remains. This is because a VACUUM still may need to wait, or it may skip pages that it knows don't need VACUUMing, so a time estimate would be unreliable. You could craft a query that links pg_stat_progress_vacuum with pg_stat_activity to work out how long it's taken so far, and based on how many blocks it got through in that time, how long it would take to complete given the remaining blocks, but that would still only apply to an individual phase, and wouldn't give you a trustworthy ETA.
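Such a query might look like this (a sketch only; it reports elapsed time and scan progress, not a trustworthy ETA):

```sql
-- Join the 9.6 progress view to pg_stat_activity to see how far
-- each running VACUUM's scan phase has got, and how long it's taken:
SELECT p.pid,
       p.phase,
       round(100.0 * p.heap_blks_scanned / p.heap_blks_total, 1) AS pct_scanned,
       now() - a.query_start AS elapsed
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a ON a.pid = p.pid;
```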

I did say that this was the first progress monitoring view, and that more may come in future. These may include progress on REINDEX, ANALYZE, VACUUM FULL/CLUSTER and ALTER TABLE (where it results in a table rewrite).
Wait monitoring
DBAs have relied on pg_stat_activity to provide information about what's currently happening on the database. This returns information about each connection in the cluster, including the queries being run, which users are connected to which database, when the connection was made, whether it's active or idle, and various other bits of information.

But now it's been updated to include 2 new columns: wait_event_type and wait_event. This is thanks to the work of Amit Kapila (EnterpriseDB) and Ildus Kurbangaliev (PostgresPro). These provide introspection on a level not previously possible, and return information about what that connection is currently waiting on (if anything). This can be a huge help in debugging performance issues, or working out what's holding a query up. For example, if there's a hold-up waiting to write to the Write-Ahead Log, we'll see wait_event_type set to 'LWLockNamed' and wait_event set to 'WALWriteLock'.

Here's a real example:
-[ RECORD 81 ]---+---------------------------------------------------------------------------------------------------------------
datid            | 16422
datname          | pgbench
pid              | 5972
usesysid         | 10
usename          | thom
application_name | pgbench
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2016-03-15 19:37:47.544535+00
xact_start       | 2016-03-15 19:38:06.277451+00
query_start      | 2016-03-15 19:38:06.281068+00
state_change     | 2016-03-15 19:38:06.28107+00
wait_event_type  | Lock
wait_event       | tuple
state            | active
backend_xid      | 176332
backend_xmin     | 175895
query            | UPDATE pgbench_branches SET bbalance = bbalance + 4430 WHERE bid = 77;
As you can see, here we have wait_event_type showing 'Lock' and wait_event showing 'tuple'. So this is waiting on a tuple lock at this moment in time, which means the query won't progress until it no longer needs to wait.

Here's another example:
-[ RECORD 47 ]---+------------------------------
datid            | 16384
datname          | pgbench
pid              | 6052
usesysid         | 10
usename          | thom
application_name | pgbench
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2016-03-17 14:43:37.424887+00
xact_start       | 2016-03-17 14:43:37.426224+00
query_start      | 2016-03-17 14:43:37.428431+00
state_change     | 2016-03-17 14:43:37.428432+00
wait_event_type  | LWLockNamed
wait_event       | WALWriteLock
state            | active
backend_xid      | 8460
backend_xmin     | 
query            | END;
So, given the information shown, we can determine that it's waiting on a Write-Ahead Log write lock. A full list of the wait event types and wait events is available in the documentation: http://www.postgresql.org/docs/devel/static/monitoring-stats.html#WAIT-EVENT-TABLE

This should be a huge help for cases where queries or commands get stuck waiting for something, where previously we wouldn't have known what it was waiting for.
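As a starting point for that kind of debugging, you can get a cluster-wide summary of what everything is currently waiting on (a sketch against the 9.6 catalogs):

```sql
-- Count connections by wait event to spot systemic contention at a glance:
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;
```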

Friday 19 February 2016

PostgreSQL 9.6 - Part 1 - Horizontal Scalability

PostgreSQL 9.6 is shaping up to be an impressive release, so I thought I'd try to summarise some of the big features and improvements that it will bring. This certainly won't be exhaustive, but it will touch upon areas I've looked at closely.

I'm looking at breaking this down into the following areas:
  • Part 1 - Horizontal Scalability
  • Part 2 - Monitoring
  • Part 3 - Parallelism
  • Part 4 - Vertical Scalability
So without further ado, here's Part 1 - Horizontal Scalability

I guess I should start with the question some might be asking: "What is 'Horizontal Scalability'?" This is just a fancy way of describing the ability to distribute workload among several clusters/servers/locations etc. So rather than putting all the burden on one server, you can delegate at least some of the work to other servers to speed things up.

For 9.6, I'll specifically be covering foreign tables (i.e. tables on remote servers which are being queried as if they were local tables). The changes are currently only usable for postgres_fdw (the core extension that allows connections to tables in other PostgreSQL databases), but other foreign data wrappers will be able to make similar changes in future. All the following features were the result of lots of hard work by my colleague at EnterpriseDB, Ashutosh Bapat, with help from Robert Haas.
Join pushdown
At the moment (pre-9.6), any queries you run which involve foreign tables require that the data from those tables be fetched in their entirety prior to any sorting or joining, and then anything you want to do to them (join, sort etc.) have to be done locally on the returned data set. As you can imagine, this is probably not what you want to do, particularly if you had a query like:
SELECT *
FROM remote.contacts c
INNER JOIN remote.countries n ON c.country = n.country
WHERE n.continent = 'Australasia';
(Note: In these examples, the tables in the schema named "remote" are foreign tables (tables which reside on a remote server), so here, remote.contacts and remote.countries are both foreign tables. It just helps us distinguish between foreign tables and local ones.)

This would fetch the whole of the table named "contacts", and then, once the remote server has read all the rows of the table from disk and pushed them across the network, the local server will then need to fetch whichever rows of the foreign table named "countries" match the WHERE clause that targets it, and then join both of the results locally.

To illustrate this behaviour, we'd probably end up running these 2 queries on the remote server:
SELECT country, continent
FROM countries
WHERE continent = 'Australasia';
SELECT id, first_name, last_name, age, country
FROM contacts;
(Note: The "remote." prefix isn't shown here, because these are examples of the queries we'd actually be running on the remote server.)

Now, we did get *some* pushdown here, namely the bit saying "WHERE continent = 'Australasia'". That's called a qual pushdown, and it means we didn't need to read in all of the remote "countries" table and then filter locally. Not a big deal in this case as we know we're dealing with a small table, but you can imagine how useful that is if you're fetching from a huge table that's indexed on the remote server. Without qual pushdown, the local server would have to fetch entire tables from the remote server when it only wants a single row.
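You can verify what was shipped to the remote side by looking at the "Remote SQL" lines in the plan (a sketch; the exact plan shape depends on your data):

```sql
EXPLAIN (VERBOSE)
SELECT * FROM remote.countries WHERE continent = 'Australasia';

-- The Foreign Scan node should show the qual pushed down, e.g.:
--   Remote SQL: SELECT country, continent FROM public.countries
--               WHERE ((continent = 'Australasia'::text))
```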

But that only gets us so far. We still don't want to fetch both of these data sets separately. Wouldn't it be great if we could just tell the remote server to do the join for us? It could then take advantage of any indexes it has available to optimise the join, and we wouldn't need to transfer so much data between the remote server and the local server.

This is where PostgreSQL 9.6 can use its new join pushdown powers to do just that. In the above example, we would end up sending pretty much the same query we're running on the local server to the remote server, telling it to perform the join itself, and give us the results. This means fewer disk reads on the remote side, less network utilisation between the remote and local servers, less work for the local server (as the remote side has already done it all, and faster than the local one could have done it), and ultimately, a much shorter query time. The requirement for this to happen is that the foreign tables involved in the join have to reside on the same remote server and have the same user mapping in order to be considered for join pushdown.

But, I hear you ask, what if we're joining a mixture of local tables and foreign tables? What then? Surely PostgreSQL will break down in tears not knowing what to do? As it happens, PostgreSQL has a diploma in working out how to construct joins just between the foreign tables, and sending them off to the remote server separately. (The usual terms and conditions apply: they need to reside on the same remote server and use the same user mappings.)

So imagine we have a local table called "contacts_notes", and this contains an "id" column, a "contactid" column (which should match remote.contacts.id) and a "notes" column, where we'd keep notes about the customer. We want to join this to the "contactid" column on the remote.contacts table, and have the following query we will run locally:
SELECT *
FROM remote.contacts c
INNER JOIN remote.countries n ON c.country = n.country
INNER JOIN contacts_notes o ON c.id = o.contactid
WHERE continent = 'Australasia';
Prior to 9.6, it would do as mentioned before, fetch the filtered result from countries, then the whole of contacts, join them locally, and then join that result with the local contacts_notes table. 9.6 will now be able to look at the query and say "Okay, so I can see that 2 of these tables being joined are both on the same remote server, so I'll rewrite the join to only join those 2 tables, send that off to the remote server, and join the result with the local table when it comes back."

So here's an example of the query plan for pre-9.6:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=49.30..35665.45 rows=16 width=186) (actual time=12533.153..13449.441 rows=187226 loops=1)
   Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent, o.id, o.contactid, o.note
   Hash Cond: (o.contactid = c.id)
   ->  Seq Scan on public.contacts_notes o  (cost=0.00..27282.66 rows=1666666 width=18) (actual time=0.015..218.781 rows=1666666 loops=1)
         Output: o.id, o.contactid, o.note
   ->  Hash  (cost=49.10..49.10 rows=16 width=168) (actual time=12533.068..12533.068 rows=563143 loops=1)
         Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
         Buckets: 65536 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3675kB
         ->  Hash Join  (cost=23.74..49.10 rows=16 width=168) (actual time=1.191..12235.555 rows=563143 loops=1)
               Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
               Hash Cond: (c.country = n.country)
               ->  Foreign Scan on remote.contacts c  (cost=1.00..23.80 rows=640 width=104) (actual time=0.702..11184.456 rows=5000000 loops=1)
                     Output: c.id, c.first_name, c.last_name, c.age, c.country
                     Remote SQL: SELECT id, first_name, last_name, age, country FROM public.contacts
               ->  Hash  (cost=22.68..22.68 rows=5 width=64) (actual time=0.472..0.472 rows=28 loops=1)
                     Output: n.country, n.continent
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  Foreign Scan on remote.countries n  (cost=1.00..22.68 rows=5 width=64) (actual time=0.454..0.455 rows=28 loops=1)
                           Output: n.country, n.continent
                           Remote SQL: SELECT country, continent FROM public.countries WHERE ((continent = 'Australasia'::text))
 Planning time: 0.393 ms
 Execution time: 13458.285 ms
(22 rows)
And now with 9.6:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=154.38..35770.54 rows=16 width=186) (actual time=3095.418..4016.063 rows=187226 loops=1)
   Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent, o.id, o.contactid, o.note
   Hash Cond: (o.contactid = c.id)
   ->  Seq Scan on public.contacts_notes o  (cost=0.00..27282.66 rows=1666666 width=18) (actual time=0.023..221.201 rows=1666666 loops=1)
         Output: o.id, o.contactid, o.note
   ->  Hash  (cost=154.19..154.19 rows=16 width=168) (actual time=3095.318..3095.318 rows=563143 loops=1)
         Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
         Buckets: 65536 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3675kB
         ->  Foreign Scan  (cost=100.00..154.19 rows=16 width=168) (actual time=1.962..2932.641 rows=563143 loops=1)
               Output: c.id, c.first_name, c.last_name, c.age, c.country, n.country, n.continent
               Relations: (remote.contacts c) INNER JOIN (remote.countries n)
               Remote SQL: SELECT r1.id, r1.first_name, r1.last_name, r1.age, r1.country, r2.country, r2.continent FROM (public.contacts r1 INNER JOIN public.countries r2 ON (TRUE)) WHERE ((r1.country = r2.country)) AND ((r2.continent = 'Australasia'::text))
 Planning time: 0.400 ms
 Execution time: 4024.687 ms
(14 rows)
This isn't the whole story though, as it's not always possible to push a join down. Consider if we had written the query like:
SELECT *
FROM remote.contacts c
INNER JOIN contacts_notes o ON c.id = o.contactid
WHERE EXISTS (SELECT 1 FROM remote.countries n WHERE n.country = c.country AND n.continent = 'Australasia');
In other words, a semi-join. This can't currently be pushed down. Neither can an anti-join, which is the same thing, except we use NOT EXISTS instead. No doubt those will be pushed down in future, but not in 9.6.
Sort pushdown
Now this is all well and good, but what if we are instead sorting foreign table data rather than joining it? Prior to 9.6, we'd again have to fetch all the rows back from the remote server, then sort the data locally before being able to return the final data set.

PostgreSQL 9.6 will tell the remote server to do the sorting itself, which can be a huge win, because the remote server can have indexes on sorted columns. To demonstrate this, we'll use the same tables mentioned in our previous example (except I should also mention that we have an index on contacts.country).

Let's run the following query:
SELECT c.first_name, c.last_name, c.country, n.continent
FROM remote.contacts c
JOIN remote.countries n ON n.country = c.country
ORDER BY c.country;
Now let's look at the query plan with EXPLAIN (ANALYSE, VERBOSE), so that we can see what it did:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=200.00..324.85 rows=3171 width=128) (actual time=14239.133..25005.195 rows=5000000 loops=1)
   Output: c.first_name, c.last_name, c.country, n.continent
   Merge Cond: (c.country = n.country)
   ->  Foreign Scan on remote.contacts c  (cost=100.00..133.82 rows=682 width=96) (actual time=14238.741..22807.627 rows=5000000 loops=1)
         Output: c.id, c.first_name, c.last_name, c.age, c.country
         Remote SQL: SELECT first_name, last_name, country FROM public.contacts ORDER BY country ASC
   ->  Materialize  (cost=100.00..144.09 rows=930 width=64) (actual time=0.386..331.147 rows=4979830 loops=1)
         Output: n.country, n.continent
         ->  Foreign Scan on remote.countries n  (cost=100.00..141.76 rows=930 width=64) (actual time=0.384..0.712 rows=249 loops=1)
               Output: n.country, n.continent
               Remote SQL: SELECT country, continent FROM public.countries ORDER BY country ASC
 Planning time: 0.319 ms
 Execution time: 25247.358 ms
(13 rows)
As you can see, the query plan shows that the ORDER BY was pushed down for both foreign scans.

We can also get an ORDER BY added to the remote queries when performing merge joins without explicitly specifying ORDER BY on our own query. PostgreSQL will add this if it deems it optimal. Note you can give PostgreSQL more information about the foreign tables by telling it to fetch planner information from the remote server prior to planning its own query. This can be done by adding the use_remote_estimate option to the server definition:
ALTER SERVER remote OPTIONS (ADD use_remote_estimate 'true');
Alternatively, you could set it on the individual foreign tables:
ALTER TABLE remote.contacts OPTIONS (ADD use_remote_estimate 'on');
ALTER TABLE remote.countries OPTIONS (ADD use_remote_estimate 'on');
This isn't required, and it can add a little overhead, but it can be a big benefit in cases where the query will be particularly complex.
"Walk and chew gum at the same time"
At the time of writing this, only joins will be pushed down, or only sorts. Robert Haas described this to me as "We can walk, and we can chew gum; we just can't do them at the same time at the moment."

But Ashutosh Bapat has come to the rescue, and submitted a patch to do just this. In demonstrating the effect of this change, we'll take the following example query we used previously:
SELECT c.first_name, c.last_name, c.country, n.continent
FROM remote.contacts c
JOIN remote.countries n ON n.country = c.country
ORDER BY c.country;
This could then generate the following plan:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=852221.63..1039720.04 rows=5000057 width=32) (actual time=11699.639..25104.462 rows=5000000 loops=1)
   Output: c.first_name, c.last_name, c.country, n.continent
   Relations: (remote.contacts c) INNER JOIN (remote.countries n)
   Remote SQL: SELECT r1.first_name, r1.last_name, r1.country, r2.continent FROM (public.contacts r1 INNER JOIN public.countries r2 ON (TRUE)) WHERE ((r1.country = r2.country)) ORDER BY r2.country ASC
 Planning time: 4.220 ms
 Execution time: 25326.628 ms
(6 rows)
As you can see, all the joining and sorting is done on the remote server side.

Join + sort pushdown can also be of use in cases where, say, 2 foreign tables need to be joined to a local table, and you'd want the joined and sorted results back to optimise the join to the local table. So we should see that in 9.6 too. Here's a preview of what that would look like. Here's a query where we join 2 foreign tables, and we also want to join that with a local table:
SELECT c.country, n.continent
FROM remote.contacts c
JOIN remote.countries n ON n.country = c.country
JOIN contacts_notes o on c.id = o.contactid;
This gives us the following query plan:
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=836225.89..1057925.20 rows=5000063 width=64)
   Output: c.country, n.continent
   Merge Cond: (o.contactid = c.id)
   ->  Index Only Scan using contacts_notes_contactid_idx on public.contacts_notes o  (cost=0.43..30030.72 rows=1666666 width=4)
         Output: o.contactid
   ->  Materialize  (cost=836225.46..961227.03 rows=5000063 width=23)
         Output: c.country, c.id, n.continent
         ->  Foreign Scan  (cost=836225.46..948726.87 rows=5000063 width=23)
               Output: c.country, c.id, n.continent
               Relations: (remote.contacts c) INNER JOIN (remote.countries n)
               Remote SQL: SELECT r1.country, r1.id, r2.continent FROM (public.contacts r1 INNER JOIN public.countries r2 ON (TRUE)) WHERE ((r1.country = r2.country)) ORDER BY r1.id ASC
(11 rows)
You can see that the remote query has pushed down the join we explicitly requested, and has also optimised the query further with a sort pushdown.

There will be many more pushdown capabilities in future, including, but not limited to, aggregate and limit pushdown.
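For completeness, here's a sketch of the kind of postgres_fdw setup the examples above assume. The server name, connection options, credentials and column lists here are illustrative only; they aren't taken from the post, and the foreign table columns must match whatever the remote tables actually define:

```sql
-- Hypothetical setup; host, dbname, user and columns are assumptions.
CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remotehost', dbname 'remotedb');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER remote_server
  OPTIONS (user 'remoteuser', password 'secret');

-- The examples query these as remote.contacts and remote.countries.
CREATE SCHEMA remote;

CREATE FOREIGN TABLE remote.contacts (
  id         int,
  first_name text,
  last_name  text,
  country    text
) SERVER remote_server
  OPTIONS (schema_name 'public', table_name 'contacts');

CREATE FOREIGN TABLE remote.countries (
  country   text,
  continent text
) SERVER remote_server
  OPTIONS (schema_name 'public', table_name 'countries');
```

With something like this in place, an EXPLAIN VERBOSE on a join between the two foreign tables should show a single Foreign Scan with the joined (and, where applicable, sorted) Remote SQL, as in the plans above.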

Sunday 2 March 2014

Students, we want you involved in Google Summer of Code

The PostgreSQL Project is yet again participating in Google Summer of Code for 2014. Google will be funding students who take part at $5,500 USD per student (see the GSoC FAQ for more details). We would like to hear from students who would be willing to work on projects to add new features or enhance existing ones. You won't be going it alone; we'll assign experienced community developers to mentor you throughout your project.

Have a look at the TODO list on the wiki for inspiration, or the list of project ideas submitted so far by the community, although projects are by no means limited to these. Whether you've spotted something you think you could work on, or have a new idea to propose, submit it to the pgsql-students mailing list for discussion (see the mailing list page to subscribe), but do it soon, as there's less than 3 weeks left to register final proposals.

Also, if you know of any students who may be interested in participating, please send them our way.

More information can be found on the PostgreSQL GSoC page.

Tuesday 4 February 2014

Why won't you take my money?

(I've only just realised that this has been sitting on my account as a draft entry for a long time, so I'm publishing it despite some details perhaps being slightly out of date)

Why do companies make it so hard to hand my money over to them? It's like they're running away from me and throwing obstacles in my way to stop me reaching them.

Perhaps I should elucidate a little. What I'm talking about is buying music and films. Ideally I would see something I like, and buy it. However, whilst I can technically do this, it's not practical in most cases. Here's an example:

LoveFilm like to make a big deal out of being able to stream films to your PC. "Great", I thought, "I have a PC, and I want to watch films, so I'll just plug it into my TV and give it a go." Just one problem; even though it says you can watch it on your PC, it took a bit more research to discover that what they actually meant was "A PC that can install and use Microsoft Silverlight." Now that's not what they say on their streaming page. They seem to omit that key detail. You have to dig around in the help section for the real requirements. Now I don't have Windows on any of my PCs. I either have Debian or Ubuntu, and Silverlight is not available on these platforms. You may be thinking "Ah, but Linux has the Moonlight project which is the Linux equivalent of Silverlight." That it is, but completely unusable in this case because Microsoft won't license their PlayReady DRM system to the Moonlight project, so that option's dead. Also my work laptop is a Macbook Pro, but this is for work, not for installing entertainment software on, so that's not an option. Basically DRM is the real reason I can't watch it. Because they don't want me to copy the film that I paid to watch, they insist I must have software installed that prevents me from doing so. "It's a shame Netflix isn't available in the UK then." Well no, not really. They have exactly the same barriers... Silverlight + DRM.

Okay, so let's just buy the film outright rather than streaming it using a rental service. Where can I do that? Well there don't seem to be many places that let you do that in the UK. I found something called BlinkBox, but apparently they use Windows Media DRM to "protect" the films, which again, can't be played on Linux. There is apparently a limited set of films which can be "streamed" to Linux through Adobe Flash, but that's about it.

So if you don't have Windows or OSX, and don't have Silverlight in most cases, you're stuck. And even if you do meet those requirements, you're not free to transfer them to other devices. Essentially, they won't take your money.
Now when it comes to music, things are a bit brighter. One of the biggest online shops, Amazon, allows you to buy single tracks or entire albums in MP3 format. Nearly every variation of computer and multimedia device can play MP3s. However, apparently Amazon now "watermark" many tracks with identifying information, so if they are copied by various other people, you can be identified as the origin of the copy. If you happen to have a large-capacity music device that gets lost or stolen, you'd better hope that those who end up with your device don't decide to upload all your tracks. You'd be liable and accused of enabling piracy. In any case, why would someone choose MP3 over buying the CD? The CD is higher quality, can be ripped without any watermarking, and encoded into any format you wish, including convenient lossless ones such as FLAC. Well surely they're more expensive to buy? Let's compare buying Avenged Sevenfold's self-titled album in MP3 format and CD format. On Amazon.co.uk, at the time of writing this, the MP3 album is £4.99. Bargain! Not bad at all. So surely the CD will be more expensive? Well no, it's not. In fact it's £4.49. What's the incentive to buy the lower-quality MP3 version with possible watermarking, over the cheaper and higher-quality CD version, with free delivery, cover art, rippable into any format you wish without loss in quality, and you still have the CD as backup if you lose it? I guess the only advantage is convenience, if you can't wait 24 hours for it to arrive.

I like my audio to be in FLAC format. It doesn't take up huge amounts of space, the sound is identical to the original, and it plays on my media box, my music player (iAudio 7), and my laptop. Unfortunately, attempting to find a place that provides a large library of music in this format has been a fruitless endeavour. There was one place that met this demand: AllofMP3. In fact not only could you choose FLAC format, but many other formats too, such as MP3, OGG, AAC, WAV, WMA, Musepack, Monkey's Audio and OptimFROG. You could even choose the bitrate! Isn't that great?! You get your music in the format of your choosing rather than just the one format, and you didn't need any special software to download the tracks either, unlike Amazon's. So why am I referring to this place in the past tense? It's because they were sued by the RIAA on behalf of EMI, Sony BMG, Universal Music Group and Warner Music Group, and as a result came under pressure from the Russian government. They were claiming $1.65 trillion in damages for 11 million songs. That's $150,000 per track. Let's leave aside the absolute insanity of the claimed damages; it goes without saying. AllofMP3 eventually closed, even though there never did appear to be any real case against them. Russian law allowed them to license the music in the way they did, apparently. But even if they were completely illegal, what they did was give customers what they wanted. They made a healthy profit, and were very popular. So surely other companies would want to do likewise? No. It appears not.
Rather than have you conveniently buy a track without watermarking or DRM in a format of your choice and with almost no overhead cost for them, they'd rather have you buy the physical product which needed to be manufactured individually, have printed artwork, put into a protective case, shipped to a distributor, kept in stock in a warehouse, eventually shipped to a seller, kept in their warehouse, then have staff paid to package it up and pay a delivery company to deliver it to you in person... for less.

In fact this is worse when it comes to buying eBooks on Amazon's site. Not only are they DRM-protected, but they tend to cost more than the physical printed book. The convenience they provide is that clicking on the Buy It Now button on the book's page will allow it to almost instantly appear on your device. But note how you have to get the book from Amazon, and no-one else. You can't go to another eBook shop and buy a book in an alternative format, such as the highly-popular ePub format. And the books we do buy from Amazon can never be put on a non-Amazon device. So again, why would we pay extra for the additional limitations? It's this kind of total control that makes Apple's "ecosystem" so awful and exploitative.

Now there is an alternative... you can get the film, book, music track you want, and have it almost instantly. You can play it or read it on any device you like, and there's no DRM or secret identifying information in the files. The DVD or Bluray discs won't have endless unskippable warnings about how you shouldn't copy films, or loads of trailers for films you don't ever want to see. What is this wonderful source that gives people exactly what they're so willing to pay for? Well you've probably guessed it... piracy. Yes, you don't even have to pay for it. These companies see piracy as a terrible plague that's destroying their world, yet for some reason they refuse to compete with it. I've got money to give them, but instead they say I'm not allowed to watch their films because I don't have the necessary handcuffs.

I previously bought the complete Sherlock Holmes DVD box set, and at the beginning of *each disc* was an unskippable anti-piracy video with Matrix-style music, giving warnings about how downloading films illegally is extremely bad and makes you a criminal. Not exactly setting me up to be immersed in 19th century London. Ironically, the only people who are punished by seeing these warnings are those who obtained the episodes by legal means, since those who ripped them and made them available for download remove the intrusive and annoying 30-second atmosphere-destroying clip. I had paid for these legitimately, but because of that, I'm the one being warned that I mustn't download illegal copies.

I'll end on an experience I had with a computer game, which I have posted elsewhere before:

Years ago I bought Sid Meier's Pirates! for Windows (back when I was using Windows). I was really looking forward to it because I had previously played Pirates Gold! on the Amiga CD32 and loved it. I wanted this game so much that I had pre-ordered it a couple of months in advance for the more expensive special edition (the same game but with various extra things like a map poster, extra media etc.). When it finally arrived, I put the disc in the drive and then proceeded to run the installer. Then some kind of pre-installation checker was running (something like SecuROM... it could have even actually been SecuROM). It failed the check. I was certain I didn't have a dodgy copy as it was bought from a reputable source referenced by the company that made the game, and had all the official packaging and goodies. So I tried it again... failed. I rebooted... failed. I checked that I met their system requirements. I easily exceeded every requirement. Because it was so new, searches for this problem yielded no results. It wasn't until a week later (without being able to install it) that others were reporting the same problem. Many people couldn't install the game they had bought through legitimate means for their legitimate copy. Eventually it was revealed that many drives couldn't read the discs, and these were a wide range of drive models. Atari mentioned that it was related to the speed at which the drive spun the disc, so a tool was suggested that would artificially slow down the drive. This still didn't work for me. It turns out that the copy-protection system they used had intentional errors on the disc that the software was checking for. Theoretically, copying the disc would either automatically correct the errors, or the copy would fail to complete. Many drives clearly weren't compatible with this error-checking process.

... I eventually downloaded a pirated copy (yes, a pirated copy of Pirates!) and it installed no problem. No DRM, no checks, no installation issues. In fact it was better than the version of the game I got even after installation as it no longer needed the disc in the drive. Plus I could keep the pirated installer backed up... but not my legitimate copy.

The lesson: DRM, anti-piracy systems and intrusive unskippable warnings don't affect pirates, only paying customers. These industries wage a constant war on piracy, but they're the ones who encourage it by punishing those who don't pirate.