Thursday, 25 July 2019

jsquery vs SQL/JSON

SQL/JSON is coming to PostgreSQL 12 and provides a native way to query JSON data (although, to be specific, JSON data stored as a JSONB data type).  This also introduces the jsonpath data type which is used for SQL/JSON query expressions.  I'll not be going into its usage in detail, or covering performance characteristics (at least not in this post), but I will compare the syntax and functionality with jsquery.  Note that this may potentially change prior to final release.

jsquery was introduced as a non-core extension for PostgreSQL 9.4 and higher by Teodor Sigaev, Alexander Korotkov and Oleg Bartunov.  Like jsonpath, it also used its own datatype, jsquery.

We have some functions to which we can pass jsonpath expressions to, 2 of which have operators which can be used as shorthand (albeit without additional parameter control, as they exist primarily for indexing purposes):
Function Operator Description
jsonb_path_exists @? This will return true if it matched something, false if not, or null if it resulted in an operation on a missing value.
jsonb_path_match @@ This does the same thing as jsonb_path_exists, but only first result item is tested.
jsonb_path_query None Returns the JSON data resulting from the jsonpath expression.
jsonb_path_query_array None Same as jsonb_path_query, but puts the result in a JSON array.
jsonb_path_query_first None Same as jsonb_path_query, but only selects the first value.

These will suppress errors where there's a lack of an array element, object field, an unexpected JSON type or numeric errors.

Here are some examples of how jsquery and SQL/JSON differ in practice. (Note that jsquery usage requires installation of the jsquery extension):

We will select all elements from an array which are equal to 1.  Note that, here, jsquery returns an array containing 1, whereas jsonpath returns a scalar value of 1.

jsquery
SELECT '[1,2,3]'::jsonb ~~ '#. ?($ = 1).$'::jsquery;

jsonpath
SELECT jsonb_path_query('[1,2,3]'::jsonb, '$[*] ? (@ == 1)');


Now we'll check that all elements are greater than 1 and less than 5.

jsquery
SELECT '[2,3,4]' @@ '#: ($ > 1 and $ < 5)'::jsquery;

jsonpath
SELECT '[2,3,4]' @? '$[*] ? (@ > 1 && @ < 5)';


And here we have some jsonb data as follows:
CREATE TABLE books (data jsonb);

INSERT INTO books (data) VALUES ('[{"author": "Charles Dickens", "book": "A Tale of Two Cities"},
   {"author": "William Shakespeare", "book": "Hamlet"}]');

We want books by William Shakespeare.

jsquery
SELECT data ~~ '#. ? ($.author = "William Shakespeare")' FROM books;

jsonpath
SELECT jsonb_path_query(data,'$[*] ? (@.author == "William Shakespeare")') FROM books;


You can see that, while they share some characteristics, they are not the same.  It's also possible to control whether it uses lax or strict rules which determine whether or not to throw an error if referencing a non-existing object member or a structural issue.  "lax" suppresses such errors, "strict" doesn't.  These are placed at the beginning of the jsonpath expression.

So we end up with the following syntax for jsonpath:

[lax|strict] <path expression> ? <filter expression>

I've put together some comparisons between jsquery and jsonpath expressions.

Key
- Achievable using other jsonpath operators.
* No jsonpath equivalent, but usage available at the SQL level.
x No equivalent.

Comparison of variables
jsquery jsonpath Description
$ $ The whole document
. . Accessor
* * All values at the current level
x ** All values at all levels
#N $[N] Nth value of an array starting at 0
x $[start,end] Slice of an array
# - All array elements
% - All object keys


Comparison of methods
jsquery jsonpath Description
x + (unary) Plus operation on a sequence
x - (unary) Minus operation on sequence
x + (binary) Addition
x - (binary) Subtraction
x * Multiplication
x / Division
IS <type> type() Checks the type (jsquery) or returns the type name (jsonpath)
@# size() Size (length) of an array
x double() Numeric value from string
x ceiling() Nearest integer greater than or equal to value
x floor() Nearest integer less than or equal to value
x abs() Absolute value of number
x keyvalue() Object represented as sequence of key, value and id fields


Comparison of operators
jsquery jsonpath Description
= == Equality
< < Less than
<= <= Less than or equal to
> > Greater than
>= >= Greater than or equal to
@> * Contains
<@ * Contained by
IN - Search within a list of scalar values
&& * Overlap
AND && Boolean AND
OR || Boolean OR
NOT ! Boolean NOT
=* exists Expression contains 1 or more items
starts with Value begins with specified value
x like_regex Test string against regex pattern


Special values
jsquery jsonpath
truetrue
falsefalse
nullnull
xis unknown

Note that, unlike SQL expressions, you can use the equality operator with "null", whereas you would usually have to state IS NULL.  This is because it's not directly equivalent.

jsquery features not present in SQL/JSON.

jsquery supports index hints, but this was necessary as the optimiser has no knowledge of the contents of jsquery strings or statistics related to individual values.  So this feature is effectively redundant when it comes to SQL/JSON.

Limitations

Only text, numeric and boolean types are supported at present.  Datetime is still a work in progress, so these are intended to be supported in JSONB in future.

For more information on the jsonpath and SQL/JSON, see the PostgreSQL documentation:

SQL/JSON Path Expressions
jsonpath Type

No comments: