Tuesday 4 June 2013

1,001 chained streaming-only replication instances

On my laptop I've managed to create 1,001 local chained (one-to-one) streaming-only (meaning no archive directory) asynchronous replication instances. The output of the status of the list is here: https://gist.github.com/darkixion/5694200

I also tested the promotion of the 1st standby to see if it would cope with propagation to the final 1,000th standby, and it worked flawlessly. This didn't work on my copy of Linux Mint without some adjustments to the kernel semaphores values, and it does take a while for all the standbys in the chain to reach full recovery. However, promotion propagation is very fast.

Try it for yourself (if you have enough RAM that is). You may find it quicker to use my pg_rep_test tool. Just don't do this manually... it'll take far too long.

Thanks to Heikki for putting in the changes that made this archiveless cascading replication possible. :)

Update: some figures

So looking at the logs, it's clear why it takes so long for all 1,000 standbys to come online; it tries to connect to its replication host every 5 seconds, so the delay between the host coming online and the standby coming online is up to 5 seconds. This potentially amounts to 5,000 seconds (about 83 mins) to ensure they're all online and receiving a streaming replication connection. A test of this shows it taking 46 minutes 25 seconds.

And as requested by Jonathan Katz (@jkatz05) I can tell you that the amount of time it takes for the promotion of the 1st standby to cause the 1,000th standby to switch to the new timeline (at least on my laptop with an SSD) is 1 minute 46 seconds, so a rate of 9.266 promoted instances per second. And as for actual data changes (in the case of my test, the creation of a table), it took about 6 seconds to reach the 1,000th standby. Re-tested with an insert of a row, and it's about the same again.

Monday 27 May 2013

Accepted Google Summer of Code projects 2013

This year's accepted Google Summer of Code projects have been published. Among them are 3 accepted proposals for PostgreSQL which not only will be fine addition to PostgreSQL's set of features, but also helps introduce new students to our community for hopefully long-term contributions.

The projects that will be worked on are:

Cube extension improvement (Stas Kelvich)

Indexes on the cube data type (in the "cube" extension) tend to be on the large size, meaning maintenance of and accessing these indexes is expensive. The improvements this project aims to implement are in reducing the cost of indexes on cube data by using r-tree structures. In addition to this, PostgreSQL's relatively new K-Nearest Neighbour framework would serve to allow the creation of ordering operators for retrieving sorted data directly from the index, and ordering operators for kNN with different spatial norms.

UPDATE ... RETURNING OLD (Karol Trzcionka)

PostgreSQL can perform UPDATE statements and return the new row by using the RETURNING clause and referencing the columns you want. This project would introduce NEW and OLD aliases to provide the ability to reference not just the new row but also the old. This would allow for a before/after comparison of data.

Efficient KNN search through high-dimensional indexing with iDistance (Mike Schuh)

This will introduce a new indexing algorithm that utilises a high-dimensional space leading to more efficient K-nearest neighbour searches. Such indexes are an advantage over b-tree and r-tree which degrade with a modest increase in dimensions, whereas the iDistance algorithm has been demonstrated to remain well-performing and efficient.

Of course our students won't be left to work in isolation; they will also receive guidance from established community members specifically assigned to mentor them. We welcome Stas, Karol and Mike to the community, and hope not only that they are successful in their projects, but that they continue to contribute beyond this year's Google Summer of Code. Also thanks to Alexander Korotkov, David Fetter and Stephen Frost who will mentor this year's students. It's worth noting that Alexander was actually a GSoC student last year whose work on indexing on ranges made it into the upcoming 9.3 release.