- 2 factor authentication
- alternative vote
- common table expressions
- cory doctorow
- good old games
- mailing lists
- password strength
- with a little help
- writable common table expressions
Tuesday, 9 December 2014
Sunday, 2 March 2014
The PostgreSQL Project is yet again participating in Google Summer of Code for 2014. Google will be funding students who take part at $5,500 USD per student (see GSoC FAQ for more details). We would like to hear from students who would be willing to work on projects to add new or enhance existing features. You won't be going it alone, we'll assign experienced community developers to mentor you throughout your project.
Have a look at the TODO list on the wiki for inspiration, or the list of project ideas submitted so far by the community, although these are by no means what projects are limited to. Whether you've spotted something you think you could work on, or have a new idea to propose, submit it to the pgsql-students mailing list for discussion (see the mailing list page to subscribe) but do it soon as there's less than 3 weeks to register final proposals.
Also, if you know of any students who may be interested in participating, please send them our way.
More information can be found on the PostgreSQL GSoC page
Tuesday, 4 February 2014
Why do companies make it so hard to hand my money over to them? It's like they're running away from me and throwing obstacles in my way to stop me reaching them.
Perhaps I should elucidate a little. What I'm talking about is buying music and films. Ideally I would see something I like, and buy it. However, whilst I can technically do this, it's not practical in most cases. Here's an example:
LoveFilm like to make a big deal out of being able to stream films to your PC. "Great", I thought, "I have a PC, and I want to watch films, so I'll just plug it into my TV and give it a go." Just one problem; even though it says you can watch it on your PC, it took a bit more research to discover that what they actually meant was "A PC that can install and use Microsoft Silverlight." Now that's not what they say on their streaming page. They seem to omit that key detail. You have to dig around in the help section for the real requirements. Now I don't have Windows on any of my PCs. I either have Debian or Ubuntu, and Silverlight is not available on these platforms. You may be thinking "Ah, but Linux has the Moonlight project which is the Linux equivalent of Silverlight." That it is, but completely unusable in this case because Microsoft won't license their PlayReady DRM system to the Moonlight project, so that option's dead. Also my work laptop is a Macbook Pro, but this is for work, not for installing entertainment software on, so that's not an option. Basically DRM is the real reason I can't watch it. Because they don't want me to copy the film that I paid to watch, they insist I must have software installed that prevents me from doing so. "It's a shame Netflix isn't available in the UK then." Well no, not really. They have exactly the same barriers... Silverlight + DRM.
Okay, so let's just buy the film outright rather than streaming it using a rental service. Where can I do that? Well there don't seem to be many places that let you do that in the UK. I found something called BlinkBox, but apparently they use Windows Media DRM to "protect" the films, which again, can't be played on Linux. There is apparently a limited set of films which can be "streamed" to Linux through Adobe Flash, but that's about it.
So if you don't have Windows or OSX, and don't have Silverlight in most cases, you're stuck. And even if you do meet those requirements, you're not free to transfer them to other devices. Essentially, they won't take your money.
Now when it comes to music, things are a bit brighter. One of the biggest online shops, Amazon, allows you to buy single tracks or entire albums in MP3 format. Nearly every variation of computer and multimedia device can play MP3s. However, apparently Amazon now "watermark" many tracks with identifying information, so if they are copied by various other people, you can be identified as the origin of the copy. If you happen to have a large-capacity music device that gets lost or stolen, you'd better hope that those who end up with your device don't decide to upload all your tracks. You'd be liable and accused of enabling piracy. In any case, why would someone choose MP3 over buying the CD? The CD is higher quality, can be ripped without any watermarking, and encoded into any format you wish, including convenient lossless ones such as FLAC. Well surely they're more expensive to buy? Let's compare buying Avenged Sevenfold's self-titled album in MP3 format and CD format. On Amazon.co.uk, at the time of writing this, the MP3 album is £4.99. Bargain! Not bad at all. So surely the CD will be more expensive? Well no, it's not. In fact it's £4.49. What's the incentive to buy the lower-quality MP3 version with possible watermarking, over the cheaper and higher-quality CD version, with free delivery, cover art, rippable into any format you wish without loss in quality, and you still have the CD as backup if you lose it? I guess the only advantage is convenience, if you can't wait 24 hours for it to arrive.
I like my audio to be in FLAC format. It doesn't take up huge amounts of space, the sound is identical to the original, and plays on my media box, my music player (iAudio 7), and my laptop. Unfortunately attempting to find a place that provides a large library of music in this format has been a fruitless endeavour. There was one place that met this demand. In fact not only could you choose FLAC format, but many other formats too, such as MP3, OGG, AAC, WAV, WMA, Musepack, Monkey's Audio and OptimFROG. You could even choose the bitrate! Isn't that great?! You get your music in the format of your choosing rather than just the one format, and you didn't need any special software to download the tracks either, unlike Amazon's. So why am I referring to this place in the past-tense? It's because they were sued by the RIAA on behalf of EMI, Sony BMG, Universal Music Group and Warner Music Group and as a result had pressure from the Russian government. They were claiming $1.65 trillion in damages from 11 million songs. That's $150,000 per track. Let's leave the absolute insanity of the claimed damages because it just goes without saying. AllofMP3 eventually closed, even though there never did appear to be any real case against them. Russian law allowed them to license the music in the way they did, apparently. But even if they were completely illegal, what they did was give customers what they wanted. They made a healthy profit, and were very popular. So surely other companies would want to do likewise? No. It appears not. Rather than have you conveniently buy a track without watermarking or DRM in a format of your choice and with almost no overhead cost for them, they'd rather have you buy the physical product which needed to be manufactured individually, have printed artwork, put into a protective case, shipped to a distributor, kept in stock in a warehouse, eventually shipped to a seller, kept in their warehouse, then have staff paid to package it up and pay a delivery company to deliver it to you in person... for less.
In fact this is worse when it comes to buying eBooks on Amazon's site. Not only are they DRM-protected, but they tend to cost more than the physical printed book. The convenience they provide is that clicking on the Buy It Now button on the book's page will allow it to almost instantly appear on your device. But note how you have to get the book from Amazon, and no-one else. You can't go to another eBook shop and buy a book in an alternative format, such as the highly-popular ePub format. And the books we do buy from Amazon can never be put on a non-Amazon device. So again, why would we pay extra for the additional limitations? It's this kind of total control that makes Apple's "ecosystem" so awful and exploitative.
Now there is an alternative... you can get the film, book, music track you want, and have it almost instantly. You can play it or read it on any device you like, and there's no DRM or secret identifying information in the files. The DVD or Bluray discs won't have endless unskippable warnings about how you shouldn't copy films, or loads of trailers for films you don't ever want to see. What is this wonderful source that gives people exactly what they're so willing to pay for? Well you've probably guessed it... piracy. Yes, you don't even have to pay for it. These companies see piracy as a terrible plague that's destroying their world, yet for some reason they refuse to compete with it. I've got money to give them, but instead they say I'm not allowed to watch their films because I don't have the necessary handcuffs.
I previously bought the complete Sherlock Holmes DVD box set, and at the beginning of *each disc* was an unskippable anti-piracy video with Matrix-style music, giving warnings about how downloading films illegally is extremely bad and makes you a criminal. Not exactly setting me up to be immersed in 19th century London. Ironically, the only people who are punished by seeing these warnings are those who obtained the episodes by legal means, since those who ripped them and made them available for download remove the intrustive and annoying 30 second atmosphere-destroying clip. I had paid for these legitimately, but because of that, I'm being warned that I mustn't download illegal copies.
I'll end on an experience I had with a computer game, which I have posted elsewhere before:
Years ago I bought Sid Meier's Pirates! for Windows (back when I was using Windows). I was really looking forward to it because I had previously played Pirates Gold! on the Amiga CD32 and loved it. I wanted this game so much that I had pre-ordered it a couple months in advance for the more expensive special edition (the same game but with various extra things like a map poster, extra media etc.). When it finally arrived, I put the disc in the drive and then proceeded to run the installer. Then some kind of pre-installation checker was running (something like SecuROM... it could have even actually been SecuROM). It failed the check. I was certain I didn't have a dodgy copy as it was bought from a reputable source referenced by the company that made the game and had all the official packaging and goodies. So I tried it again... failed. I rebooted... failed. I checked that I met their system requirement. I easily exceeded every requirement. Because it was so new, searches for this problem yielded no results. It wasn't until a week later (without being able to install it) that others were reporting the same problem. Many people couldn't install the game they had bought through legitimate means for their legitimate copy. Eventually it was revealed that many drives couldn't read the disks, and these were a wide range of drive models. Atari mentioned that it was related to the speed at which the drive spun the disc, so there was a tool suggested that would artificially slow down the drive. This still didn't work for me. It turns out that the copy-protection system they used had intentional errors on the disc that the software was checking for. Theoretically, copying the disc would either automatically correct the errors, or the copy would fail to complete. Many drives clearly weren't compatible with this error-checking process.
... I eventually downloaded a pirated copy (yes, a pirated copy of Pirates!) and it installed no problem. No DRM, no checks, no installation issues. In fact it was better than the version of the game I got even after installation as it no longer needed the disc in the drive. Plus I could keep the pirated installer backed up... but not my legitimate copy.
The lesson: DRM, anti-piracy systems and intrusive unskippable warnings don't affect pirates, only paying customers. These industries wage a constant war on piracy, but they're the ones who encourage it by punishing those who don't pirate.
Tuesday, 4 June 2013
On my laptop I've managed to create 1,001 local chained (one-to-one) streaming-only (meaning no archive directory) asynchronous replication instances. The output of the status of the list is here: https://gist.github.com/darkixion/5694200
I also tested the promotion of the 1st standby to see if it would cope with propagation to the final 1,000th standby, and it worked flawlessly. This didn't work on my copy of Linux Mint without some adjustments to the kernel semaphores values, and it does take a while for all the standbys in the chain to reach full recovery. However, promotion propagation is very fast.
Try it for yourself (if you have enough RAM that is). You may find it quicker to use my pg_rep_test tool. Just don't do this manually... it'll take far too long.
Thanks to Heikki for putting in the changes that made this archiveless cascading replication possible. :)
Update: some figures
So looking at the logs, it's clear why it takes so long for all 1,000 standbys to come online; it tries to connect to its replication host every 5 seconds, so the delay between the host coming online and the standby coming online is up to 5 seconds. This potentially amounts to 5,000 seconds (about 83 mins) to ensure they're all online and receiving a streaming replication connection. A test of this shows it taking 46 minutes 25 seconds.
And as requested by Jonathan Katz (@jkatz05) I can tell you that the amount of time it takes for the promotion of the 1st standby to cause the 1,000th standby to switch to the new timeline (at least on my laptop with an SSD) is 1 minute 46 seconds, so a rate of 9.266 promoted instances per second. And as for actual data changes (in the case of my test, the creation of a table), it took about 6 seconds to reach the 1,000th standby. Re-tested with an insert of a row, and it's about the same again.
Monday, 27 May 2013
The projects that will be worked on are:
Cube extension improvement (Stas Kelvich)Indexes on the cube data type (in the "cube" extension) tend to be on the large size, meaning maintenance of and accessing these indexes is expensive. The improvements this project aims to implement are in reducing the cost of indexes on cube data by using r-tree structures. In addition to this, PostgreSQL's relatively new K-Nearest Neighbour framework would serve to allow the creation of ordering operators for retrieving sorted data directly from the index, and ordering operators for kNN with different spatial norms.
UPDATE ... RETURNING OLD (Karol Trzcionka)PostgreSQL can perform UPDATE statements and return the new row by using the RETURNING clause and referencing the columns you want. This project would introduce NEW and OLD aliases to provide the ability to reference not just the new row but also the old. This would allow for a before/after comparison of data.
Efficient KNN search through high-dimensional indexing with iDistance (Mike Schuh)This will introduce a new indexing algorithm that utilises a high-dimensional space leading to more efficient K-nearest neighbour searches. Such indexes are an advantage over b-tree and r-tree which degrade with a modest increase in dimensions, whereas the iDistance algorithm has been demonstrated to remain well-performing and efficient.
Of course our students won't be left to work in isolation; they will also receive guidance from established community members specifically assigned to mentor them. We welcome Stas, Karol and Mike to the community, and hope not only that they are successful in their projects, but that they continue to contribute beyond this year's Google Summer of Code. Also thanks to Alexander Korotkov, David Fetter and Stephen Frost who will mentor this year's students. It's worth noting that Alexander was actually a GSoC student last year whose work on indexing on ranges made it into the upcoming 9.3 release.
Monday, 2 April 2012
If you have been using PostgreSQL for a long time, or you’re relatively new but have been following old instructions about how to use it, it’s possible that you’re using features that have been deprecated. The reason features disappear tend to be because they have been superseded by better features which cover the same functionality. It’s important to try to avoid using features which are destined to disappear if there’s a newer alternative. Also when planning an upgrade, it’s useful to know if a feature you’re using will suddenly break in the new version. Some of these features still continue to work, but have since been removed from documentation because they’re to be removed in a future release. Others have just been removed completely.
We’ll start off with the absurdly old and work our way to the present:
timetravel contrib module
Does it still work?: No. This is ooooold and was last supported back in PostgreSQL 6.1.
What’s wrong with it?: This really dragged performance down and took up a huge amount of storage space. But the concept was pretty cool... being able to query data as it was at another time.
What to use instead: You can use triggers to implement a similar mechanism.
char2/char4/char8/char16 data types
Does it still work?: No. These were removed way back in PostgreSQL 6.4. In fact I shouldn’t bother mentioning these, but you never know... someone *could* still be using them somewhere.
What’s wrong with it?: Not in the SQL standard and they’re no faster than using the ubiquitous char(n).
What to use instead: char(n)
abstime data type
Does it still work?: This will still work, but it’s no longer documented as of PostgreSQL 7.0 and only intended to be used internally. Despite its name, it supports both date and time.
What’s wrong with it?: The range this data type provides is limited: ‘1901-12-14’ to ‘2038-01-19’. It also only has a resolution down to the second. Its behaviour is unfortunately like that of MySQL’s, in that if you insert an invalid value, it won’t fire an error. Instead you’ll just see ‘invalid’ as the value when you go to query it.
What to use instead: Since abstime supports timezone, the better alternative is using timestamp with time zone (timestamptz). It takes up more space (8 bytes instead of 4), but it has a far wider range: ‘4713 BC’ to ‘294276 AD’ and supports microsecond resolution.
reltime data type
Does it still work?: Yes, still works, but again, no longer documented as of PostgreSQL 7.0 and for internal use only.
What’s wrong with it?: This stores a date/time offset but only +/- 68 years. Again, this doesn’t error with values higher than this limit. Does it put ‘invalid’ in the column like abstime? No. Instead the value wraps around, so entering +70 years would result in a value of around -66 years. Not what you want. It also has a resolution down to the second.
What to use instead: The SQL standard equivalent of this kind of data type is interval, which PostgreSQL has. This does take up more space (12 bytes), but it’s range is absolutely huge: ‘-178000000 years’ to ‘+178000000 years’. This also has microsecond resolution. Interval can also handle relative time units; for example, adding a month to 15th February using interval will give you 15th March, but with abstime it has a fixed notion of a month being 30 days, so will give you 17th March (on a non-leap year). A year is also considered to be 360 days.
timespan data type
Does it still work?: No. This was deprecated back in PostgreSQL 7.0, and totally removed in PostgreSQL 7.3. If for any reason you’re using this, you’re *definitely* overdue an upgrade, and have been for many years.
What’s wrong with it?: It’s not in the SQL standard and was really just an alias for interval.
What to use instead: Just use interval.
psql/pg_dump’s -u option
Does it still work?: No, this was deprecated as far back as PostgreSQL 7.0 and removed in 8.3. You should definitely not be using this.
What’s wrong with it?: This option forced psql and pg_dump to prompt for the username and password before connecting to the database. Since prompting for a username is always optional, but prompting for a password may or may not be required (depending on authentication method), it didn’t make sense to glue both of these together.
What to use instead: It has been replaced by the -U option to specify the username, and the -W option to prompt for the password.
Does it still work?: Yes, but it’s effectively deprecated as of PostgreSQL 7.1!
What’s wrong with it?: It’s no longer documented, and could be removed in a future release since it’s obsolete.
What to use instead: Call current_user instead, since getpgusername() is now just an alias for that.
autovacuum contrib module
Does it still work?: No, as it was moved into core since PostgreSQL 8.1.
What’s wrong with it?: Nothing. Quite the opposite. It was considered so essential that it became part of the main codebase.
What to use instead: Nothing to worry about. Since it’s now in core, you get it out of the box without having to explicitly include it.
mSQL-interface and tips contrib modules
Does it still work?: No, these were completely removed in PostgreSQL 8.2.
What’s wrong with it?: These were considered abandoned and unmaintained.
What to use instead: Nothing.
adddepend, dbase, dbmirror, fulltextindex, mac, ora2pg and userlock
Does it still work?: No, again, these were completely removed in PostgreSQL 8.2.
What’s wrong with it?: Most of these were moved to pgFoundry to be maintained separately.
What to use instead: These still exist on pgFoundry if you really want them (except for fulltextindex which has disappeared, and ora2pg which is on its own website), although they’re all now unmaintained (apart from ora2pg).
automatic casting to text
Does it still work?: As of PostgreSQL 8.3, non-text data types are no longer implicitly cast to text. This is considered to be one of the major hurdles for some people migrating from earlier versions and the biggest cause of incompatibility.
What’s wrong with it?: Anyone who knows PostgreSQL well will know that it doesn’t like to throw any weirdness or odd behaviour your way. There are cases where implicit casting to text causes undesired results. For example: current_date < 2012-04-02 would result in both sides being automatically cast to text types, even though the date on the right-hand side would first be considered an integer (2012 minus 4 minus 2).
What to use instead: It’s always good practise to be explicit about data types when specifying literals. This will avoid any usual behaviour.
tsearch2 contrib module
Does it still work?: Yes, but it is deprecated as of PostgreSQL 8.3.
What’s wrong with it?: It has been superseded by changes in core with a few functional changes. It’s still kept around for backwards-compatibility.
What to use instead: Use the newer core functionality. There’s information on the tsearch2 contrib module page in the documentation on how to convert to the new functionality.
xml2 contrib module
Does it still work?: Yes, but it is deprecated as of PostgreSQL 8.3.
What’s wrong with it?: Nothing really, and it’s still around for backwards-compatibility, but there is newer XML functionality in core based on the SQL/XML standard.
What to use instead: Use the built-in XML features (xml data type, xml functions, xml parameters).
pg_dump/pg_dumpall’s -d and -D options
Does it still work?: No, these were removed in PostgreSQL 8.4.
What’s wrong with it?: Such options were often mistaken for a database name parameter, but in fact it caused database dumps to output using insert statements rather than copy statements. This is significantly slower to restore, and cannot be adjusted after the fact.
What to use instead: If someone really did want to use these options intentionally, then the long name options of --inserts and --column-inserts are to be used instead.
createlang/droplang client applications
Does it still work?: Only up until PostgreSQL 9.1.
What’s wrong with it?: Languages are now treated like extensions as of PostgreSQL 9.1.
What to use instead: Execute CREATE EXTENSION <language name> instead.
Does it still work?: Yes, but it’s no longer intended to be used by users, only extensions.
What’s wrong with it?: Languages are now considered to be extensions as of PostgreSQL 9.1.
What to use instead: You can install new languages by installing it as an extension with CREATE EXTENSION <language name>. And to remove it, use the DROP equivalent. If you’ve upgraded your cluster to 9.1 or above from a previous version, you will still have the language installed but not as an extension. You can, however, convert it to an extension by using: CREATE EXTENSION <language name> FROM unpackaged. You can then remove it later with DROP EXTENSION.
Does it still work?: Only up to PostgreSQL 9.1, but as of PostgreSQL 9.2, no, at least as far as hstore is concerned. This was actually deprecated in 9.0 and has emitted warnings about using it since then. This is most notably used in the hstore extension. You can still create this operator, but it will return a warning when you do so. Be prepared for this to be completely disallowed in a future release.
What’s wrong with it?: “=>” is reserved in the SQL standard for named function parameters, so it needs to be available for such functionality.
What to use instead: If you’ve been using this in hstore, then it will require changing text=>text to hstore(text,text). If you’ve been using it as a custom operator, you should change it to something else as at some point it will be prohibited.
Literal language name case-sensitivity
Does it still work?: If you're on 9.1 or below, yes, but as of 9.2 you won't get away with this anymore.
What’s wrong with it?: Language names should be treated like an identifier rather than a literal, and in general, literals are case-sensitive. There was special code that case-folded the language name so that the letter casing didn't matter, but this change is the first step in removing string literals as language names altogether.
What to use instead: Just don't use single quotes around language names when writing your functions at all, rather than just lower-casing them. Either use no quotes (as they're not needed for any core language) or use double-quotes as you would with any other identifier.
There are also lots of configuration parameters that have been removed, and here they are:
|australian_timezones||8.2||Better generalised timezone configuration|
|preload_libraries||8.2||Renamed to shared_preload_libraries|
|bgwriter_all_percent||8.3||No longer necessary|
|bgwriter_all_maxpages||8.3||No longer necessary|
|bgwriter_lru_percent||8.3||No longer necessary|
|redirect_stderr||8.3||Renamed to logging_collector|
|stats_block_level||8.3||Now covered by track_counts|
|stats_command_string||8.3||Renamed to track_activities|
|stats_reset_on_server_start||8.3||pg_stat_reset() can be used instead|
|stats_row_level||8.3||Now covered by track_counts|
|stats_start_collector||8.3||Now always enabled|
|explain_pretty_print||8.4||No longer needed|
|max_fsm_pages||8.4||No longer needed as per-relation free space maps deal with this.|
|max_fsm_relations||8.4||No longer needed as per-relation free space maps deal with this.|
|add_missing_from||9.0||Always defaulted to ‘off’ so now permanently off.|
|regex_flavor||9.0||Always defaulted to ‘advanced’ so now permanently set to this.|
|custom_variable_classes||9.2||Considered better to remove it as it only causes more maintenance with minimal benefit.|
|silent_mode||9.2||Not necessary as can be achieved with pg_ctl -l or NOHUP.|
|wal_sender_delay||9.2||New latch infrastructure has now made this setting redundant.|