From 9a47ad1d2006a9bdc6ad7cdb70bba054c4de1366 Mon Sep 17 00:00:00 2001 From: Bruce Momjian Date: Thu, 5 Jan 2006 17:28:45 +0000 Subject: [PATCH] Add logging control TODO.detail. --- doc/TODO.detail/walcontrol | 3321 ++++++++++++++++++++++++++++++++++++ 1 file changed, 3321 insertions(+) create mode 100644 doc/TODO.detail/walcontrol diff --git a/doc/TODO.detail/walcontrol b/doc/TODO.detail/walcontrol new file mode 100644 index 0000000000..6e088d68b6 --- /dev/null +++ b/doc/TODO.detail/walcontrol @@ -0,0 +1,3321 @@ +From pgsql-hackers-owner+M77861=pgman=candle.pha.pa.us@postgresql.org Fri Dec 23 05:19:20 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Stephen Frost +cc: Martijn van Oosterhout , + Jim C. Nasby , + bizgres-general , + pgsql-hackers@postgresql.org +In-Reply-To: <20051222223625.GC6026@ns.snowman.net> +References: <1135261893.2964.502.camel@localhost.localdomain> + <20051222183751.GG72143@pervasive.com> <20051222201826.GH21783@svana.org> + <1135289583.2964.536.camel@localhost.localdomain> + <20051222223625.GC6026@ns.snowman.net> +Date: Fri, 23 Dec 2005 10:18:43 +0000 +Message-ID: <1135333123.2964.589.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.1 required=5 tests=[AWL=0.100] +X-Spam-Score: 0.1 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 4728 + +On Thu, 2005-12-22 at 17:36 -0500, Stephen Frost wrote: +> * Simon Riggs (simon@2ndquadrant.com) wrote: +> > On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote: +> > > Considering "WAL bypass" is code for "breaks PITR" +> > +> > No it isn't. 
All of the WAL bypass logic does *not* operate when PITR is +> > active. The WAL bypass logic is aimed at Data Warehouses, which +> > typically never operate in PITR mode for performance reasons, however +> > the choice is yours. + +OK, thanks for saying all of that; you probably speak for many in +raising these concerns. I'll answer each bit as we come to it. Suffice +to say, your concerns are good and so are the answers: + +> Eh? PITR mode is bad for performance? Maybe I missed something but I +> wouldn't have thought PITR would degrade regular performance all that +> badly. + +PITR mode is *not* bad for performance. On a very heavily loaded +write-intensive test system, the general PITR overhead on regular +performance was around 1% - so almost negligible. + +We have been discussing a number of optimizations to specific commands +that would allow them to avoid writing WAL and thus speed up their +performance. If archive_command is set then WAL will always be written; +if it is not set then these commands will (or could) go faster: + +- CREATE TABLE AS SELECT (in 8.1) +- COPY LOCK (patch submitted) +- COPY in same transaction as CREATE TABLE (patch submitted) +- INSERT SELECT in same transaction as CREATE TABLE (this discussion) + +(There are a number of other conditions also, such as there must be no +indexes on a table. All of which are now documented with the patch) + +> So long as it doesn't take 15 minutes or some such to move the +> WAL to somewhere else (and I'm not sure that'd even slow things down..). +> For a Data Warehouse, have you got a better way of doing backups such +> that you don't lose at minimum most of a day's work? + +Yes. Don't just use the backup facilities on their own. Think about how +the architecture of your systems will work and see if there is a better +way when you look at very large systems. + +> I'm not exactly a +> big fan of doing a pg_dump every night either given that the database is +> 360GB. 
Much nicer to take a weekly dump of the database and then do +> PITR for a week or two before taking another dump of the db. + +e.g. Keep your reference data (low volume) in an Operational Data Store +(ODS) database, protected by archiving. Keep your main fact data (high +volume) in the Data Warehouse, but save the data in slices as you load +it, so that a recovery is simply a reload of the database: no PITR or +pg_dump required, so high performance data transformation and load work +is possible. This is a commonly used architectural design pattern. + +> I like the idea of making COPY go faster, but please don't break my +> backup system while you're at it. + +On a personal note, I would only add that I spent a long time working on +PITR and I would never design anything that would intentionally break it +(nor would patches be accepted that did that). That probably gives me +the confidence to approach designs that might look like I'm doing that, +but without actually straying over the edge. + +> I'm honestly kind of nervous about +> what you mean by checking if PITR is active - how is that done, exactly? +> Check if you have a script set to rotate the logs elsewhere? Or is it +> checking if you're in the taking-a-full-database-backup stage? Or what? + +Internally, we use XLogArchivingActive(). Externally this will be set +when the admin sets archive_command to a particular value. + +My original preference was for a parameter called archive_mode= ON | OFF +which would allow us to more easily discuss this, but this does not +currently exist. + +> What's the performance decrease when using PITR, and what's it from? Is +> it just that COPY isn't as fast? Honestly, I could live with COPY being +> not as fast as it could be if my backups work. 
:) + +These commands will not be optimized for speed when archive_command is set: +- CREATE TABLE AS SELECT (in 8.1) +- COPY LOCK (patch submitted) + +> Sorry for sounding concerned but, well, backups are very important and +> so is performance and I'm afraid either I've not read all the +> documentation about the issues being discussed here or there isn't +> enough out there to make sense of it all yet. :) + +If you choose PITR, then you are safe. If you do not, the crash recovery +of the database is not endangered by these optimizations. + +Hope that covers all of your concerns? + +I'm just writing a course that explains many of these techniques, +available in the New Year. + +Best Regards, Simon Riggs + + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster + +From pgsql-hackers-owner+M78004=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 20:59:03 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512290158.jBT1wEK28785@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <20051226122206.GA12934@svana.org> +To: Martijn van Oosterhout +Date: Wed, 28 Dec 2005 20:58:14 -0500 (EST) +cc: Simon Riggs , Tom Lane , + Greg Stark , Rod Taylor , + Qingqing Zhou , pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122] +X-Spam-Score: 0.122 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 3461 + + +Having read through this thread, I would like to propose a +syntax/behavior. 
+ +I think we all now agree that the logging is more part of the table than +the command itself. Right now we have a COPY LOCK patch, but people are +going to want to control logging for INSERT INTO ... SELECT, and UPDATE, +and all sorts of other things, so I think we are best adding an ALTER +TABLE capability. I am thinking of this syntax: + + ALTER TABLE name RELIABILITY option + +where "option" is: + + DROP [ TABLE ON CRASH ] + DELETE [ ROWS ON CRASH ] + EXCLUSIVE + SHARE + +Let me explain each option. DROP would drop the table on a restart +after a non-clean shutdown. It would do _no_ logging on the table and +allow concurrent access, plus index access. DELETE is the same as DROP, +but it just truncates the table (perhaps TRUNCATE is a better word). + +EXCLUSIVE would allow only a single session to modify the table, and +would do all changes by appending to the table, similar to COPY LOCK. +EXCLUSIVE would also not allow indexes because those can not be isolated +like appending to the heap. EXCLUSIVE would write all dirty shared +buffers for the table and fsync them before committing. SHARE is the +functionality we have now, with full logging. + +Does this get us any closer to a TODO item? It isn't great, but I think +it is pretty clear, and I assume pg_dump would use ALTER to load each +table. The advantage is that the COPY statements themselves are +unchanged so they would work in loading into older versions of +PostgreSQL. + +--------------------------------------------------------------------------- + +Martijn van Oosterhout wrote: +-- Start of PGP signed section. +> On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote: +> > I would not be against such a table-level switch, but the exact +> > behaviour would need to be specified more closely before this became a +> > TODO item, IMHO. +> +> Well, I think at a per table level is the only sensible level. If a +> table isn't logged, neither are the indexes. 
After an unclean shutdown +> the data could be anywhere between OK and rubbish, with no way of +> finding out which way. +> +> > If someone has a 100 GB table, they would not appreciate the table being +> > truncated if a transaction to load 1 GB of data aborts, forcing recovery +> > of the 100 GB table. +> +> Ah, but wouldn't such a large table be partitioned in such a way that +> you could have the most recent partition having the loaded data. +> Personally, I think these "shared temp tables" have more applications +> than meet the eye. I've had systems with cache tables which could be +> wiped on boot. Though I think my preference would be to TRUNCATE rather +> than DROP on unclean shutdown. +> +> Have a nice day, +> -- +> Martijn van Oosterhout http://svana.org/kleptog/ +> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a +> > tool for doing 5% of the work and then sitting around waiting for someone +> > else to do the other 95% so you can sue them. +-- End of PGP section, PGP failed! + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 3: Have you checked our extensive FAQ? + + http://www.postgresql.org/docs/faq + +From pgsql-hackers-owner+M78007=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 22:06:13 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Message-ID: <43B3527A.4040709@commandprompt.com> +Date: Wed, 28 Dec 2005 19:05:30 -0800 +From: Joshua D. Drake +Organization: Command Prompt, Inc. 
+User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) +X-Accept-Language: en-us, en +To: Bruce Momjian +cc: Martijn van Oosterhout , + Simon Riggs , Tom Lane , + Greg Stark , Rod Taylor , + Qingqing Zhou , pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> +In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> +X-Greylist: Sender succeded SMTP AUTH authentication, not delayed by milter-greylist-1.6 (hosting.commandprompt.com [192.168.1.101]); Wed, 28 Dec 2005 18:57:25 -0800 (PST) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.05 required=5 tests=[AWL=0.050, UPPERCASE_25_50=0] +X-Spam-Score: 0.05 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 725 + + now agree that the logging is more part of the table than +> the command itself. Right now we have a COPY LOCK patch, but people are +> going to want to control logging for INSERT INTO ... SELECT, and UPDATE, +> and all sorts of other things, so I think we are best adding an ALTER +> TABLE capability. I am thinking of this syntax: +> +> ALTER TABLE name RELIABILITY option +> +> where "option" is: +> +> DROP [ TABLE ON CRASH ] +> DELETE [ ROWS ON CRASH ] +> EXCLUSIVE +> SHARE + +I would say ON FAILURE (Crash just seems way too scary :)) + +Joshua D. Drake + + +---------------------------(end of broadcast)--------------------------- +TIP 3: Have you checked our extensive FAQ? 
+ + http://www.postgresql.org/docs/faq + +From pgsql-hackers-owner+M78008=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 23:09:58 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512290409.jBT49LD13611@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <43B3527A.4040709@commandprompt.com> +To: Joshua D. Drake +Date: Wed, 28 Dec 2005 23:09:21 -0500 (EST) +cc: Martijn van Oosterhout , + Simon Riggs , Tom Lane , + Greg Stark , Rod Taylor , + Qingqing Zhou , pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122, UPPERCASE_25_50=0] +X-Spam-Score: 0.122 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1111 + +Joshua D. Drake wrote: +> now agree that the logging is more part of the table than +> > the command itself. Right now we have a COPY LOCK patch, but people are +> > going to want to control logging for INSERT INTO ... SELECT, and UPDATE, +> > and all sorts of other things, so I think we are best adding an ALTER +> > TABLE capability. I am thinking of this syntax: +> > +> > ALTER TABLE name RELIABILITY option +> > +> > where "option" is: +> > +> > DROP [ TABLE ON CRASH ] +> > DELETE [ ROWS ON CRASH ] +> > EXCLUSIVE +> > SHARE +> +> I would say ON FAILURE (Crash just seems way to scary :)) + +Agreed, maybe ON RECOVERY. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. 
| Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 9: In versions below 8.0, the planner will ignore your desire to + choose an index scan if your joining column's datatypes do not + match + +From simon@2ndquadrant.com Thu Dec 29 08:19:47 2005 +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Bruce Momjian +cc: Martijn van Oosterhout , Tom Lane , + Greg Stark , Rod Taylor , + Qingqing Zhou , pgsql-hackers@postgresql.org +In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> +Date: Thu, 29 Dec 2005 13:19:45 +0000 +Message-ID: <1135862385.2964.804.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +Content-Length: 7026 + +On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote: +> Having read through this thread, I would like to propose a +> syntax/behavior. +> +> I think we all now agree that the logging is more part of the table than +> the command itself. Right now we have a COPY LOCK patch, but people are +> going to want to control logging for INSERT INTO ... SELECT, and UPDATE, +> and all sorts of other things, so I think we are best adding an ALTER +> TABLE capability. I am thinking of this syntax: +> +> ALTER TABLE name RELIABILITY option +> +> where "option" is: +> +> DROP [ TABLE ON CRASH ] +> DELETE [ ROWS ON CRASH ] +> EXCLUSIVE +> SHARE +> +> Let me explain each option. DROP would drop the table on a restart +> after a non-clean shutdown. It would do _no_ logging on the table and +> allow concurrent access, plus index access. DELETE is the same as DROP, +> but it just truncates the table (perhaps TRUNCATE is a better word). +> +> EXCLUSIVE would allow only a single session to modify the table, and +> would do all changes by appending to the table, similar to COPY LOCK. 
+> EXCLUSIVE would also not allow indexes because those can not be isolated +> like appending to the heap. EXCLUSIVE would write all dirty shared +> buffers for the table and fsync them before committing. SHARE is the +> functionality we have now, with full logging. +> +> Does this get us any closer to a TODO item? It isn't great, but I think +> it is pretty clear, and I assume pg_dump would use ALTER to load each +> table. The advanage is that the COPY statements themselves are +> unchanged so they would work in loading into older versions of +> PostgreSQL. + +First off, thanks for summarising a complex thread. + +My view would be that this thread has been complex because everybody has +expressed a somewhat different requirement, which could be broken down +as: +1. The need for a multi-user-accessible yet temporary table +2. Loading data into a table immediately after it is created (i.e. in +same transaction), including but not limited to a reload from pg_dump +3. How to load data quickly into an existing table (COPY) +4. How to add/modify data quickly in an existing table (INSERT SELECT, +UPDATE) + +I can see the need for all of those individually; my existing patch +submission covers (2) and (3) only. I very much like your thought to +coalesce these various requirements into a single coherent model. + +For requirement (1), table level options make sense. We would: +- CREATE TABLE ALLTHINGS +- ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY +- lots of SQL, all fast because not logged + +(2) is catered for adequately by the existing COPY patch i.e. it will +detect whether a table has just been created and then avoid writing WAL. +In the patch, pg_dump has *not* been altered to use COPY LOCK, so a +pg_dump *will* work with any other version of PostgreSQL, which *would +not* be the case if we added ALTER TABLE ... RELIABILITY statements into +it. Also, a pg_dump created at an earlier version could also be loaded +faster using the patch. 
The only requirement is to issue all SQL as part +of the same transaction - which is catered for by the +--single-transaction option on pg_restore and psql. So (2) is catered +for fully without the need for an ALTER TABLE ... RELIABILITY statement +or COPY LOCK. + +For requirement (3), I would use table level options like this: +(the table already exists and is reasonably big; we should not assume +that everybody can and does use partitioning) +- ALTER TABLE ALLTHINGS2 RELIABILITY EXCLUSIVE +- COPY +- ALTER TABLE ALLTHINGS2 RELIABILITY SHARE + +For a load into an existing table I would always do all three actions +together. COPY LOCK does exactly that *and* does it atomically. + +The two ways of doing (3) have a few pros/cons either way: +Pro for ALTER TABLE: +- same syntax as req (1) +- doesn't need the keyword LOCK +- allows INSERT SELECT, UPDATE operations also (req 4) +Cons: +- existing programs have to add additional statements to take advantage +of this; with COPY LOCK we would add just a single keyword +- operation is not atomic, which might lead to some operations waiting +for a lock to operate as unlogged, since they would execute before the +second ALTER TABLE gets there +- operation will be understood by some, but not others. They will forget +to switch the RELIABILITY back on and then lose their whole table when +the database crashes. (watch...) + +...but would it be a problem to have both? 
+ + +So, my thinking would be to separate things into two: +a) Add a TODO item "shared temp tables" that caters for (1) and (4) + + ALTER TABLE name RELIABILITY + {DELETE ROWS AT RECOVERY | FULL RECOVERY} +(syntax TBD) + +which would +- truncate all rows and remove all index entries during recovery +- use shared_buffers, not temp_buffers +- never write xlog records, even when in PITR mode +- avoid writing WAL for both heap *and* index tuples + +b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as +*separate* optimizations (but using a common infrastructure in code). +[This work was based upon discussions on -hackers only 6 months ago, so +it's not like it's been snuck in or anything +http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php +http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ] + +These two thoughts are separable. There is no need to +have-both-or-neither within PostgreSQL. + +Eventually, I'd like all of these options, as a database designer. + +Best Regards, Simon Riggs + +> -------------------------------------------------------------------------- +> +> Martijn van Oosterhout wrote: +> -- Start of PGP signed section. +> > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote: +> > > I would not be against such a table-level switch, but the exact +> > > behaviour would need to be specified more closely before this became a +> > > TODO item, IMHO. +> > +> > Well, I think at a per table level is the only sensible level. If a +> > table isn't logged, neither are the indexes. After an unclean shutdown +> > the data could be anywhere between OK and rubbish, with no way of +> > finding out which way. +> > +> > > If someone has a 100 GB table, they would not appreciate the table being +> > > truncated if a transaction to load 1 GB of data aborts, forcing recovery +> > > of the 100 GB table. 
+> > +> > Ah, but wouldn't such a large table be partitioned in such a way that +> > you could have the most recent partition having the loaded data. +> > Personally, I think these "shared temp tables" have more applications +> > than meet the eye. I've had systems with cache tables which could be +> > wiped on boot. Though I think my preference would be to TRUNCATE rather +> > than DROP on unclean shutdown. +> > +> > Have a nice day, +> > -- +> > Martijn van Oosterhout http://svana.org/kleptog/ +> > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a +> > > tool for doing 5% of the work and then sitting around waiting for someone +> > > else to do the other 95% so you can sue them. +> -- End of PGP section, PGP failed! +> + 
+From pgsql-hackers-owner+M78021=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 09:35:58 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Rod Taylor +To: Simon Riggs +cc: Bruce Momjian , + Martijn van Oosterhout , Tom Lane , + Greg Stark , Qingqing Zhou , + pgsql-hackers@postgresql.org +In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> + <1135862385.2964.804.camel@localhost.localdomain> +Date: Thu, 29 Dec 2005 09:35:27 -0500 +Message-ID: <1135866927.61038.13.camel@home> +X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port +X-SA-Exim-Mail-From: pg@rbt.ca +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +X-SA-Exim-Version: 3.1 (built Tue Feb 24 05:09:27 GMT 2004) +X-SA-Exim-Scanned: Yes +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.024 required=5 tests=[AWL=0.024, UPPERCASE_25_50=0] +X-Spam-Score: 0.024 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 506 + + +> So, my thinking would be to separate things into two: +> a) Add a TODO item "shared temp tables" that caters for (1) and (4) +> +> ALTER TABLE name RELIABILITY +> {DELETE ROWS AT RECOVERY | FULL RECOVERY} +> (syntax TBD) + +DELETE ROWS AT RECOVERY would need to be careful or disallowed when +referenced via a foreign key to ensure the database is not restored in +an inconsistent state. 
+ +-- + + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster
+ +-- + +From pgsql-hackers-owner+M78022=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 10:10:57 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Rod Taylor +cc: Bruce Momjian , + Martijn van Oosterhout , Tom Lane , + Greg Stark , Qingqing Zhou , + pgsql-hackers@postgresql.org +In-Reply-To: <1135866927.61038.13.camel@home> +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> + <1135862385.2964.804.camel@localhost.localdomain> + <1135866927.61038.13.camel@home> +Date: Thu, 29 Dec 2005 15:10:40 +0000 +Message-ID: <1135869040.2964.824.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113] +X-Spam-Score: 0.113 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 888 + +On Thu, 2005-12-29 at 09:35 -0500, Rod Taylor wrote: +> > So, my thinking would be to separate things into two: +> > a) Add a TODO item "shared temp tables" that caters for (1) and (4) +> > +> > ALTER TABLE name RELIABILITY +> > {DELETE ROWS AT RECOVERY | FULL RECOVERY} +> > (syntax TBD) +> +> DELETE ROWS AT RECOVERY would need to be careful or disallowed when +> referenced via a foreign key to ensure the database is not restored in +> an inconsistent state. + +I think we'd need to apply the same rule as we do for temp tables: they +cannot be referenced by a permanent table. + +There are possibly some other restrictions also. Anyone? 
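The rule proposed here mirrors the temp-table restriction: a table whose rows may be deleted at recovery should not be the target of a foreign key from a fully recovered table. A minimal sketch of that check follows, in Python for brevity. The `Reliability`, `Table` and `may_reference` names are hypothetical illustrations rather than PostgreSQL internals, the two reliability settings follow the syntax floated in this thread (marked TBD there), and allowing one DELETE ROWS table to reference another is an assumption carried over from how temp tables behave.

```python
from dataclasses import dataclass
from enum import Enum


class Reliability(Enum):
    # Names follow the proposed (not final) ALTER TABLE ... RELIABILITY syntax.
    FULL_RECOVERY = "full recovery"
    DELETE_ROWS_AT_RECOVERY = "delete rows at recovery"


@dataclass(frozen=True)
class Table:
    name: str
    reliability: Reliability = Reliability.FULL_RECOVERY


def may_reference(referencing: Table, referenced: Table) -> bool:
    """Return True if a foreign key from `referencing` to `referenced` is safe.

    A fully recovered table must not point at a table that can be emptied
    at recovery, otherwise its rows could reference keys that no longer exist.
    """
    if referenced.reliability is Reliability.DELETE_ROWS_AT_RECOVERY:
        # Assumption: only a table that is itself emptied at recovery may
        # reference another such table, as with temp tables.
        return referencing.reliability is Reliability.DELETE_ROWS_AT_RECOVERY
    return True
```

Under this sketch a cache table may reference a permanent table, but not the other way around, which is the direction that would leave dangling references after a crash.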
+ +Best Regards, Simon Riggs + + +---------------------------(end of broadcast)--------------------------- +TIP 9: In versions below 8.0, the planner will ignore your desire to + choose an index scan if your joining column's datatypes do not + match + +From tgl@sss.pgh.pa.us Thu Dec 29 11:12:13 2005 +To: Simon Riggs +cc: Bruce Momjian , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> +Comments: In-reply-to Simon Riggs + message dated "Thu, 29 Dec 2005 13:19:45 +0000" +Date: Thu, 29 Dec 2005 11:12:11 -0500 +Message-ID: <7273.1135872731@sss.pgh.pa.us> +From: Tom Lane +Content-Length: 1963 + +Simon Riggs writes: +> My view would be that this thread has been complex because everybody has +> expressed a somewhat different requirement, which could be broken down +> as: +> 1. The need for a multi-user-accessible yet temporary table +> 2. Loading data into a table immediately after it is created (i.e. in +> same transaction), including but not limited to a reload from pg_dump +> 3. How to load data quickly into an existing table (COPY) +> 4. How to add/modify data quickly in an existing table (INSERT SELECT, +> UPDATE) + +> I can see the need for all of those individually; my existing patch +> submission covers (2) and (3) only. I very much like your thought to +> coalesce these various requirements into a single coherent model. + +However, you then seem to be arguing for still using the COPY LOCK +syntax, which I think Bruce intended would go away in favor of using +these ALTER commands. Certainly that's what I'd prefer --- COPY has +got too darn many options already. 
+ +> In the patch, pg_dump has *not* been altered to use COPY LOCK, so a +> pg_dump *will* work with any other version of PostgreSQL, which *would +> not* be the case if we added ALTER TABLE ... RELIABILITY statements into +> it. + +Wrong --- the good thing about ALTER TABLE is that an old version of +Postgres would simply reject it and keep going. Therefore we could get +the speedup in dumps without losing compatibility, which is not true +of COPY LOCK. + +BTW, this is a perfect example of the use-case for not abandoning a +dump-file load simply because one command fails. (We have relied on +this sort of reasoning many times before, too, for example by using +"SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) +I don't think that "wrap the whole load into begin/end" is really a very +workable answer, because there are far too many scenarios where you +can't do that. Another one where it doesn't help is a data-only dump. + + regards, tom lane + +From pgsql-hackers-owner+M78028=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 11:12:41 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +To: Simon Riggs +cc: Bruce Momjian , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> <1135862385.2964.804.camel@localhost.localdomain> +Comments: In-reply-to Simon Riggs + message dated "Thu, 29 Dec 2005 13:19:45 +0000" +Date: Thu, 29 Dec 2005 11:12:11 -0500 +Message-ID: <7273.1135872731@sss.pgh.pa.us> +From: Tom Lane +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.053 required=5 tests=[AWL=0.053] +X-Spam-Score: 0.053 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: 
+List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 2075 + +Simon Riggs writes: +> My view would be that this thread has been complex because everybody has +> expressed a somewhat different requirement, which could be broken down +> as: +> 1. The need for a multi-user-accessible yet temporary table +> 2. Loading data into a table immediately after it is created (i.e. in +> same transaction), including but not limited to a reload from pg_dump +> 3. How to load data quickly into an existing table (COPY) +> 4. How to add/modify data quickly in an existing table (INSERT SELECT, +> UPDATE) + +> I can see the need for all of those individually; my existing patch +> submission covers (2) and (3) only. I very much like your thought to +> coalesce these various requirements into a single coherent model. + +However, you then seem to be arguing for still using the COPY LOCK +syntax, which I think Bruce intended would go away in favor of using +these ALTER commands. Certainly that's what I'd prefer --- COPY has +got too darn many options already. + +> In the patch, pg_dump has *not* been altered to use COPY LOCK, so a +> pg_dump *will* work with any other version of PostgreSQL, which *would +> not* be the case if we added ALTER TABLE ... RELIABILITY statements into +> it. + +Wrong --- the good thing about ALTER TABLE is that an old version of +Postgres would simply reject it and keep going. Therefore we could get +the speedup in dumps without losing compatibility, which is not true +of COPY LOCK. + +BTW, this is a perfect example of the use-case for not abandoning a +dump-file load simply because one command fails. (We have relied on +this sort of reasoning many times before, too, for example by using +"SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) 
+I don't think that "wrap the whole load into begin/end" is really a very +workable answer, because there are far too many scenarios where you +can't do that. Another one where it doesn't help is a data-only dump. + + regards, tom lane + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster + +From pgsql-hackers-owner+M78025=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 10:57:46 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Message-ID: <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net> +Date: Thu, 29 Dec 2005 10:14:37 -0600 (CST) +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Andrew Dunstan +To: +In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us> +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> +X-Priority: 3 +Importance: Normal +X-MSMail-Priority: Normal +cc: , , , + , , , + +X-Mailer: SquirrelMail (version 1.2.5) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.082 required=5 tests=[AWL=0.082] +X-Spam-Score: 0.082 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1185 + +Bruce Momjian said: +> DROP would drop the table on a restart +> after a non-clean shutdown. It would do _no_ logging on the table and +> allow concurrent access, plus index access. DELETE is the same as +> DROP, but it just truncates the table (perhaps TRUNCATE is a better +> word). +> +> EXCLUSIVE would allow only a single session to modify the table, and +> would do all changes by appending to the table, similar to COPY LOCK. +> EXCLUSIVE would also not allow indexes because those can not be +> isolated like appending to the heap. 
EXCLUSIVE would write all dirty +> shared buffers for the table and fsync them before committing. SHARE +> is the functionality we have now, with full logging. + + +I am horribly scared that this will be used as a "performance boost" for +normal use. I would at least like to see some restrictions that make it +harder to mis-use. Perhaps restrict to superuser? + +cheers + +andrew + + + + + +---------------------------(end of broadcast)--------------------------- +TIP 1: if posting/reading through Usenet, please send an appropriate + subscribe-nomail command to majordomo@postgresql.org so that your + message can get through to the mailing list cleanly + +From tgl@sss.pgh.pa.us Thu Dec 29 11:24:30 2005 +To: Bruce Momjian +cc: Andrew Dunstan , kleptog@svana.org, + simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <200512291605.jBTG5gi00396@candle.pha.pa.us> +References: <200512291605.jBTG5gi00396@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Thu, 29 Dec 2005 11:05:42 -0500" +Date: Thu, 29 Dec 2005 11:24:28 -0500 +Message-ID: <7966.1135873468@sss.pgh.pa.us> +From: Tom Lane +Content-Length: 612 + +Bruce Momjian writes: +> Andrew Dunstan wrote: +>> I am horribly scared that this will be used as a "performance boost" for +>> normal use. I would at least like to see some restrictions that make it +>> harder to mis-use. Perhaps restrict to superuser? + +> Certainly restrict to table owner. + +I can see the argument for superuser-only: decisions about data +integrity tradeoffs should be reserved to the DBA, who is the one who +will get blamed if the database loses data, no matter how stupid his +users are. + +But I'm not wedded to that. I could live with table-owner.
+ + regards, tom lane + +From pgsql-hackers-owner+M78031=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 11:38:17 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512291637.jBTGbdC03848@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <7273.1135872731@sss.pgh.pa.us> +To: Tom Lane +Date: Thu, 29 Dec 2005 11:37:39 -0500 (EST) +cc: Simon Riggs , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122] +X-Spam-Score: 0.122 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 3932 + +Tom Lane wrote: +> Simon Riggs writes: +> > My view would be that this thread has been complex because everybody has +> > expressed a somewhat different requirement, which could be broken down +> > as: +> > 1. The need for a multi-user-accessible yet temporary table +> > 2. Loading data into a table immediately after it is created (i.e. in +> > same transaction), including but not limited to a reload from pg_dump +> > 3. How to load data quickly into an existing table (COPY) +> > 4. How to add/modify data quickly in an existing table (INSERT SELECT, +> > UPDATE) +> +> > I can see the need for all of those individually; my existing patch +> > submission covers (2) and (3) only. I very much like your thought to +> > coalesce these various requirements into a single coherent model. +> +> However, you then seem to be arguing for still using the COPY LOCK +> syntax, which I think Bruce intended would go away in favor of using +> these ALTER commands. 
Certainly that's what I'd prefer --- COPY has +> got too darn many options already. +> +> > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a +> > pg_dump *will* work with any other version of PostgreSQL, which *would +> > not* be the case if we added ALTER TABLE ... RELIABILITY statements into +> > it. +> +> Wrong --- the good thing about ALTER TABLE is that an old version of +> Postgres would simply reject it and keep going. Therefore we could get +> the speedup in dumps without losing compatibility, which is not true +> of COPY LOCK. +> +> BTW, this is a perfect example of the use-case for not abandoning a +> dump-file load simply because one command fails. (We have relied on +> this sort of reasoning many times before, too, for example by using +> "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) +> I don't think that "wrap the whole load into begin/end" is really a very +> workable answer, because there are far too many scenarios where you +> can't do that. Another one where it doesn't help is a data-only dump. + +Yep, Tom is echoing my reaction. There is a temptation to add things up +onto existing commands, e.g. LOCK, and while it works, it makes for some +very complex user API's. Having COPY behave differently because it is +in a transaction is fine as long as it is user-invisible, but once you +require users to do that to get the speedup, it isn't user-invisible +anymore. + +(I can see it now, "Why is pg_dump putting things in transactions?", +"Because it prevents it from being logged." "Oh, should I be doing that +in my code?" "Perhaps, if you want ..." You can see where that +discussion is going. Having them see "ALTER TABLE ... RELIABILITY +TRUNCATE" is very clear, and very clear on how it can be used in user +code.) + +I think there is great utility in giving users one API, namely +RELIABILITY (or some other keyword), and telling them that is where they +control logging.
I realize adding one keyword, LOCK, to an existing +command isn't a big deal, but once you decentralize your API enough +times, you end up with a terribly complex database system. It is this +design rigidity that helps make PostgreSQL so much easier to use than +other database systems. + +I do think there is a valid concern about someone using the table between the +CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow +the RELIABILITY as part of the CREATE TABLE, another is to tell users to +create the table inside a transaction. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 9: In versions below 8.0, the planner will ignore your desire to + choose an index scan if your joining column's datatypes do not + match + +From pgsql-hackers-owner+M78036=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 12:21:12 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +To: Andrew Dunstan +cc: , , , + , , , + , +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +References: <200512290158.jBT1wEK28785@candle.pha.pa.us> + <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net> +In-Reply-To: <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net> +From: Greg Stark +Organization: The Emacs Conspiracy; member since 1992 +Date: 29 Dec 2005 12:20:32 -0500 +Message-ID: <87vex74y73.fsf@stark.xeocode.com> +Lines: 42 +User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4 +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.112 required=5 tests=[AWL=0.112] +X-Spam-Score: 0.112 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe:
+Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1983 + +"Andrew Dunstan" writes: + +> Bruce Momjian said: +> > DROP would drop the table on a restart +> > after a non-clean shutdown. It would do _no_ logging on the table and +> > allow concurrent access, plus index access. DELETE is the same as +> > DROP, but it just truncates the table (perhaps TRUNCATE is a better +> > word). +> > +> > EXCLUSIVE would allow only a single session to modify the table, and +> > would do all changes by appending to the table, similar to COPY LOCK. +> > EXCLUSIVE would also not allow indexes because those can not be +> > isolated like appending to the heap. EXCLUSIVE would write all dirty +> > shared buffers for the table and fsync them before committing. SHARE +> > is the functionality we have now, with full logging. +> +> I am horribly scared that this will be used as a "performance boost" for +> normal use. I would at least like to see some restrictions that make it +> harder to mis-use. Perhaps restrict to superuser? + +Well that's its whole purpose. At least you can hardly argue that you didn't +realize the consequences of "DELETE ROWS ON RECOVERY"... :) + +Some thoughts: + +a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to + use it instead of DELETE ROWS? + +b) It seems like the other feature people were talking about of not logging + for a table created within the same transaction should be handled by + having this flag implicitly set for any such newly created table. + Ie, the test for whether to log would look like: + + if (!table->logged && table->xid != myxid) ... + +c) Every option in ALTER TABLE should be in CREATE TABLE as well. + +d) Yes as someone else mentioned, this should only be allowable on a table + with no foreign keys referencing it.
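The test in point (b) is left elided in the message, so its exact shape is open to interpretation. One possible reading, sketched in Python for brevity: WAL is skipped when the table is explicitly marked as not logged, or when it was created by the current transaction. The `TableState` type and `needs_wal` function are hypothetical names, not PostgreSQL code. The `archiving` parameter reflects the statement earlier in this thread that WAL is always written when archive_command is set; how archiving should interact with an explicitly unlogged table is not settled in the thread, so leaving that case unlogged here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableState:
    logged: bool         # False if the table is explicitly marked as not WAL-logged
    created_by_xid: int  # id of the transaction that created the table


def needs_wal(table: TableState, my_xid: int, archiving: bool) -> bool:
    """Decide whether a change to `table` made by `my_xid` must be WAL-logged."""
    if not table.logged:
        # Explicit opt-out: contents are expected to be discarded at recovery.
        # Assumption: archiving does not override this flag.
        return False
    if table.created_by_xid == my_xid and not archiving:
        # Created in this same transaction: if the transaction is lost, the
        # table itself never becomes visible, so its data need not be replayed.
        return False
    return True
```

For example, `needs_wal(TableState(True, 10), my_xid=10, archiving=False)` is false under this reading, while the same call with `archiving=True` is true.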
+ +-- +greg + + +---------------------------(end of broadcast)--------------------------- +TIP 9: In versions below 8.0, the planner will ignore your desire to + choose an index scan if your joining column's datatypes do not + match + +From pgsql-hackers-owner+M78037=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 12:31:40 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512291730.jBTHUnn09840@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <87vex74y73.fsf@stark.xeocode.com> +To: Greg Stark +Date: Thu, 29 Dec 2005 12:30:49 -0500 (EST) +cc: Andrew Dunstan , kleptog@svana.org, + simon@2ndquadrant.com, tgl@sss.pgh.pa.us, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122] +X-Spam-Score: 0.122 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 3304 + +Greg Stark wrote: +> "Andrew Dunstan" writes: +> +> > Bruce Momjian said: +> > > DROP would drop the table on a restart +> > > after a non-clean shutdown. It would do _no_ logging on the table and +> > > allow concurrent access, plus index access. DELETE is the same as +> > > DROP, but it just truncates the table (perhaps TRUNCATE is a better +> > > word). +> > > +> > > EXCLUSIVE would allow only a single session to modify the table, and +> > > would do all changes by appending to the table, similar to COPY LOCK. +> > > EXCLUSIVE would also not allow indexes because those can not be +> > > isolated like appending to the heap. EXCLUSIVE would write all dirty +> > > shared buffers for the table and fsync them before committing. 
SHARE +> > > is the functionality we have now, with full logging. +> > +> > I am horribly scared that this will be used as a "performance boost" for +> > normal use. I would at least like to see some restrictions that make it +> > harder to mis-use. Perhaps restrict to superuser? +> +> Well that's its whole purpose. At least you can hardly argue that you didn't +> realize the consequences of "DELETE ROWS ON RECOVERY"... :) + +True. I think we are worried about non-owners using it, but the owner +had to grant permissions for others to modify it, so we might be OK. + +> Some thoughts: +> +> a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to +> use it instead of DELETE ROWS? + +Good question. The use case is doing COPY into a table that already had +data. EXCLUSIVE allows additions to the table but preserves the +existing data on a crash. + +> b) It seems like the other feature people were talking about of not logging +> for a table created within the same transaction should be handled by +> having this flag implicitly set for any such newly created table. +> Ie, the test for whether to log would look like: +> +> if (!table->logged && table->xid != myxid) ... + +Yes, the question is whether we want to limit users to having this +optimization _only_ when they have created the table in the same +transaction, and the short answer is we don't. + +> c) Every option in ALTER TABLE should be in CREATE TABLE as well. + +I looked into that and see that things like: + + ALTER [ COLUMN ] column SET STATISTICS integer + ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } + +are not supported by CREATE TABLE, and probably shouldn't be because the +value can be changed after the table is created. I think the only +things we usually support in CREATE TABLE are those that cannot be +altered. + +> d) Yes as someone else mentioned, this should only be allowable on a table +> with no foreign keys referencing it.
+ +Right, and EXCLUSIVE can not have an index either. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 1: if posting/reading through Usenet, please send an appropriate + subscribe-nomail command to majordomo@postgresql.org so that your + message can get through to the mailing list cleanly + +From simon@2ndquadrant.com Fri Dec 30 08:10:53 2005 +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Bruce Momjian +cc: Andrew Dunstan , Tom Lane , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +In-Reply-To: <200512291637.jBTGbdC03848@candle.pha.pa.us> +References: <200512291637.jBTGbdC03848@candle.pha.pa.us> +Date: Fri, 30 Dec 2005 13:09:12 +0000 +Message-ID: <1135948152.2862.113.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +Content-Length: 6343 + +On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote: +> Tom Lane wrote: +> > Simon Riggs writes: +> > > My view would be that this thread has been complex because everybody has +> > > expressed a somewhat different requirement, which could be broken down +> > > as: +> > > 1. The need for a multi-user-accessible yet temporary table +> > > 2. Loading data into a table immediately after it is created (i.e. in +> > > same transaction), including but not limited to a reload from pg_dump +> > > 3. How to load data quickly into an existing table (COPY) +> > > 4. How to add/modify data quickly in an existing table (INSERT SELECT, +> > > UPDATE) + +> > However, you then seem to be arguing for still using the COPY LOCK +> > syntax, which I think Bruce intended would go away in favor of using +> > these ALTER commands. 
Certainly that's what I'd prefer --- COPY has +> > got too darn many options already. + +COPY LOCK was Tom's suggestion at the end of a long discussion thread on +this precise issue. Nobody objected to it at that point; I implemented +it *exactly* that way because I wanted to very visibly follow the +consensus of the community, after informed debate. +http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php + +Please re-read the links to previous discussions. +http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php +There are points there, not made by me, that still apply and need to be +considered here, yet have not been. + +Just to restate my current thinking: +- agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS +- we should have COPY LOCK rather than +ALTER TABLE .... RELIABILITY EXCLUSIVE +(Though I welcome better wording and syntax in either case; it is the +behaviour only that I discuss). + +It seems now that we have agreed approaches for (1), (2) and (4). Please +note that I have listened to the needs of others with regard to +requirement (1), as espoused earlier by Hannu and again now by +Martijn. Some of the points about requirement (3) I made in my previous +post have not yet been addressed, IMHO. + +My mind is not fixed. AFAICS there are valid points remaining on both +sides of the discussion about loading data quickly into an existing +table. + +> I do think there is a valid concern about someone using the table between the +> CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow +> the RELIABILITY as part of the CREATE TABLE, another is to tell users to +> create the table inside a transaction. + +Neither solution works for this use case: + +> > 3. How to load data quickly into an existing table (COPY) + +This is the only use case for which ALTER TABLE ... EXCLUSIVE makes +sense.
That option means that any write lock held upon the table would +be an EXCLUSIVE table lock, so would never be a performance gain with +single row INSERT, UPDATE or DELETEs. + +Following Andrew's concerns, I'd also note that ALTER TABLE requires a +much higher level of privilege to operate than does COPY. That sounds +like it will make things more secure, but all it does is open up the +administrative rights, since full ownership rights must be obtained +merely to load data. + +> Having COPY behave differently because it is +> in a transaction is fine as long as it is user-invisible + +Good + +> I think there is great utility in giving users one API, namely +> RELIABILITY (or some other keyword), and telling them that is where they +> control logging. I realize adding one keyword, LOCK, to an existing +> command isn't a big deal, but once you decentralize your API enough +> times, you end up with a terribly complex database system. It is this +> design rigidity that helps make PostgreSQL so much easier to use than +> other database systems. + +I do see the appeal of your suggestion... + +TRUNCATE is a special command to delete quickly. There is no requirement +to do an ALTER TABLE statement before that command executes. + +Balance would suggest that a special command to load data quickly would +be reasonably accepted by users. + + + + +Minor points below: + +> > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a +> > > pg_dump *will* work with any other version of PostgreSQL, which *would +> > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into +> > > it. +> > +> > Wrong --- the good thing about ALTER TABLE is that an old version of +> > Postgres would simply reject it and keep going. Therefore we could get +> > the speedup in dumps without losing compatibility, which is not true +> > of COPY LOCK. 
+ +That was pointing out that one of Bruce's objections was not relevant because +it assumed COPY LOCK was required to make pg_restore go faster; that was +not the case - so there is no valid objection either way now. + +> > BTW, this is a perfect example of the use-case for not abandoning a +> > dump-file load simply because one command fails. (We have relied on +> > this sort of reasoning many times before, too, for example by using +> > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) +> > I don't think that "wrap the whole load into begin/end" is really a very +> > workable answer, because there are far too many scenarios where you +> > can't do that. Another one where it doesn't help is a data-only dump. + +Which is why --single-transaction is not the default, per the earlier +discussion on that point (on -patches). + +> Yep, Tom is echoing my reaction. There is a temptation to add things up +> onto existing commands, e.g. LOCK, and while it works, it makes for some +> very complex user API's. Having COPY behave differently because it is +> in a transaction is fine as long as it is user-invisible, but once you +> require users to do that to get the speedup, it isn't user-invisible +> anymore. +> +> (I can see it now, "Why is pg_dump putting things in transactions?", +> "Because it prevents it from being logged." "Oh, should I be doing that +> in my code?" "Perhaps, if you want ..." You can see where that +> discussion is going. Having them see "ALTER TABLE ... RELIABILITY +> TRUNCATE" is very clear, and very clear on how it can be used in user +> code.) + +The above case is not an argument against COPY LOCK. Exactly what you +say above would still occur even when we have ALTER TABLE ... +RELIABILITY statement, since COPY LOCK and +COPY-optimized-within-same-transaction are different things.
+ +Best Regards, Simon Riggs + +From pgsql-hackers-owner+M78064=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 11:50:49 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512301649.jBUGnxn21488@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <1135948152.2862.113.camel@localhost.localdomain> +To: Simon Riggs +Date: Fri, 30 Dec 2005 11:49:59 -0500 (EST) +cc: Andrew Dunstan , Tom Lane , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] +X-Spam-Score: 0.12 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 8888 + +Simon Riggs wrote: +> On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote: +> > Tom Lane wrote: +> > > Simon Riggs writes: +> > > > My view would be that this thread has been complex because everybody has +> > > > expressed a somewhat different requirement, which could be broken down +> > > > as: +> > > > 1. The need for a multi-user-accessible yet temporary table +> > > > 2. Loading data into a table immediately after it is created (i.e. in +> > > > same transaction), including but not limited to a reload from pg_dump +> > > > 3. How to load data quickly into an existing table (COPY) +> > > > 4. How to add/modify data quickly in an existing table (INSERT SELECT, +> > > > UPDATE) +> +> > > However, you then seem to be arguing for still using the COPY LOCK +> > > syntax, which I think Bruce intended would go away in favor of using +> > > these ALTER commands. 
Certainly that's what I'd prefer --- COPY has +> > > got too darn many options already. +> +> COPY LOCK was Tom's suggestion at the end of a long discussion thread on +> this precise issue. Nobody objected to it at that point; I implemented +> it *exactly* that way because I wanted to very visibly follow the +> consensus of the community, after informed debate. +> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php +> +> Please re-read the links to previous discussions. +> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php +> There are points there, not made by me, that still apply and need to be +> considered here, yet have not been. + +Yes, I know we agreed to the COPY LOCK, but new features now being +requested, so we have to re-evaluate where we are going with COPY LOCK +to get a more consistent solution. + +> Just to restate my current thinking: +> - agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS +> - we should have COPY LOCK rather than +> ALTER TABLE .... RELIABILITY EXCLUSIVE +> (Though I welcome better wording and syntax in either case; it is the +> behaviour only that I discuss). +> +> It seems now that we have agreed approaches for (1), (2) and (4). Please +> note that I have listened to the needs of others with regard to +> requirement (1), as espoused by earlier by Hannu and again now by +> Martijn. Some of the points about requirement (3) I made in my previous +> post have not yet been addressed, IMHO. +> +> My mind is not fixed. AFAICS there are valid points remaining on both +> sides of the discussion about loading data quickly into an existing +> table. +> +> > I do think it is valid concern about someone use the table between the +> > CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow +> > the RELIABILITY as part of the CREATE TABLE, another is to tell users to +> > create the table inside a transaction. +> +> Neither solution works for this use case: +> +> > > 3. 
How to load data quickly into an existing table (COPY) +> +> This is the only use case for which ALTER TABLE ... EXCLUSIVE makes +> sense. That option means that any write lock held upon the table would +> be an EXCLUSIVE table lock, so would never be a performance gain with +> single row INSERT, UPDATE or DELETEs. + +Ah, but people wanted fast INSERT INTO ... SELECT, and that would use +EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use +EXCLUSIVE? We don't want to add "LOCK" to every command that might use +EXCLUSIVE. ALTER is much better for this. + +I agree if we thought EXCLUSIVE would only be used for COPY, we could +use LOCK, but I am thinking it will be used for other commands as well. + +> Following Andrew's concerns, I'd also note that ALTER TABLE requires a +> much higher level of privilege to operate than does COPY. That sounds +> like it will make things more secure, but all it does is open up the +> administrative rights, since full ownership rights must be obtained +> merely to load data. + +True, but as pointed out by others, I don't see that happening too +often. + +> > Having COPY behave differently because it is +> > in a transaction is fine as long as it is user-invisible +> +> Good +> +> > I think there is great utility in giving users one API, namely +> > RELIABILITY (or some other keyword), and telling them that is where they +> > control logging. I realize adding one keyword, LOCK, to an existing +> > command isn't a big deal, but once you decentralize your API enough +> > times, you end up with a terribly complex database system. It is this +> > design rigidity that helps make PostgreSQL so much easier to use than +> > other database systems. +> +> I do see the appeal of your suggestion... +> +> TRUNCATE is a special command to delete quickly. There is no requirement +> to do an ALTER TABLE statement before that command executes. + +The TRUNCATE happens during recovery. There is no user interaction. 
It +happens because we can't restore the contents of the table in a +consistent state because no logging was used. Basically, a table marked +RELIABILITY TRUNCATE would be truncated on a recovery start of the +postmaster. + +> Balance would suggest that a special command to load data quickly would +> be reasonably accepted by users. +> +> +> +> +> Minor points below: +> +> > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a +> > > > pg_dump *will* work with any other version of PostgreSQL, which *would +> > > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into +> > > > it. +> > > +> > > Wrong --- the good thing about ALTER TABLE is that an old version of +> > > Postgres would simply reject it and keep going. Therefore we could get +> > > the speedup in dumps without losing compatibility, which is not true +> > > of COPY LOCK. +> +> That was pointing out one of Bruce's objections was not relevant because +> it assumed COPY LOCK was required to make pg_restore go faster; that was +> not the case - so there is no valid objection either way now. + +I don't consider the single-transaction to be a no-cost solution. You +are adding flags to commands, and you are using a dump layout for +performance where the purpose for the layout is not clear. The ALTER is +clear to the user, and it allows nologging operations to happen after +the table is created. + +In fact, for use in pg_dump, I think DROP is the proper operation for +loading, not your transaction wrapping solution. We already agree we +need DROP (or TRUNCATE), so why not use that rather than the transaction +wrap idea? + +> > > BTW, this is a perfect example of the use-case for not abandoning a +> > > dump-file load simply because one command fails. (We have relied on +> > > this sort of reasoning many times before, too, for example by using +> > > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.) 
+> > > I don't think that "wrap the whole load into begin/end" is really a very +> > > workable answer, because there are far too many scenarios where you +> > > can't do that. Another one where it doesn't help is a data-only dump. +> +> Which is why --single-transaction is not the default, per the earlier +> discussion on that point (on -patches). + +Right, but why not use DROP/TRUNCATE? That works for old dumps too, and +has no downsides, meaning it can be always on. + +> > Yep, Tom is echoing my reaction. There is a temptation to add things up +> > onto existing commands, e.g. LOCK, and while it works, it makes for some +> > very complex user API's. Having COPY behave differently because it is +> > in a transaction is fine as long as it is user-invisible, but once you +> > require users to do that to get the speedup, it isn't user-invisible +> > anymore. +> > +> > (I can see it now, "Why is pg_dump putting things in transactions?", +> > "Because it prevents it from being logged." "Oh, should I be doing that +> > in my code?" "Perhaps, if you want ..." You can see where that +> > discussion is going. Having them see "ALTER TABLE ... RELIABILITY +> > TRUNCATE" is very clear, and very clear on how it can be used in user +> > code.) +> +> The above case is not an argument against COPY LOCK. Exactly what you +> say above would still occur even when we have ALTER TABLE ... +> RELIABILITY statement, since COPY LOCK and +> COPY-optimized-within-same-transaction are different things. + +See my posting above that we might want EXCLUSIVE for other commands, +meaning ALTER makes more sense. + +So, to summarize, I think we should add DROP/TRUNCATE, and use that by +default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE +for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup.
| Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 5: don't forget to increase your free space map settings + +From pgsql-hackers-owner+M78065=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 12:40:48 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Message-ID: <43B570C9.6060406@dunslane.net> +Date: Fri, 30 Dec 2005 12:39:21 -0500 +From: Andrew Dunstan +User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.7.12-1.3.1 +X-Accept-Language: en-us, en +To: Tom Lane +cc: simon@2ndquadrant.com, pgman@candle.pha.pa.us, kleptog@svana.org, + gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +References: <1135948152.2862.113.camel@localhost.localdomain> <56737.68.143.134.146.1135954413.squirrel@www.dunslane.net> <11876.1135954626@sss.pgh.pa.us> +In-Reply-To: <11876.1135954626@sss.pgh.pa.us> +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.041 required=5 tests=[AWL=0.041] +X-Spam-Score: 0.041 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1815 + + + +Tom Lane wrote: + +>"Andrew Dunstan" writes: +> +> +>>Simon Riggs said: +>> +>> +>>>Following Andrew's concerns, I'd also note that ALTER TABLE requires a +>>>much higher level of privilege to operate than does COPY. That sounds +>>>like it will make things more secure, but all it does is open up the +>>>administrative rights, since full ownership rights must be obtained +>>>merely to load data. +>>> +>>> +> +> +> +>>My concern is more about making plain that this is for special operations, +>>not normal operations. Or maybe I have misunderstood the purpose. 
+>> +>> +> +>Rephrase that as "full ownership rights must be obtained to load data in +>a way that requires dropping any existing indexes and locking out other +>users of the table". I don't think the use-case for this will be very +>large for non-owners, or indeed even for owners except during initial +>table creation; and so I don't think the above argument is strong. +> +> +> +> + +Those restrictions aren't true of Bruce's proposed drop and +delete/truncate recovery modes, are they? + +People do crazy things in pursuit of performance. Illustration: a few +months ago I was instrumenting an app (based on MySQL/ISAM) and I +noticed that under load it simply didn't update the inventory properly - +of 1000 orders placed within a few seconds it might reduce inventory by +3 or 4. I reported this and they shrugged their shoulders and said +"well, we'd have to lock the table and that would slow everything down +...". + +I just want to be sure we aren't providing a footgun. "Oh, just set +recovery mode to delete. It won't make any difference unless you crash +and you'll run faster." 
+ +cheers + +andrew + + + + +---------------------------(end of broadcast)--------------------------- +TIP 5: don't forget to increase your free space map settings + +From pgsql-hackers-owner+M78066=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 12:58:52 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512301758.jBUHwFv03107@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <43B570C9.6060406@dunslane.net> +To: Andrew Dunstan +Date: Fri, 30 Dec 2005 12:58:15 -0500 (EST) +cc: Tom Lane , simon@2ndquadrant.com, kleptog@svana.org, + gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] +X-Spam-Score: 0.12 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1996 + +Andrew Dunstan wrote: +> >>My concern is more about making plain that this is for special operations, +> >>not normal operations. Or maybe I have misunderstood the purpose. +> >> +> >> +> > +> >Rephrase that as "full ownership rights must be obtained to load data in +> >a way that requires dropping any existing indexes and locking out other +> >users of the table". I don't think the use-case for this will be very +> >large for non-owners, or indeed even for owners except during initial +> >table creation; and so I don't think the above argument is strong. +> > +> > +> > +> > +> +> Those restrictions aren't true of Bruce's proposed drop and +> delete/truncate recovery modes, are they? 
+ +Only the owner could do the ALTER, for sure, but once the owner sets it, +any user with permission to write to the table would have those +characteristics. + +> People do crazy things in pursuit of performance. Illustration: a few +> months ago I was instrumenting an app (based on MySQL/ISAM) and I +> noticed that under load it simply didn't update the inventory properly - +> of 1000 orders placed within a few seconds it might reduce inventory by +> 3 or 4. I reported this and they shrugged their shoulders and said +> "well, we'd have to lock the table and that would slow everything down +> ...". +> +> I just want to be sure we aren't providing a footgun. "Oh, just set +> recovery mode to delete. It won't make any difference unless you crash +> and you'll run faster." + +I think we have to trust the object owner in this case. I don't know of +any super-user-only ALTER commands, but I suppose we could set it up +that way if we wanted. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. 
| Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster + +From pgsql-hackers-owner+M78070=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 14:29:06 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Bruce Momjian +cc: Andrew Dunstan , Tom Lane , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +In-Reply-To: <200512301649.jBUGnxn21488@candle.pha.pa.us> +References: <200512301649.jBUGnxn21488@candle.pha.pa.us> +Date: Fri, 30 Dec 2005 19:28:41 +0000 +Message-ID: <1135970921.5052.68.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.034 required=5 tests=[AWL=0.034] +X-Spam-Score: 0.034 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 3112 + +On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote: + +> Yes, I know we agreed to the COPY LOCK, but new features now being +> requested, so we have to re-evaluate where we are going with COPY LOCK +> to get a more consistent solution. + +Thank you. + +> Ah, but people wanted fast INSERT INTO ... SELECT, and that would use +> EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use +> EXCLUSIVE? We don't want to add "LOCK" to every command that might use +> EXCLUSIVE. ALTER is much better for this. + +> I agree if we thought EXCLUSIVE would only be used for COPY, we could +> use LOCK, but I am thinking it will be used for other commands as well. + +Agreed, I will look to implement this. + +Could the internals of my recent patch be reviewed? 
Changing the user +interface is less of a problem than changing the internals, which is +where the hard work takes place. I do not want to extend this work +further only to have that part rejected later. + +The implications of EXCLUSIVE are: +- there will be a check on each and every I, U, D to check the state of +the relation +- *every* operation that attempts a write lock will attempt to acquire +an EXCLUSIVE full table lock instead +- following successful completion of *each* DML statement, the relation +will be heap_sync'd involving a full scan of the buffer cache + +Can I clarify the wording of the syntax? Is EXCLUSIVE the right word? +How about FASTLOAD or BULKLOAD? Those words seem less likely to be +misused in the future - i.e. we are invoking a special mode, rather than +invoking a special "go faster" option. + +> I don't consider the single-transaction to be a no-cost solution. You +> are adding flags to commands, and you are using a dump layout for +> performance where the purpose for the layout is not clear. The ALTER is +> clear to the user, and it allows nologging operations to happen after +> the table is created. +> +> In fact, for use in pg_dump, I think DROP is the proper operation for +> loading, not your transaction wrapping solution. We already agree we +> need DROP (or TRUNCATE), so why not use that rather than the transaction +> wrap idea? + +This was discussed on-list by 2 core team members, a committer and +myself, but I see no requirements change here. You even accepted the +invisible COPY optimization in your last post - why unpick that now? +Please forgive my tone, but I am lost for reasonable yet expressive +words. + +The --single-transaction mode would apply even if the dump was created +using an earlier version of pg_dump. pg_dump has *not* been altered at +all. (And I would again add that the idea was not my own) + +> So, to summarize, I think we should add DROP/TRUNCATE, and use that by +> default (or optionally off?) 
in pg_dump, and, assuming we want EXCLUSIVE +> for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. + +Would you mind stating again what you mean, just so I can understand +this? Your summary isn't enough. + +Best Regards, Simon Riggs + + +---------------------------(end of broadcast)--------------------------- +TIP 3: Have you checked our extensive FAQ? + + http://www.postgresql.org/docs/faq + +From pgsql-hackers-owner+M78072=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 16:15:30 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512302114.jBULEno02301@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <1135970921.5052.68.camel@localhost.localdomain> +To: Simon Riggs +Date: Fri, 30 Dec 2005 16:14:49 -0500 (EST) +cc: Andrew Dunstan , Tom Lane , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] +X-Spam-Score: 0.12 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 7285 + +Simon Riggs wrote: +> On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote: +> +> > Yes, I know we agreed to the COPY LOCK, but new features now being +> > requested, so we have to re-evaluate where we are going with COPY LOCK +> > to get a more consistent solution. +> +> Thank you. + +Good. I think we can be happy that COPY LOCK didn't get into a release, +so we don't have to support it forever. 
When we are adding features, we +have to consider not only the current release, but future releases and +what people will ask for in the future so the syntax can be expanded +without breaking previous usage. + +> > Ah, but people wanted fast INSERT INTO ... SELECT, and that would use +> > EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use +> > EXCLUSIVE? We don't want to add "LOCK" to every command that might use +> > EXCLUSIVE. ALTER is much better for this. +> +> > I agree if we thought EXCLUSIVE would only be used for COPY, we could +> > use LOCK, but I am thinking it will be used for other commands as well. +> +> Agreed, I will look to implement this. +> +> Could the internals of my recent patch be reviewed? Changing the user +> interface is less of a problem than changing the internals, which is +> where the hard work takes place. I do not want to extend this work +> further only to have that part rejected later. + +OK, I will look it over this week or next. + +> The implications of EXCLUSIVE are: +> - there will be a check on each and every I, U, D to check the state of +> the relation +> - *every* operation that attempts a write lock will attempt to acquire +> an EXCLUSIVE full table lock instead +> - following successful completion of *each* DML statement, the relation +> will be heap_sync'd involving a full scan of the buffer cache + +Yes, I think that is it. What we can do is implement EXCLUSIVE to +affect only COPY at this point, and document that, and later add other +commands. + +> Can I clarify the wording of the syntax? Is EXCLUSIVE the right word? +> How about FASTLOAD or BULKLOAD? Those words seem less likely to be +> misused in the future - i.e. we are invoking a special mode, rather than +> invoking a special "go faster" option. + +The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is +probably not the best for loading. I would think TRUNCATE would be a +better option. 
+ +In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be +the same, mostly. You would create the table, set its RELIABILITY to +TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or +DEFAULT. The second ALTER has to sync all the dirty data blocks, which +is the same thing EXCLUSIVE does at the conclusion of COPY. + +So, we need a name for EXCLUSIVE mode that suggests how it is different +from TRUNCATE, and in this case, the difference is that EXCLUSIVE +preserves the previous contents of the table on recovery, while TRUNCATE +does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER? +Anyway, the keywords are easy to modify, even after the patch is +submitted. FYI, I usually go through keywords.c looking for a keyword +we already use. + +> > I don't consider the single-transaction to be a no-cost solution. You +> > are adding flags to commands, and you are using a dump layout for +> > performance where the purpose for the layout is not clear. The ALTER is +> > clear to the user, and it allows nologging operations to happen after +> > the table is created. +> > +> > In fact, for use in pg_dump, I think DROP is the proper operation for +> > loading, not your transaction wrapping solution. We already agree we +> > need DROP (or TRUNCATE), so why not use that rather than the transaction +> > wrap idea? +> +> This was discussed on-list by 2 core team members, a committer and +> myself, but I see no requirements change here. You even accepted the +> invisible COPY optimization in your last post - why unpick that now? +> Please forgive my tone, but I am lost for reasonable yet expressive +> words. + +Do you think you are the only one who has rewritten a patch multiple +times? We all have. The goal is to get the functionality into the +system in the most seamless way possible. Considering the number of +people who use PostgreSQL, if it takes us 10 tries, it is worth it +considering the thousands of people who will use it.
Would you have us +include a sub-optimal patch and have thousands of people adjust to its +non-optimal functionality? I am sure you would not. Perhaps a company +would say, "Oh, just ship it", but we don't. + +> The --single-transaction mode would apply even if the dump was created +> using an earlier version of pg_dump. pg_dump has *not* been altered at +> all. (And I would again add that the idea was not my own) + +I assume you mean this: + + http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php + +I guess with the ALTER commands I don't see much value in the +--single-transaction flag. I am sure others suggested it, but would +they suggest it now given our current direction. The fact that the +patch was submitted does not give it any more weight --- the question is +does this feature make sense for 8.2. The goal is not to cram as many +optimizations into PostgreSQL as possible, the goal is to present a +consistent usable system to users. + +> > So, to summarize, I think we should add DROP/TRUNCATE, and use that by +> > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE +> > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. +> +> Would you mind stating again what you mean, just so I can understand +> this? Your summary isn't enough. + +New ALTER TABLE mode, perhaps call it PERSISTENCE: + + ALTER TABLE tab PERSISTENCE DROP ON RECOVERY + ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY + +These would drop or truncate all tables with this flag on a non-clean +start of the postmaster, and write something in the server logs. +However, I don't know that we have the code in place to DROP/TRUNCATE in +recovery mode, and it would affect all databases, so it could be quite +complex to implement. In this mode, no WAL logs would be written for +table modifications, though DDL commands would have to be logged. + + ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?) 
+ +Table contents are preserved across recoveries, but data modifications +can happen only one at a time. I don't think we have a lock mode that +does this, so I am worried a new lock mode will have to be created. A +simplified solution at this stage would be to take an exclusive lock on +the table, but really we just need a single-writer table lock, which I +don't think we have. Initially this can be implemented to affect only COPY +but later can be done for other commands. + + ALTER TABLE tab PERSISTENCE DEFAULT + +This would be our current default mode, which is full concurrency and +persistence. + +It took me over an hour to write this, but I feel the time is worth it +because of the number of users who use our software. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster + +From pgsql-hackers-owner+M78076=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 17:37:00 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +To: Bruce Momjian +cc: Simon Riggs , Andrew Dunstan , + Tom Lane , Martijn van Oosterhout , + Greg Stark , Rod Taylor , + Qingqing Zhou , pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +References: <200512302114.jBULEno02301@candle.pha.pa.us> +In-Reply-To: <200512302114.jBULEno02301@candle.pha.pa.us> +From: Greg Stark +Organization: The Emacs Conspiracy; member since 1992 +Date: 30 Dec 2005 17:36:24 -0500 +Message-ID: <87mzii8b6f.fsf@stark.xeocode.com> +Lines: 28 +User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4 +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113] +X-Spam-Score: 0.113 +X-Spam-Level:
+X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1424 + + +As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful +functionality but perhaps there doesn't have to be any proprietary user +interface to it at all. Why not just check if the conditions are already +present to allow the optimization and if so go ahead. + +That is, if the current transaction already has an exclusive lock on the table +and there are no indexes (and PITR isn't active) then Postgres could go ahead +and use the same WAL skipping logic as the other operations that already do +so. This would work for inserts whether coming from COPY or plain SQL INSERTs. + +The nice thing about this is that the user's SQL wouldn't need any proprietary +extensions at all. Just tell people to do + +BEGIN; +LOCK TABLE foo; +COPY foo from ... +COMMIT; + +There could be a COPY LOCK option to obtain a lock, but it would be purely for +user convenience so they don't have to bother with BEGIN and COMMIT. + +The only downside is a check to see if an exclusive table lock is present on +every copy and insert. That might be significant but perhaps there are ways to +finesse that. If not perhaps only doing it on COPY would be a good compromise.
+ +-- +greg + + +---------------------------(end of broadcast)--------------------------- +TIP 1: if posting/reading through Usenet, please send an appropriate + subscribe-nomail command to majordomo@postgresql.org so that your + message can get through to the mailing list cleanly + +From pgsql-hackers-owner+M78077=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 17:47:18 2005 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200512302246.jBUMkjF25196@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <87mzii8b6f.fsf@stark.xeocode.com> +To: Greg Stark +Date: Fri, 30 Dec 2005 17:46:45 -0500 (EST) +cc: Simon Riggs , Andrew Dunstan , + Tom Lane , Martijn van Oosterhout , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] +X-Spam-Score: 0.12 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 2135 + +Greg Stark wrote: +> +> As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful +> functionality but perhaps there doesn't have to be any proprietary user +> interface to it at all. Why not just check if the conditions are already +> present to allow the optimization and if so go ahead. +> +> That is, if the current transaction already has an exclusive lock on the table +> and there are no indexes (and PITR isn't active) then Postgres could go ahead +> and use the same WAL skipping logic as the other operations that already do +> so. This would work for inserts whether coming from COPY or plain SQL INSERTs.
+> +> The nice thing about this is that the user's SQL wouldn't need any proprietary +> extensions at all. Just tell people to do +> +> BEGIN; +> LOCK TABLE foo; +> COPY foo from ... +> COMMIT; +> +> There could be a COPY LOCK option to obtain a lock, but it would be purely for +> user convenience so they don't have to bother with BEGIN and COMMIT. +> +> The only downside is a check to see if an exclusive table lock is present on +> every copy and insert. That might be significant but perhaps there are ways to +> finesse that. If not perhaps only doing it on COPY would be a good compromise. + +Well, again, if we wanted to use EXCLUSIVE only for COPY, this might +make sense. However, also consider that the idea for EXCLUSIVE was that +users could continue read-only queries on the table while it is being +loaded (like COPY allows now), and that in EXCLUSIVE mode, we are only +going to write into new pages. + +If someone has an exclusive lock on the table and does a COPY or SELECT +INTO do we want to assume we are only going to write into new pages, and +do we want to force an exclusive lock rather than a single-writer lock? +I don't think so. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. 
| Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 6: explain analyze is your friend + +From mpaesold@gmx.at Sat Dec 31 06:59:51 2005 +Date: Sat, 31 Dec 2005 12:59:44 +0100 (MET) +From: Michael Paesold +To: Bruce Momjian +cc: simon@2ndquadrant.com, andrew@dunslane.net, tgl@sss.pgh.pa.us, + kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +References: <200512302114.jBULEno02301@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +X-Priority: 3 (Normal) +X-Authenticated: #1946847 +Message-ID: <14969.1136030384@www6.gmx.net> +X-Mailer: WWW-Mail 1.6 (Global Message Exchange) +X-Flags: 0001 +Content-Length: 1305 + +Bruce Momjian wrote: + +> > The --single-transaction mode would apply even if the dump was created +> > using an earlier version of pg_dump. pg_dump has *not* been altered at +> > all. (And I would again add that the idea was not my own) +> +> I assume you mean this: +> +> http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php +> +> I guess with the ALTER commands I don't see much value in the +> --single-transaction flag. I am sure others suggested it, but would +> they suggest it now given our current direction? + +I just want to add that --single-transaction has a value of its own. There +were times when I wanted to restore parts of a dump all-or-nothing. + +This is possible with PostgreSQL, unlike many other DBM systems, because +people like Tom Lane have invested in ensuring that all DDL is working +without implicitly committing an enclosing transaction. + +Using pg_restore directly into a database, it is not possible to get a +single transaction right now. One has to restore to a file and manually +add BEGIN/COMMIT. Just for that I think --single-transaction is a great +addition and a missing feature. + +I think more people have a use-case for that. 
+ +Best Regards, +Michael Paesold + +-- +Telefonieren Sie schon oder sparen Sie noch? +NEU: GMX Phone_Flat http://www.gmx.net/de/go/telefonie + +From pgsql-hackers-owner+M78213=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 12:08:43 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200601031708.k03H85j27170@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <17173.1136306881@sss.pgh.pa.us> +To: Tom Lane +Date: Tue, 3 Jan 2006 12:08:05 -0500 (EST) +cc: Jim C. Nasby , + Andrew Dunstan , simon@2ndquadrant.com, + kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.121 required=5 tests=[AWL=0.121] +X-Spam-Score: 0.121 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1125 + +Tom Lane wrote: +> "Jim C. Nasby" writes: +> > On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote: +> >> Such an ALTER would certainly require exclusive lock on the table, +> >> so I'm not sure that I see much use-case for doing it like that. +> >> You'd want to do the ALTER and commit so as not to lock other people +> >> out of the table entirely while doing the bulk data-pushing. +> +> > Maybe this just isn't clear, but would EXCLUSIVE block writes from all +> > other sessions then? +> +> I don't think it should (which implies that EXCLUSIVE is a bad name). + +Agreed, EXCLUSIVE was used to mean an _exclusive_ writer. The new words +I proposed were PRESERVE or STABLE. 
+ +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 4: Have you searched our list archives? + + http://archives.postgresql.org + +From tgl@sss.pgh.pa.us Tue Jan 3 12:37:34 2006 +To: Stephen Frost +cc: Jim C. Nasby , + Bruce Momjian , + Andrew Dunstan , kleptog@svana.org, + simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <20060103165359.GP6026@ns.snowman.net> +References: <200512291605.jBTG5gi00396@candle.pha.pa.us> <7966.1135873468@sss.pgh.pa.us> <20060103154521.GC82560@pervasive.com> <20060103162137.GO6026@ns.snowman.net> <16856.1136305742@sss.pgh.pa.us> <20060103165359.GP6026@ns.snowman.net> +Comments: In-reply-to Stephen Frost + message dated "Tue, 03 Jan 2006 11:54:01 -0500" +Date: Tue, 03 Jan 2006 12:37:32 -0500 +Message-ID: <17841.1136309852@sss.pgh.pa.us> +From: Tom Lane +Content-Length: 976 + +Stephen Frost writes: +> The problem is that you might want to grant 'truncate' to people who +> *aren't* particularly trusted. For truncate, at least I have a +> real-world use-case for it. + +I don't find this use-case particularly convincing. If the users are +allowed to delete all data in a given table, then that table must be +dedicated to them anyway; so it's not that easy to see why you can't +risk giving them ownership rights on it. The worst they can do is +screw up their own data, no? + +In any case, I don't see what's so wrong with the model of using +SECURITY DEFINER interface functions when you want a security +restriction that's finer-grain than the system provides. 
I really +*don't* want to see us trying to, say, categorize every variety of +ALTER TABLE as a separately grantable privilege. I could live with +something like a catchall "ADMIN" privilege ... except it's not +clear how that would differ from ownership. + + regards, tom lane + +From pgsql-hackers-owner+M78221=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 13:30:34 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Date: Tue, 3 Jan 2006 13:30:56 -0500 +From: Stephen Frost +To: Tom Lane +cc: Jim C. Nasby , + Bruce Momjian , + Andrew Dunstan , kleptog@svana.org, + simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +Message-ID: <20060103183056.GR6026@ns.snowman.net> +Mail-Followup-To: Tom Lane , + "Jim C. Nasby" , + Bruce Momjian , + Andrew Dunstan , kleptog@svana.org, + simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, + zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org +References: <200512291605.jBTG5gi00396@candle.pha.pa.us> <7966.1135873468@sss.pgh.pa.us> <20060103154521.GC82560@pervasive.com> <20060103162137.GO6026@ns.snowman.net> <16856.1136305742@sss.pgh.pa.us> <20060103165359.GP6026@ns.snowman.net> <17841.1136309852@sss.pgh.pa.us> +Content-Disposition: inline +In-Reply-To: <17841.1136309852@sss.pgh.pa.us> +X-Editor: Vim http://www.vim.org/ +X-Info: http://www.snowman.net +X-Operating-System: Linux/2.4.24ns.3.0 (i686) +X-Uptime: 12:39:16 up 206 days, 9:50, 11 users, load average: 0.02, 0.05, 0.05 +User-Agent: Mutt/1.5.9i +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.105 required=5 tests=[AWL=0.105] +X-Spam-Score: 0.105 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org 
+Content-Length: 2666 + +-- Start of PGP signed section. +* Tom Lane (tgl@sss.pgh.pa.us) wrote: +> I don't find this use-case particularly convincing. If the users are +> allowed to delete all data in a given table, then that table must be +> dedicated to them anyway; so it's not that easy to see why you can't +> risk giving them ownership rights on it. The worst they can do is +> screw up their own data, no? + +Being able to delete all data in a given table in no way implies +ownership rights. The tables are part of a specification which the +users are being asked to respond to. Being able to change the table +types or remove the constraints put on the tables would allow the +users to upload garbage which would then affect downstream processing. + +We can't guarantee this won't happen anyway but we try to confine the +things they can mess up to a reasonable set which we can check for (and +do, through a rather involved error checking system). There are *a lot* +of things built on top of the table structures and having them change +would basically break the whole system (without the appropriate changes +being made to the other parts of the system). + +> In any case, I don't see what's so wrong with the model of using +> SECURITY DEFINER interface functions when you want a security +> restriction that's finer-grain than the system provides. I really +> *don't* want to see us trying to, say, categorize every variety of +> ALTER TABLE as a separately grantable privilege. I could live with +> something like a catchall "ADMIN" privilege ... except it's not +> clear how that would differ from ownership. + +I don't think anyone's asked for 'ALTER TABLE' privileges to be +separately grantable. It seems to me that the privileges which *need* +to be grantable are ones associated with DML statements. I would +classify TRUNCATE, VACUUM and ANALYZE as DML statements (along with +select, insert, update, and delete). 
They're PostgreSQL-specific DML +statements but they still fall into that category. I don't think +it's a coincidence that the SQL-defined DML statements are all, +individually, grantable. + +That doesn't mean I think we should get rid of RULE, REFERENCES or +TRIGGER, though honestly I've very rarely needed to grant any of them +(I don't think I've ever granted RULE or TRIGGER...). References is +DDL-oriented, but for *other* tables; RULE and TRIGGER are DDL and I +can't really justify why someone other than the owner would need them +but I'm guessing someone's using them. I don't think their existance +should imply that if we ever change the grants again we have to include +all types of 'ALTER TABLE', etc, though. + + Thanks, + + Stephen +-- End of PGP section, PGP failed! + +From pgsql-hackers-owner+M78233=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 17:39:06 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200601032238.k03McP804163@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <20060103212750.GT82560@pervasive.com> +To: Jim C. 
Nasby +Date: Tue, 3 Jan 2006 17:38:25 -0500 (EST) +cc: Tom Lane , Andrew Dunstan , + simon@2ndquadrant.com, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, + zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.121 required=5 tests=[AWL=0.121] +X-Spam-Score: 0.121 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1714 + +Jim C. Nasby wrote: +> > We would be creating a new lock type for this. +> +> Sorry if I've just missed this in the thread, but what would the new +> lock type do? My impression is that as it stands you can either do: +> +> BEGIN; +> ALTER TABLE EXCLUSIVE; +> ... +> ALTER TABLE SHARE; --fsync +> COMMIT; +> +> Which would block all other access to the table as soon as the first +> ALTER TABLE happens. Or you can: +> +> ALTER TABLE EXCLUSIVE; +> ... +> ALTER TABLE SHARE; +> +> Which means that between the two ALTER TABLES every backend that does +> DML on that table will not have that DML logged, but because there's no +> exclusive lock that DML would be allowed to occur. + +Right, the DML will be single-threaded and fsync of all dirty pages will +happen before commit of each transaction. + +> BTW, there might be some use case for the second scenario, in which case +> it would probably be better to tell the user to acquire a table-lock on +> their own rather than do it automatically as part of the update... + +> > Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is +> > happening is never going to be implemented because it is just too hard +> > to do, and too prone to error. +> +> What I figured. Never hurts to ask though. :) + +Actually, it does hurt because it generates discussion volume for no +purpose. 
+ +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 5: don't forget to increase your free space map settings + +From pgsql-hackers-owner+M78234=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 17:54:16 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Bruce Momjian +cc: Andrew Dunstan , Tom Lane , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +In-Reply-To: <200512302114.jBULEno02301@candle.pha.pa.us> +References: <200512302114.jBULEno02301@candle.pha.pa.us> +Date: Tue, 03 Jan 2006 22:53:53 +0000 +Message-ID: <1136328833.5052.223.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.04 required=5 tests=[AWL=0.040] +X-Spam-Score: 0.04 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 5373 + +On Fri, 2005-12-30 at 16:14 -0500, Bruce Momjian wrote: +> Simon Riggs wrote: +> > The implications of EXCLUSIVE are: +> > - there will be a check on each and every I, U, D to check the state of +> > the relation +> > - *every* operation that attempts a write lock will attempt to acquire +> > an EXCLUSIVE full table lock instead +> > - following successful completion of *each* DML statement, the relation +> > will be heap_sync'd involving a full scan of the buffer cache +> +> Yes, I think that is it. 
What we can do is implement EXCLUSIVE to +> affect only COPY at this point, and document that, and later add other +> commands. +> +> > Can I clarify the wording of the syntax? Is EXCLUSIVE the right word? +> > How about FASTLOAD or BULKLOAD? Those words seem less likely to be +> > misused in the future - i.e. we are invoking a special mode, rather than +> > invoking a special "go faster" option. +> +> The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is +> probably not the best for loading. I would think TRUNCATE would be a +> better option. +> +> In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be +> the same, mostly. You would create the table, set its RELIABILITY to +> TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or +> DEFAULT. The second ALTER has to sync all the dirty data blocks, which +> the same thing EXCLUSIVE does at the conclusion of COPY. +> +> So, we need a name for EXCLUSIVE mode that suggests how it is different +> from TRUNCATE, and in this case, the difference is that EXCLUSIVE +> preserves the previous contents of the table on recovery, while TRUNCATE +> does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER? +> Anyway, the keywords are easy to modify, even after the patch is +> submitted. FYI, I usually go through keywords.c looking for a keyword +> we already use. + +I'm very happy for suggestions on what these new modes are called. + +> > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by +> > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE +> > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. +> > +> > Would you mind stating again what you mean, just so I can understand +> > this? Your summary isn't enough. 
+> +> New ALTER TABLE mode, perhaps call it PERSISTENCE: +> +> ALTER TABLE tab PERSISTENCE DROP ON RECOVERY +> ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY +> +> These would drop or truncate all tables with this flag on a non-clean +> start of the postmaster, and write something in the server logs. +> However, I don't know that we have the code in place to DROP/TRUNCATE in +> recovery mode, and it would affect all databases, so it could be quite +> complex to implement. In this mode, no WAL logs would be written for +> table modifications, though DDL commands would have to be logged. + +Right now, this will be a TODO item... it looks like it will take some +thought to implement correctly. + +> ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?) +> +> Table contents are preserved across recoveries, but data modifications +> can happen only one at a time. I don't think we have a lock mode that +> does this, so I am worried a new lock mode will have to be created. A +> simplified solution at this stage would be to take an exclusive lock on +> the table, but really we just need a single-writer table lock, which I +> don't think we have. Initially this can be implemented to only affect COPY +> but later can be done for other commands. + +ExclusiveLock locks out everything apart from readers, no new lock mode +AFAICS. Implementing that is little additional work for COPY. + +Tom had a concern about setting this for I, U, D commands via the +executor. Not sure what the details of that are, as yet. + +We can use either of the unlogged modes for pg_dump, so I'd suggest it's +this one. Everybody happy with this being the new default in pg_dump, or +should it be an option? + +> ALTER TABLE tab PERSISTENCE DEFAULT +> +> This would be our current default mode, which is full concurrency and +> persistence. + +I'm thinking whether the ALTER TABLE statement might be better with two +bool flags rather than a 3-state char. 
+ +flag 1: ENABLE LOGGING | DISABLE LOGGING + +flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY + +Giving 3 possible sets of options: + +-- the default +ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default) + +-- EXCLUSIVE mode +ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY; +...which would be used like this + ALTER TABLE mytable DISABLE LOGGING; + COPY or other bulk data manipulation SQL + ALTER TABLE mytable ENABLE LOGGING; +...since FULL RECOVERY is the default. + +-- multiuser temp table mode +ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY; +...which would usually be left on all the time + +which only uses one new keyword LOGGING and yet all the modes are fairly +explicit as to what they do. + +An alternative might be the slightly more verbose: + ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK; +which would be turned off by + ALTER TABLE mytable ENABLE LOGGING; + +Comments? + +Best Regards, Simon Riggs + + +---------------------------(end of broadcast)--------------------------- +TIP 1: if posting/reading through Usenet, please send an appropriate + subscribe-nomail command to majordomo@postgresql.org so that your + message can get through to the mailing list cleanly + +From simon@2ndquadrant.com Tue Jan 3 18:10:32 2006 +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Jim C. Nasby , + Bruce Momjian +cc: Tom Lane , Andrew Dunstan , + kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, + pgsql-hackers@postgresql.org +In-Reply-To: <200601032120.k03LKl609990@candle.pha.pa.us> +References: <200601032120.k03LKl609990@candle.pha.pa.us> +Date: Tue, 03 Jan 2006 23:10:16 +0000 +Message-ID: <1136329816.5052.239.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +Content-Length: 2118 + +On Tue, 2006-01-03 at 16:20 -0500, Bruce Momjian wrote: +> Jim C. 
Nasby wrote: + +> > Idealistically, if EXCLUSIVE/PRESERVE/STABLE does it's thing by only +> > appending new pages, it would be nice if other backends could continue +> > performing updates at the same time, assuming there's free space +> > available elsewhere within the table (and that you'd be able to recover +> > those logged changes regardless of the non-logged operations). But +> > that's a pretty lofty goal... +> +> "Idealistically", yep. It would be great if we could put a helmet on +> and the computer would read your mind. :-) +> +> Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is +> happening is never going to be implemented because it is just too hard +> to do, and too prone to error. + +The reason for locking the whole table was to ensure that we do not have +a mixture of logged and non-logged writers writing to the same data +blocks, since that could damage blocks unrecoverably in the event of a +crash. (Though perhaps only if full_block_writes is on) + +The ALTER TABLE .. EXCLUSIVE/(insert name) mode would mean that *any* +backend who took a write lock on the table, would lock out the whole +table. So this new mode is not restricted to the job/user who ran the +ALTER TABLE command. (I would note that that is how Oracle and Teradata +do this for pre-load utility table locking, but why should we follow +them on that?) + +Currently, when we add a new row when the FSM is empty, we check the +last block of the table. That would cause multiple writers to access the +same blocks and so we would be in danger. The only way to avoid that +would be for logged writers (who would use the FSM if it were not empty) +to notify back to the FSM that they have just added a block - and remove +the behaviour to look for the last block. + +Anyway, one step at a time. *Maybe* we can do that in the future, but +right now I'd like to add the basic fast write/load functionality. 
+ +Also, I think I will do the docs first this time, just so everyone can +read what we're getting ahead of time, to ensure we all agree. + +Best Regards, Simon Riggs + +From pgsql-hackers-owner+M78236=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 18:24:20 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Bruce Momjian +cc: Jim C. Nasby , Tom Lane , + Andrew Dunstan , kleptog@svana.org, gsstark@mit.edu, + pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org +In-Reply-To: <200601032238.k03McP804163@candle.pha.pa.us> +References: <200601032238.k03McP804163@candle.pha.pa.us> +Date: Tue, 03 Jan 2006 23:23:54 +0000 +Message-ID: <1136330634.5052.247.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.043 required=5 tests=[AWL=0.043] +X-Spam-Score: 0.043 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 725 + +On Tue, 2006-01-03 at 17:38 -0500, Bruce Momjian wrote: + +> Right, the DML will be single-threaded and fsync of all dirty pages will +> happen before commit of each transaction. + +heap_sync() would occur at end of statement, as it does with CTAS. We +could delay until EOT but I'm not sure I see why; in most cases they'd +be the same point anyway. + +I'd been toying with the idea of making the freshly added blocks live +only in temp_buffers to avoid the shared_buffers overhead, but that was +starting to sound too weird for my liking. + +Best Regards, Simon Riggs + + +---------------------------(end of broadcast)--------------------------- +TIP 4: Have you searched our list archives? 
+ + http://archives.postgresql.org + +From simon@2ndquadrant.com Tue Jan 3 18:58:13 2006 +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Michael Paesold +cc: Bruce Momjian , andrew@dunslane.net, + tgl@sss.pgh.pa.us, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, + zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org +In-Reply-To: <14969.1136030384@www6.gmx.net> +References: <200512302114.jBULEno02301@candle.pha.pa.us> + <14969.1136030384@www6.gmx.net> +Date: Tue, 03 Jan 2006 23:58:09 +0000 +Message-ID: <1136332689.5052.263.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +Content-Length: 1493 + +On Sat, 2005-12-31 at 12:59 +0100, Michael Paesold wrote: +> Bruce Momjian wrote: +> +> > > The --single-transaction mode would apply even if the dump was created +> > > using an earlier version of pg_dump. pg_dump has *not* been altered at +> > > all. (And I would again add that the idea was not my own) +> > +> > I assume you mean this: +> > +> > http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php +> > +> > I guess with the ALTER commands I don't see much value in the +> > --single-transaction flag. I am sure others suggested it, but would +> > they suggest it now given our current direction? +> +> I just want to add that --single-transaction has a value of its own. There +> were times when I wanted to restore parts of a dump all-or-nothing. +> +> This is possible with PostgreSQL, unlike many other DBM systems, because +> people like Tom Lane have invested in ensuring that all DDL is working +> without implicitly committing an enclosing transaction. +> +> Using pg_restore directly into a database, it is not possible to get a +> single transaction right now. One has to restore to a file and manually +> add BEGIN/COMMIT. Just for that I think --single-transaction is a great +> addition and a missing feature. +> +> I think more people have a use-case for that. 
+ +I did originally separate the --single-transaction patch for this +reason. I think it's a valid patch on its own and it's wrapped and ready +to go, with some deletions from the doc patch. + +Best Regards, Simon Riggs + +From pgsql-hackers-owner+M78239=pgman=candle.pha.pa.us@postgresql.org Tue Jan 3 19:12:18 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +From: Simon Riggs +To: Bruce Momjian +cc: Tom Lane , Martijn van Oosterhout , + Greg Stark , Rod Taylor , + Qingqing Zhou , pgsql-hackers@postgresql.org +In-Reply-To: <200512291637.jBTGbdC03848@candle.pha.pa.us> +References: <200512291637.jBTGbdC03848@candle.pha.pa.us> +Date: Wed, 04 Jan 2006 00:11:55 +0000 +Message-ID: <1136333515.5052.273.camel@localhost.localdomain> +X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.045 required=5 tests=[AWL=0.045] +X-Spam-Score: 0.045 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 1200 + +On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote: +> Having COPY behave differently because it is +> in a transaction is fine as long as it is user-invisible, but once you +> require users to do that to get the speedup, it isn't user-invisible +> anymore. + +Since we're agreed on adding ALTER TABLE rather than COPY LOCK, we have +our explicit mechanism for speedup. + +However, it costs a single line of code and very little execution +time to add in the optimization to COPY to make it bypass WAL when +executed in the same transaction that created the table. Everything else +is already there. 
+ +As part of the use_wal test: ++ if (resultRelInfo->ri_NumIndices == 0 && ++ !XLogArchivingActive() && +>> (cstate->rel->rd_createSubid != InvalidSubTransactionId )) ++ use_wal = false; + +the value is already retrieved from cache... + +Can anyone see a reason *not* to put that change in also? We just don't +advertise it as the "suggested" route to gaining performance, nor would +we rely on it for pg_dump/restore performance. + +Best Regards, Simon Riggs + + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster + +From pgsql-hackers-owner+M78303=pgman=candle.pha.pa.us@postgresql.org Thu Jan 5 12:23:39 2006 +X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org +X-Greylist: from auto-whitelisted by SQLgrey- +From: Bruce Momjian +Message-ID: <200601051722.k05HMSM02052@candle.pha.pa.us> +Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and +In-Reply-To: <1136328833.5052.223.camel@localhost.localdomain> +To: Simon Riggs +Date: Thu, 5 Jan 2006 12:22:28 -0500 (EST) +cc: Andrew Dunstan , Tom Lane , + Martijn van Oosterhout , Greg Stark , + Rod Taylor , Qingqing Zhou , + pgsql-hackers@postgresql.org +X-Mailer: ELM [version 2.4ME+ PL121 (25)] +X-Virus-Scanned: by amavisd-new at hub.org +X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120] +X-Spam-Score: 0.12 +X-Spam-Level: +X-Mailing-List: pgsql-hackers +List-Archive: +List-Help: +List-Id: +List-Owner: +List-Post: +List-Subscribe: +List-Unsubscribe: +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Content-Length: 6020 + +Simon Riggs wrote: +> > So, we need a name for EXCLUSIVE mode that suggests how it is different +> > from TRUNCATE, and in this case, the difference is that EXCLUSIVE +> > preserves the previous contents of the table on recovery, while TRUNCATE +> > does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER? 
+> > Anyway, the keywords are easy to modify, even after the patch is +> > submitted. FYI, I usually go through keywords.c looking for a keyword +> > we already use. +> +> I'm very happy for suggestions on what these new modes are called. +> +> > > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by +> > > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE +> > > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE. +> > > +> > > Would you mind stating again what you mean, just so I can understand +> > > this? Your summary isn't enough. +> > +> > New ALTER TABLE mode, perhaps call it PERSISTENCE: +> > +> > ALTER TABLE tab PERSISTENCE DROP ON RECOVERY +> > ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY +> > +> > These would drop or truncate all tables with this flag on a non-clean +> > start of the postmaster, and write something in the server logs. +> > However, I don't know that we have the code in place to DROP/TRUNCATE in +> > recovery mode, and it would affect all databases, so it could be quite +> > complex to implement. In this mode, no WAL logs would be written for +> > table modifications, though DDL commands would have to be logged. +> +> Right now, this will be a TODO item... it looks like it will take some +> thought to implement correctly. + +OK, I know my suggestions have made it more complicated. + +TODO added: + +* Allow control over which tables are WAL-logged + + Allow tables to bypass WAL writes and just fsync() dirty pages on + commit. To do this, only a single writer can modify the table, and + writes must happen only on new pages. Readers can continue accessing + the table. This would affect COPY, and perhaps INSERT/UPDATE too. + Another option is to avoid transaction logging entirely and truncate + or drop the table on crash recovery. These should be implemented + using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE | + STABLE | DEFAULT ]. 
Tables using non-default logging should not use + referential integrity with default-logging tables, and tables using + stable logging probably cannot have indexes. [walcontrol] + + +> > ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?) +> > +> > Table contents are preserved across recoveries, but data modifications +> > can happen only one at a time. I don't think we have a lock mode that +> > does this, so I am worried a new lock mode will have to be created. A +> > simplified solution at this stage would be to take an exclusive lock on +> > the table, but really we just need a single-writer table lock, which I +> > don't think we have. Initially this can be implemented to only affect COPY +> > but later can be done for other commands. +> +> ExclusiveLock locks out everything apart from readers, no new lock mode +> AFAICS. Implementing that is little additional work for COPY. + +Nice. + +> Tom had a concern about setting this for I, U, D commands via the +> executor. Not sure what the details of that are, as yet. + +That is much more complicated than the COPY-only idea, for sure. I am +thinking we could add the ALTER syntax and just do COPY at this stage, +meaning that I/U/D still do full logging until we get to improving them. +The big benefit is that the user API doesn't need to change when we +improve the code. In fact I think we could do the TRUNCATE/DROP easily +for I/U/D, but the STABLE option would require work and we don't need to +implement it in the first patch. + +> We can use either of the unlogged modes for pg_dump, so I'd suggest it's +> this one. Everybody happy with this being the new default in pg_dump, or +> should it be an option? +> +> > ALTER TABLE tab PERSISTENCE DEFAULT +> > +> > This would be our current default mode, which is full concurrency and +> > persistence. +> +> I'm thinking whether the ALTER TABLE statement might be better with two +> bool flags rather than a 3-state char.
+> +> flag 1: ENABLE LOGGING | DISABLE LOGGING +> +> flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY +> +> Giving 3 possible sets of options: +> +> -- the default +> ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default) +> +> -- EXCLUSIVE mode +> ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY; +> ...which would be used like this +> ALTER TABLE mytable DISABLE LOGGING; +> COPY or other bulk data manipulation SQL +> ALTER TABLE mytable ENABLE LOGGING; +> ...since FULL RECOVERY is the default. +> +> -- multiuser temp table mode +> ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY; +> ...which would usually be left on all the time +> +> which only uses one new keyword LOGGING and yet all the modes are fairly +> explicit as to what they do. +> +> An alternative might be the slightly more verbose: +> ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK; +> which would be turned off by +> ALTER TABLE mytable ENABLE LOGGING; +> +> Comments? + +I had the same idea originally, but avoided it because the logging +really does affect what other options you can use. For example, if you +want truncate on recovery, you certainly do not want logging, so it +seems the options are not really independent. In fact if someone asks +for truncate on recovery, do we automatically turn off logging for them, +or throw an error, or a warning. It just seemed too error-prone and +confusing, though perhaps more logical. Of course, if others like the +above, we can do it. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 359-1001 + + If your life is a hard drive, | 13 Roberts Road + + Christ can be your backup. | Newtown Square, Pennsylvania 19073 + +---------------------------(end of broadcast)--------------------------- +TIP 2: Don't 'kill -9' the postmaster +