This patch makes some improvements and adds some additional detail

to the documentation on routine database maintainence activities. I also corrected a bunch of SGML markup.
2003-12-14 00:10:32 +00:00 · 2003-12-14 00:10:32 +00:00 · 0b52062265
parent e24018728c
commit 0b52062265
6 changed files with 165 additions and 119 deletions
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsql Exp $
+$PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.30 2003/12/14 00:10:32 neilc Exp $
 -->

 <chapter id="maintenance">
@ -87,7 +87,7 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
   of <command>VACUUM</> can run in parallel with normal database operations
   (selects, inserts, updates, deletes, but not changes to table definitions).
   Routine vacuuming is therefore not nearly as intrusive as it was in prior
-   releases, and it's not as critical to try to schedule it at low-usage
+   releases, and it is not as critical to try to schedule it at low-usage
   times of day.
  </para>

@ -115,7 +115,7 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
    Clearly, a table that receives frequent updates or deletes will need
    to be vacuumed more often than tables that are seldom updated. It
    may be useful to set up periodic <application>cron</> tasks that
-    vacuum only selected tables, skipping tables that are known not to
+    <command>VACUUM</command> only selected tables, skipping tables that are known not to
    change often. This is only likely to be helpful if you have both
    large heavily-updated tables and large seldom-updated tables --- the
    extra cost of vacuuming a small table isn't enough to be worth
@ -123,39 +123,69 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
   </para>

   <para>
-    The standard form of <command>VACUUM</> is best used with the goal of
-    maintaining a fairly level steady-state usage of disk space.  The standard
-    form finds old row versions and makes their space available for re-use within
-    the table, but it does not try very hard to shorten the table file and
-    return disk space to the operating system.  If you need to return disk
-    space to the operating system you can use <command>VACUUM FULL</> ---
-    but what's the point of releasing disk space that will only have to be
-    allocated again soon?  Moderately frequent standard <command>VACUUM</> runs
-    are a better approach than infrequent <command>VACUUM FULL</> runs for
-    maintaining heavily-updated tables.
+    There are two variants of the <command>VACUUM</command>
+    command. The first form, known as <quote>lazy vacuum</quote> or
+    just <command>VACUUM</command>, marks expired data in tables and
+    indexes for future reuse; it does <emphasis>not</emphasis> attempt
+    to reclaim the space used by this expired data
+    immediately. Therefore, the table file is not shortened, and any
+    unused space in the file is not returned to the operating
+    system. This variant of <command>VACUUM</command> can be run
+    concurrently with normal database operations.
+   </para>
+
+   <para>
+    The second form is the <command>VACUUM FULL</command>
+    command. This uses a more aggressive algorithm for reclaiming the
+    space consumed by expired row versions. Any space that is freed by
+    <command>VACUUM FULL</command> is immediately returned to the
+    operating system. Unfortunately, this variant of the
+    <command>VACUUM</command> command acquires an exclusive lock on
+    each table while <command>VACUUM FULL</command> is processing
+    it. Therefore, frequently using <command>VACUUM FULL</command> can
+    have an extremely negative effect on the performance of concurrent
+    database queries.
+   </para>
+
+   <para>
+    The standard form of <command>VACUUM</> is best used with the goal
+    of maintaining a fairly level steady-state usage of disk space. If
+    you need to return disk space to the operating system you can use
+    <command>VACUUM FULL</> --- but what's the point of releasing disk
+    space that will only have to be allocated again soon?  Moderately
+    frequent standard <command>VACUUM</> runs are a better approach
+    than infrequent <command>VACUUM FULL</> runs for maintaining
+    heavily-updated tables.
   </para>

   <para>
    Recommended practice for most sites is to schedule a database-wide
-    <command>VACUUM</> once a day at a low-usage time of day, supplemented
-    by more frequent vacuuming of heavily-updated tables if necessary.
-    (If you have multiple databases in a cluster, don't forget to
-    vacuum each one; the program <filename>vacuumdb</> may be helpful.)
-    Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine
+    <command>VACUUM</> once a day at a low-usage time of day,
+    supplemented by more frequent vacuuming of heavily-updated tables
+    if necessary. In fact, some installations with an extremely high
+    rate of data modification <command>VACUUM</command> some tables as
+    often as once very five minutes.  (If you have multiple databases
+    in a cluster, don't forget to <command>VACUUM</command> each one;
+    the program <filename>vacuumdb</> may be helpful.)
+   </para>
+
+   <para>
+    <command>VACUUM FULL</> is recommended for cases where you know
+    you have deleted the majority of rows in a table, so that the
+    steady-state size of the table can be shrunk substantially with
+    <command>VACUUM FULL</>'s more aggressive approach.  Use plain
+    <command>VACUUM</>, not <command>VACUUM FULL</>, for routine
    vacuuming for space recovery.
   </para>

   <para>
-    <command>VACUUM FULL</> is recommended for cases where you know you have
-    deleted the majority of rows in a table, so that the steady-state size
-    of the table can be shrunk substantially with <command>VACUUM FULL</>'s
-    more aggressive approach.
-   </para>
-
-   <para>
-    If you have a table whose contents are deleted completely every so often,
-    consider doing it with <command>TRUNCATE</> rather than using
-    <command>DELETE</> followed by <command>VACUUM</>.
+    If you have a table whose contents are deleted on a periodic
+    basis, consider doing it with <command>TRUNCATE</command> rather
+    than using <command>DELETE</command> followed by
+    <command>VACUUM</command>. <command>TRUNCATE</command> removes the
+    entire content of the table immediately, without recquiring a
+    subsequent <command>VACUUM</command> or <command>VACUUM
+    FULL</command> to reclaim the now-unused disk space.
   </para>
  </sect2>

@ -319,7 +349,7 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
    statistics in the system table <literal>pg_database</>.  In particular,
    the <literal>datfrozenxid</> column of a database's
    <literal>pg_database</> row is updated at the completion of any
-    database-wide vacuum operation (i.e., <command>VACUUM</> that does not
+    database-wide <command>VACUUM</command> operation (i.e., <command>VACUUM</> that does not
    name a specific table).  The value stored in this field is the freeze
    cutoff XID that was used by that <command>VACUUM</> command.  All normal
    XIDs older than this cutoff XID are guaranteed to have been replaced by
@ -338,7 +368,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
    With the standard freezing policy, the <literal>age</> column will start
    at one billion for a freshly-vacuumed database.  When the <literal>age</>
    approaches two billion, the database must be vacuumed again to avoid
-    risk of wraparound failures.  Recommended practice is to vacuum each
+    risk of wraparound failures.  Recommended practice is to <command>VACUUM</command> each
    database at least once every half-a-billion (500 million) transactions,
    so as to provide plenty of safety margin.  To help meet this rule,
    each database-wide <command>VACUUM</> automatically delivers a warning
@ -366,7 +396,7 @@ VACUUM
    It should also be used to prepare any user-created databases that
    are to be marked <literal>datallowconn</> = <literal>false</> in
    <literal>pg_database</>, since there isn't any convenient way to
-    vacuum a database that you can't connect to. Note that
+    <command>VACUUM</command> a database that you can't connect to. Note that
    <command>VACUUM</command>'s automatic warning message about
    unvacuumed databases will ignore <literal>pg_database</> entries
    with <literal>datallowconn</> = <literal>false</>, so as to avoid
@ -404,20 +434,22 @@ VACUUM
  </indexterm>

  <para>
-   It's a good idea to save the database server's log output somewhere,
-   rather than just routing it to <filename>/dev/null</>.  The log output
-   is invaluable when it comes time to diagnose problems.  However, the
-   log output tends to be voluminous (especially at higher debug levels)
-   and you won't want to save it indefinitely.  You need to <quote>rotate</>
-   the log files so that new log files are started and old ones thrown
-   away every so often.
+   It is a good idea to save the database server's log output
+   somewhere, rather than just routing it to <filename>/dev/null</>.
+   The log output is invaluable when it comes time to diagnose
+   problems.  However, the log output tends to be voluminous
+   (especially at higher debug levels) and you won't want to save it
+   indefinitely.  You need to <quote>rotate</> the log files so that
+   new log files are started and old ones removed after a reasonable
+   period of time.
  </para>

  <para>
   If you simply direct the <systemitem>stderr</> of the <command>postmaster</command> into a
   file, the only way to truncate the log file is to stop and restart
-   the <command>postmaster</command>. This may be OK for development setups but you won't
-   want to run a production server that way.
+   the <command>postmaster</command>. This may be OK if you are using
+   <productname>PostgreSQL</productname> in a development environment,
+   but few production servers would find this behavior acceptable.
  </para>

  <para>
@ -444,14 +476,16 @@ VACUUM
   <command>pg_ctl</>, then the <systemitem>stderr</> of the <command>postmaster</command>
   is already redirected to <systemitem>stdout</>, so you just need a
   pipe command:
-   
+
 <programlisting>
 pg_ctl start | logrotate
 </programlisting>

-   The <productname>PostgreSQL</> distribution doesn't include a suitable
-   log rotation program, but there are many available on the Internet;
-   one is included in the Apache distribution, for example.
+   The <productname>PostgreSQL</> distribution doesn't include a
+   suitable log rotation program, but there are many available on the
+   Internet. For example, the <application>logrotate</application>
+   tool included in the <productname>Apache</productname> distribution
+   can be used with <productname>PostgreSQL</productname>.
  </para>
 </sect1>
 </chapter>
--- a/doc/src/sgml/page.sgml
+++ b/doc/src/sgml/page.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/page.sgml,v 1.16 2003/11/29 19:51:37 pgsql Exp $
+$PostgreSQL: pgsql/doc/src/sgml/page.sgml,v 1.17 2003/12/14 00:10:32 neilc Exp $
 -->

 <chapter id="page">
@ -151,7 +151,8 @@ data. Empty in ordinary tables.</entry>
 </table>

 <para>
-  All the details may be found in src/include/storage/bufpage.h.
+  All the details may be found in
+  <filename>src/include/storage/bufpage.h</filename>.
 </para>

 <para>  
@ -305,7 +306,8 @@ data. Empty in ordinary tables.</entry>
 </table>

 <para>
-   All the details may be found in src/include/access/htup.h.
+   All the details may be found in
+   <filename>src/include/access/htup.h</filename>.
 </para>

 <para>
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.38 2003/11/29 19:51:37 pgsql Exp $
+$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.39 2003/12/14 00:10:32 neilc Exp $
 -->

 <chapter id="performance-tips">
@ -100,7 +100,7 @@ $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.38 2003/11/29 19:51:37 pgsql Ex

   <para>
    Here are some examples (using the regression test database after a
-    <literal>VACUUM ANALYZE</>, and 7.3 development sources):
+    <command>VACUUM ANALYZE</>, and 7.3 development sources):

 <programlisting>
 EXPLAIN SELECT * FROM tenk1;
--- a/doc/src/sgml/plperl.sgml
+++ b/doc/src/sgml/plperl.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/plperl.sgml,v 2.21 2003/11/29 19:51:37 pgsql Exp $
+$PostgreSQL: pgsql/doc/src/sgml/plperl.sgml,v 2.22 2003/12/14 00:10:32 neilc Exp $
 -->

 <chapter id="plperl">
@ -152,7 +152,7 @@ SELECT name, empcomp(employee) FROM employee;
  <para>
   The argument values supplied to a PL/Perl function's code are
   simply the input arguments converted to text form (just as if they
-   had been displayed by a <literal>SELECT</literal> statement).
+   had been displayed by a <command>SELECT</command> statement).
   Conversely, the <literal>return</> command will accept any string
   that is acceptable input format for the function's declared return
   type.  So, the PL/Perl programmer can manipulate data values as if
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.48 2003/11/29 19:51:37 pgsql Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.49 2003/12/14 00:10:32 neilc Exp $ -->

 <chapter id="protocol">
 <title>Frontend/Backend Protocol</title>
@ -200,13 +200,13 @@
  <para>
   This section describes the message flow and the semantics of each
   message type.  (Details of the exact representation of each message
-   appear in <xref linkend="protocol-message-formats">.)
-   There are several different sub-protocols
-   depending on the state of the connection: start-up,
-   query, function call, COPY, and termination.  There are also special
-   provisions for asynchronous operations (including
-   notification responses and command cancellation),
-   which can occur at any time after the start-up phase.
+   appear in <xref linkend="protocol-message-formats">.)  There are
+   several different sub-protocols depending on the state of the
+   connection: start-up, query, function call,
+   <command>COPY</command>, and termination.  There are also special
+   provisions for asynchronous operations (including notification
+   responses and command cancellation), which can occur at any time
+   after the start-up phase.
  </para>

  <sect2>
@ -989,15 +989,17 @@
   </para>

   <para>
-    ParameterStatus messages will be generated whenever the active value
-    changes for any of the parameters the backend believes the frontend
-    should know about.  Most commonly this occurs in response to a
-    <command>SET</> SQL command executed by the frontend, and this case
-    is effectively synchronous --- but it is also possible for parameter
-    status changes to occur because the administrator changed a configuration
-    file and then sent the <systemitem>SIGHUP</systemitem> signal to the postmaster.  Also, if a SET command is
-    rolled back, an appropriate ParameterStatus message will be generated
-    to report the current effective value.
+    ParameterStatus messages will be generated whenever the active
+    value changes for any of the parameters the backend believes the
+    frontend should know about.  Most commonly this occurs in response
+    to a <command>SET</> SQL command executed by the frontend, and
+    this case is effectively synchronous --- but it is also possible
+    for parameter status changes to occur because the administrator
+    changed a configuration file and then sent the
+    <systemitem>SIGHUP</systemitem> signal to the postmaster.  Also,
+    if a <command>SET</command> command is rolled back, an appropriate
+    ParameterStatus message will be generated to report the current
+    effective value.
   </para>

   <para>
@ -1129,46 +1131,53 @@
   <para>
    For either normal or abnormal termination, any open transaction is
    rolled back, not committed.  One should note however that if a
-    frontend disconnects while a non-SELECT query is being processed,
-    the backend will probably finish the query before noticing the
-    disconnection.
-    If the query is outside any transaction block (<command>BEGIN</>
-    ... <command>COMMIT</> sequence) then its results may be committed
-    before the disconnection is recognized.
+    frontend disconnects while a non-<command>SELECT</command> query
+    is being processed, the backend will probably finish the query
+    before noticing the disconnection.  If the query is outside any
+    transaction block (<command>BEGIN</> ... <command>COMMIT</>
+    sequence) then its results may be committed before the
+    disconnection is recognized.
   </para>
  </sect2>

  <Sect2>
-   <Title>SSL Session Encryption</Title>
+   <Title><acronym>SSL</acronym> Session Encryption</Title>

   <Para>
-    If <productname>PostgreSQL</> was built with SSL support, frontend/backend
-    communications can be encrypted using SSL.  This provides communication
-    security in environments where attackers might be able to capture the
-    session traffic.
+    If <productname>PostgreSQL</> was built with
+    <acronym>SSL</acronym> support, frontend/backend communications
+    can be encrypted using <acronym>SSL</acronym>.  This provides
+    communication security in environments where attackers might be
+    able to capture the session traffic. For more information on
+    encrypting <productname>PostgreSQL</productname> sessions with
+    <acronym>SSL</acronym>, see <xref linkend="ssl-tcp">.
   </para>

   <para>
-    To initiate an SSL-encrypted connection, the frontend initially sends
-    an SSLRequest message rather than a StartupMessage.  The server then
-    responds with a single byte containing <literal>S</> or <literal>N</>,
-    indicating that it is willing or unwilling to perform SSL, respectively.
-    The frontend may close the connection at this point if it is dissatisfied
-    with the response.  To continue after <literal>S</>, perform an SSL
-    startup handshake (not described here, part of the SSL specification)
-    with the server.  If this is successful, continue with
-    sending the usual StartupMessage.  In this case the StartupMessage and
-    all subsequent data will be SSL-encrypted.  To continue after
+    To initiate an <acronym>SSL</acronym>-encrypted connection, the
+    frontend initially sends an SSLRequest message rather than a
+    StartupMessage.  The server then responds with a single byte
+    containing <literal>S</> or <literal>N</>, indicating that it is
+    willing or unwilling to perform <acronym>SSL</acronym>,
+    respectively.  The frontend may close the connection at this point
+    if it is dissatisfied with the response.  To continue after
+    <literal>S</>, perform an <acronym>SSL</acronym> startup handshake
+    (not described here, part of the <acronym>SSL</acronym>
+    specification) with the server.  If this is successful, continue
+    with sending the usual StartupMessage.  In this case the
+    StartupMessage and all subsequent data will be
+    <acronym>SSL</acronym>-encrypted.  To continue after
    <literal>N</>, send the usual StartupMessage and proceed without
    encryption.
   </para>

   <para>
-    The frontend should also be prepared to handle an ErrorMessage response
-    to SSLRequest from the server.  This would only occur if the server
-    predates the addition of SSL support to <productname>PostgreSQL</>.
-    In this case the connection must be closed, but the frontend may choose
-    to open a fresh connection and proceed without requesting SSL.
+    The frontend should also be prepared to handle an ErrorMessage
+    response to SSLRequest from the server.  This would only occur if
+    the server predates the addition of <acronym>SSL</acronym> support
+    to <productname>PostgreSQL</>.  In this case the connection must
+    be closed, but the frontend may choose to open a fresh connection
+    and proceed without requesting <acronym>SSL</acronym>.
   </para>

   <para>
@ -1178,8 +1187,9 @@

   <para>
    While the protocol itself does not provide a way for the server to
-    force SSL encryption, the administrator may configure the server to
-    reject unencrypted sessions as a byproduct of authentication checking.
+    force <acronym>SSL</acronym> encryption, the administrator may
+    configure the server to reject unencrypted sessions as a byproduct
+    of authentication checking.
   </para>
  </sect2>
 </sect1>
@ -2106,7 +2116,7 @@ CopyData (F &amp; B)
 </Term>
 <ListItem>
 <Para>
-                Identifies the message as COPY data.
+                Identifies the message as <command>COPY</command> data.
 </Para>
 </ListItem>
 </VarListEntry>
@ -2153,7 +2163,7 @@ CopyDone (F &amp; B)
 </Term>
 <ListItem>
 <Para>
-                Identifies the message as a COPY-complete indicator.
+                Identifies the message as a <command>COPY</command>-complete indicator.
 </Para>
 </ListItem>
 </VarListEntry>
@ -2188,7 +2198,7 @@ CopyFail (F)
 </Term>
 <ListItem>
 <Para>
-                Identifies the message as a COPY-failure indicator.
+                Identifies the message as a <command>COPY</command>-failure indicator.
 </Para>
 </ListItem>
 </VarListEntry>
@ -2255,7 +2265,7 @@ CopyInResponse (B)
 </Term>
 <ListItem>
 <Para>
-                0 indicates the overall copy format is textual (rows
+                0 indicates the overall <command>COPY</command> format is textual (rows
 		separated by newlines, columns separated by separator
 		characters, etc).
 		1 indicates the overall copy format is binary (similar
@ -2330,13 +2340,12 @@ CopyOutResponse (B)
 </Term>
 <ListItem>
 <Para>
-                0 indicates the overall copy format is textual (rows
-		separated by newlines, columns separated by separator
-		characters, etc).
-		1 indicates the overall copy format is binary (similar
-		to DataRow format).
-		See <xref linkend="sql-copy" endterm="sql-copy-title">
-		for more information.
+                0 indicates the overall <command>COPY</command> format
+                is textual (rows separated by newlines, columns
+                separated by separator characters, etc). 1 indicates
+                the overall copy format is binary (similar to DataRow
+                format). See <xref linkend="sql-copy"
+                endterm="sql-copy-title"> for more information. 
 </Para>
 </ListItem>
 </VarListEntry>
@ -3602,7 +3611,7 @@ SSLRequest (F)
 </Term>
 <ListItem>
 <Para>
-                The SSL request code.  The value is chosen to contain
+                The <acronym>SSL</acronym> request code.  The value is chosen to contain
                <literal>1234</> in the most significant 16 bits, and <literal>5679</> in the
                least 16 significant bits.  (To avoid confusion, this code
                must not be the same as any protocol version number.)
@ -3899,8 +3908,9 @@ message.
 <ListItem>
 <Para>
        Where: an indication of the context in which the error occurred.
-	Presently this includes a call stack traceback of active PL functions.
-	The trace is one entry per line, most recent first.
+	Presently this includes a call stack traceback of active
+	procedural language functions.  The trace is one entry per line,
+	most recent first.
 </Para>
 </ListItem>
 </VarListEntry>
@ -4006,12 +4016,12 @@ may allow improvements in performance or functionality.
 </para>

 <para>
-COPY data is now encapsulated into CopyData and CopyDone messages.  There
-is a well-defined way to recover from errors during COPY.  The special
+<command>COPY</command> data is now encapsulated into CopyData and CopyDone messages.  There
+is a well-defined way to recover from errors during <command>COPY</command>.  The special
 <quote><literal>\.</></quote> last line is not needed anymore, and is not sent
-during COPY OUT.
-(It is still recognized as a terminator during COPY IN, but its use is
-deprecated and will eventually be removed.)  Binary COPY is supported.
+during <command>COPY OUT</command>.
+(It is still recognized as a terminator during <command>COPY IN</command>, but its use is
+deprecated and will eventually be removed.)  Binary <command>COPY</command> is supported.
 The CopyInResponse and CopyOutResponse messages include fields indicating
 the number of columns and the format of each column.
 </para>
@ -4046,7 +4056,7 @@ the backend.
 <para>
 The NotificationResponse ('<literal>A</>') message has an additional string
 field, which is presently empty but may someday carry additional data passed
-from the NOTIFY event sender.
+from the <command>NOTIFY</command> event sender.
 </para>

 <para>
--- a/doc/src/sgml/typeconv.sgml
+++ b/doc/src/sgml/typeconv.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/typeconv.sgml,v 1.41 2003/12/02 00:26:59 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/typeconv.sgml,v 1.42 2003/12/14 00:10:32 neilc Exp $
 -->

 <chapter Id="typeconv">
@ -122,9 +122,9 @@ with, and perhaps converted to, the types of the target columns.
 </term>
 <listitem>
 <para>
-Since all query results from a unionized <literal>SELECT</literal> statement
+Since all query results from a unionized <command>SELECT</command> statement
 must appear in a single set of columns, the types of the results of each
-<literal>SELECT</> clause must be matched up and converted to a uniform set.
+<command>SELECT</> clause must be matched up and converted to a uniform set.
 Similarly, the branch expressions of a <literal>CASE</> construct must be
 converted to a common type so that the <literal>CASE</> expression as a whole
 has a known output type.  The same holds for <literal>ARRAY</> constructs.