This patch makes some improvements and adds some additional detail

to the documentation on routine database maintainence activities.
I also corrected a bunch of SGML markup.
This commit is contained in:
Neil Conway 2003-12-14 00:10:32 +00:00
parent e24018728c
commit 0b52062265
6 changed files with 165 additions and 119 deletions

View file

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsql Exp $
$PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.30 2003/12/14 00:10:32 neilc Exp $
-->
<chapter id="maintenance">
@ -87,7 +87,7 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
of <command>VACUUM</> can run in parallel with normal database operations
(selects, inserts, updates, deletes, but not changes to table definitions).
Routine vacuuming is therefore not nearly as intrusive as it was in prior
releases, and it's not as critical to try to schedule it at low-usage
releases, and it is not as critical to try to schedule it at low-usage
times of day.
</para>
@ -115,7 +115,7 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
Clearly, a table that receives frequent updates or deletes will need
to be vacuumed more often than tables that are seldom updated. It
may be useful to set up periodic <application>cron</> tasks that
vacuum only selected tables, skipping tables that are known not to
<command>VACUUM</command> only selected tables, skipping tables that are known not to
change often. This is only likely to be helpful if you have both
large heavily-updated tables and large seldom-updated tables --- the
extra cost of vacuuming a small table isn't enough to be worth
@ -123,39 +123,69 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
</para>
<para>
The standard form of <command>VACUUM</> is best used with the goal of
maintaining a fairly level steady-state usage of disk space. The standard
form finds old row versions and makes their space available for re-use within
the table, but it does not try very hard to shorten the table file and
return disk space to the operating system. If you need to return disk
space to the operating system you can use <command>VACUUM FULL</> ---
but what's the point of releasing disk space that will only have to be
allocated again soon? Moderately frequent standard <command>VACUUM</> runs
are a better approach than infrequent <command>VACUUM FULL</> runs for
maintaining heavily-updated tables.
There are two variants of the <command>VACUUM</command>
command. The first form, known as <quote>lazy vacuum</quote> or
just <command>VACUUM</command>, marks expired data in tables and
indexes for future reuse; it does <emphasis>not</emphasis> attempt
to reclaim the space used by this expired data
immediately. Therefore, the table file is not shortened, and any
unused space in the file is not returned to the operating
system. This variant of <command>VACUUM</command> can be run
concurrently with normal database operations.
</para>
<para>
The second form is the <command>VACUUM FULL</command>
command. This uses a more aggressive algorithm for reclaiming the
space consumed by expired row versions. Any space that is freed by
<command>VACUUM FULL</command> is immediately returned to the
operating system. Unfortunately, this variant of the
<command>VACUUM</command> command acquires an exclusive lock on
each table while <command>VACUUM FULL</command> is processing
it. Therefore, frequently using <command>VACUUM FULL</command> can
have an extremely negative effect on the performance of concurrent
database queries.
</para>
<para>
The standard form of <command>VACUUM</> is best used with the goal
of maintaining a fairly level steady-state usage of disk space. If
you need to return disk space to the operating system you can use
<command>VACUUM FULL</> --- but what's the point of releasing disk
space that will only have to be allocated again soon? Moderately
frequent standard <command>VACUUM</> runs are a better approach
than infrequent <command>VACUUM FULL</> runs for maintaining
heavily-updated tables.
</para>
<para>
Recommended practice for most sites is to schedule a database-wide
<command>VACUUM</> once a day at a low-usage time of day, supplemented
by more frequent vacuuming of heavily-updated tables if necessary.
(If you have multiple databases in a cluster, don't forget to
vacuum each one; the program <filename>vacuumdb</> may be helpful.)
Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine
<command>VACUUM</> once a day at a low-usage time of day,
supplemented by more frequent vacuuming of heavily-updated tables
if necessary. In fact, some installations with an extremely high
rate of data modification <command>VACUUM</command> some tables as
often as once very five minutes. (If you have multiple databases
in a cluster, don't forget to <command>VACUUM</command> each one;
the program <filename>vacuumdb</> may be helpful.)
</para>
<para>
<command>VACUUM FULL</> is recommended for cases where you know
you have deleted the majority of rows in a table, so that the
steady-state size of the table can be shrunk substantially with
<command>VACUUM FULL</>'s more aggressive approach. Use plain
<command>VACUUM</>, not <command>VACUUM FULL</>, for routine
vacuuming for space recovery.
</para>
<para>
<command>VACUUM FULL</> is recommended for cases where you know you have
deleted the majority of rows in a table, so that the steady-state size
of the table can be shrunk substantially with <command>VACUUM FULL</>'s
more aggressive approach.
</para>
<para>
If you have a table whose contents are deleted completely every so often,
consider doing it with <command>TRUNCATE</> rather than using
<command>DELETE</> followed by <command>VACUUM</>.
If you have a table whose contents are deleted on a periodic
basis, consider doing it with <command>TRUNCATE</command> rather
than using <command>DELETE</command> followed by
<command>VACUUM</command>. <command>TRUNCATE</command> removes the
entire content of the table immediately, without recquiring a
subsequent <command>VACUUM</command> or <command>VACUUM
FULL</command> to reclaim the now-unused disk space.
</para>
</sect2>
@ -319,7 +349,7 @@ $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.29 2003/11/29 19:51:37 pgsq
statistics in the system table <literal>pg_database</>. In particular,
the <literal>datfrozenxid</> column of a database's
<literal>pg_database</> row is updated at the completion of any
database-wide vacuum operation (i.e., <command>VACUUM</> that does not
database-wide <command>VACUUM</command> operation (i.e., <command>VACUUM</> that does not
name a specific table). The value stored in this field is the freeze
cutoff XID that was used by that <command>VACUUM</> command. All normal
XIDs older than this cutoff XID are guaranteed to have been replaced by
@ -338,7 +368,7 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
With the standard freezing policy, the <literal>age</> column will start
at one billion for a freshly-vacuumed database. When the <literal>age</>
approaches two billion, the database must be vacuumed again to avoid
risk of wraparound failures. Recommended practice is to vacuum each
risk of wraparound failures. Recommended practice is to <command>VACUUM</command> each
database at least once every half-a-billion (500 million) transactions,
so as to provide plenty of safety margin. To help meet this rule,
each database-wide <command>VACUUM</> automatically delivers a warning
@ -366,7 +396,7 @@ VACUUM
It should also be used to prepare any user-created databases that
are to be marked <literal>datallowconn</> = <literal>false</> in
<literal>pg_database</>, since there isn't any convenient way to
vacuum a database that you can't connect to. Note that
<command>VACUUM</command> a database that you can't connect to. Note that
<command>VACUUM</command>'s automatic warning message about
unvacuumed databases will ignore <literal>pg_database</> entries
with <literal>datallowconn</> = <literal>false</>, so as to avoid
@ -404,20 +434,22 @@ VACUUM
</indexterm>
<para>
It's a good idea to save the database server's log output somewhere,
rather than just routing it to <filename>/dev/null</>. The log output
is invaluable when it comes time to diagnose problems. However, the
log output tends to be voluminous (especially at higher debug levels)
and you won't want to save it indefinitely. You need to <quote>rotate</>
the log files so that new log files are started and old ones thrown
away every so often.
It is a good idea to save the database server's log output
somewhere, rather than just routing it to <filename>/dev/null</>.
The log output is invaluable when it comes time to diagnose
problems. However, the log output tends to be voluminous
(especially at higher debug levels) and you won't want to save it
indefinitely. You need to <quote>rotate</> the log files so that
new log files are started and old ones removed after a reasonable
period of time.
</para>
<para>
If you simply direct the <systemitem>stderr</> of the <command>postmaster</command> into a
file, the only way to truncate the log file is to stop and restart
the <command>postmaster</command>. This may be OK for development setups but you won't
want to run a production server that way.
the <command>postmaster</command>. This may be OK if you are using
<productname>PostgreSQL</productname> in a development environment,
but few production servers would find this behavior acceptable.
</para>
<para>
@ -444,14 +476,16 @@ VACUUM
<command>pg_ctl</>, then the <systemitem>stderr</> of the <command>postmaster</command>
is already redirected to <systemitem>stdout</>, so you just need a
pipe command:
<programlisting>
pg_ctl start | logrotate
</programlisting>
The <productname>PostgreSQL</> distribution doesn't include a suitable
log rotation program, but there are many available on the Internet;
one is included in the Apache distribution, for example.
The <productname>PostgreSQL</> distribution doesn't include a
suitable log rotation program, but there are many available on the
Internet. For example, the <application>logrotate</application>
tool included in the <productname>Apache</productname> distribution
can be used with <productname>PostgreSQL</productname>.
</para>
</sect1>
</chapter>

View file

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/page.sgml,v 1.16 2003/11/29 19:51:37 pgsql Exp $
$PostgreSQL: pgsql/doc/src/sgml/page.sgml,v 1.17 2003/12/14 00:10:32 neilc Exp $
-->
<chapter id="page">
@ -151,7 +151,8 @@ data. Empty in ordinary tables.</entry>
</table>
<para>
All the details may be found in src/include/storage/bufpage.h.
All the details may be found in
<filename>src/include/storage/bufpage.h</filename>.
</para>
<para>
@ -305,7 +306,8 @@ data. Empty in ordinary tables.</entry>
</table>
<para>
All the details may be found in src/include/access/htup.h.
All the details may be found in
<filename>src/include/access/htup.h</filename>.
</para>
<para>

View file

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.38 2003/11/29 19:51:37 pgsql Exp $
$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.39 2003/12/14 00:10:32 neilc Exp $
-->
<chapter id="performance-tips">
@ -100,7 +100,7 @@ $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.38 2003/11/29 19:51:37 pgsql Ex
<para>
Here are some examples (using the regression test database after a
<literal>VACUUM ANALYZE</>, and 7.3 development sources):
<command>VACUUM ANALYZE</>, and 7.3 development sources):
<programlisting>
EXPLAIN SELECT * FROM tenk1;

View file

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/plperl.sgml,v 2.21 2003/11/29 19:51:37 pgsql Exp $
$PostgreSQL: pgsql/doc/src/sgml/plperl.sgml,v 2.22 2003/12/14 00:10:32 neilc Exp $
-->
<chapter id="plperl">
@ -152,7 +152,7 @@ SELECT name, empcomp(employee) FROM employee;
<para>
The argument values supplied to a PL/Perl function's code are
simply the input arguments converted to text form (just as if they
had been displayed by a <literal>SELECT</literal> statement).
had been displayed by a <command>SELECT</command> statement).
Conversely, the <literal>return</> command will accept any string
that is acceptable input format for the function's declared return
type. So, the PL/Perl programmer can manipulate data values as if

View file

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.48 2003/11/29 19:51:37 pgsql Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.49 2003/12/14 00:10:32 neilc Exp $ -->
<chapter id="protocol">
<title>Frontend/Backend Protocol</title>
@ -200,13 +200,13 @@
<para>
This section describes the message flow and the semantics of each
message type. (Details of the exact representation of each message
appear in <xref linkend="protocol-message-formats">.)
There are several different sub-protocols
depending on the state of the connection: start-up,
query, function call, COPY, and termination. There are also special
provisions for asynchronous operations (including
notification responses and command cancellation),
which can occur at any time after the start-up phase.
appear in <xref linkend="protocol-message-formats">.) There are
several different sub-protocols depending on the state of the
connection: start-up, query, function call,
<command>COPY</command>, and termination. There are also special
provisions for asynchronous operations (including notification
responses and command cancellation), which can occur at any time
after the start-up phase.
</para>
<sect2>
@ -989,15 +989,17 @@
</para>
<para>
ParameterStatus messages will be generated whenever the active value
changes for any of the parameters the backend believes the frontend
should know about. Most commonly this occurs in response to a
<command>SET</> SQL command executed by the frontend, and this case
is effectively synchronous --- but it is also possible for parameter
status changes to occur because the administrator changed a configuration
file and then sent the <systemitem>SIGHUP</systemitem> signal to the postmaster. Also, if a SET command is
rolled back, an appropriate ParameterStatus message will be generated
to report the current effective value.
ParameterStatus messages will be generated whenever the active
value changes for any of the parameters the backend believes the
frontend should know about. Most commonly this occurs in response
to a <command>SET</> SQL command executed by the frontend, and
this case is effectively synchronous --- but it is also possible
for parameter status changes to occur because the administrator
changed a configuration file and then sent the
<systemitem>SIGHUP</systemitem> signal to the postmaster. Also,
if a <command>SET</command> command is rolled back, an appropriate
ParameterStatus message will be generated to report the current
effective value.
</para>
<para>
@ -1129,46 +1131,53 @@
<para>
For either normal or abnormal termination, any open transaction is
rolled back, not committed. One should note however that if a
frontend disconnects while a non-SELECT query is being processed,
the backend will probably finish the query before noticing the
disconnection.
If the query is outside any transaction block (<command>BEGIN</>
... <command>COMMIT</> sequence) then its results may be committed
before the disconnection is recognized.
frontend disconnects while a non-<command>SELECT</command> query
is being processed, the backend will probably finish the query
before noticing the disconnection. If the query is outside any
transaction block (<command>BEGIN</> ... <command>COMMIT</>
sequence) then its results may be committed before the
disconnection is recognized.
</para>
</sect2>
<Sect2>
<Title>SSL Session Encryption</Title>
<Title><acronym>SSL</acronym> Session Encryption</Title>
<Para>
If <productname>PostgreSQL</> was built with SSL support, frontend/backend
communications can be encrypted using SSL. This provides communication
security in environments where attackers might be able to capture the
session traffic.
If <productname>PostgreSQL</> was built with
<acronym>SSL</acronym> support, frontend/backend communications
can be encrypted using <acronym>SSL</acronym>. This provides
communication security in environments where attackers might be
able to capture the session traffic. For more information on
encrypting <productname>PostgreSQL</productname> sessions with
<acronym>SSL</acronym>, see <xref linkend="ssl-tcp">.
</para>
<para>
To initiate an SSL-encrypted connection, the frontend initially sends
an SSLRequest message rather than a StartupMessage. The server then
responds with a single byte containing <literal>S</> or <literal>N</>,
indicating that it is willing or unwilling to perform SSL, respectively.
The frontend may close the connection at this point if it is dissatisfied
with the response. To continue after <literal>S</>, perform an SSL
startup handshake (not described here, part of the SSL specification)
with the server. If this is successful, continue with
sending the usual StartupMessage. In this case the StartupMessage and
all subsequent data will be SSL-encrypted. To continue after
To initiate an <acronym>SSL</acronym>-encrypted connection, the
frontend initially sends an SSLRequest message rather than a
StartupMessage. The server then responds with a single byte
containing <literal>S</> or <literal>N</>, indicating that it is
willing or unwilling to perform <acronym>SSL</acronym>,
respectively. The frontend may close the connection at this point
if it is dissatisfied with the response. To continue after
<literal>S</>, perform an <acronym>SSL</acronym> startup handshake
(not described here, part of the <acronym>SSL</acronym>
specification) with the server. If this is successful, continue
with sending the usual StartupMessage. In this case the
StartupMessage and all subsequent data will be
<acronym>SSL</acronym>-encrypted. To continue after
<literal>N</>, send the usual StartupMessage and proceed without
encryption.
</para>
<para>
The frontend should also be prepared to handle an ErrorMessage response
to SSLRequest from the server. This would only occur if the server
predates the addition of SSL support to <productname>PostgreSQL</>.
In this case the connection must be closed, but the frontend may choose
to open a fresh connection and proceed without requesting SSL.
The frontend should also be prepared to handle an ErrorMessage
response to SSLRequest from the server. This would only occur if
the server predates the addition of <acronym>SSL</acronym> support
to <productname>PostgreSQL</>. In this case the connection must
be closed, but the frontend may choose to open a fresh connection
and proceed without requesting <acronym>SSL</acronym>.
</para>
<para>
@ -1178,8 +1187,9 @@
<para>
While the protocol itself does not provide a way for the server to
force SSL encryption, the administrator may configure the server to
reject unencrypted sessions as a byproduct of authentication checking.
force <acronym>SSL</acronym> encryption, the administrator may
configure the server to reject unencrypted sessions as a byproduct
of authentication checking.
</para>
</sect2>
</sect1>
@ -2106,7 +2116,7 @@ CopyData (F &amp; B)
</Term>
<ListItem>
<Para>
Identifies the message as COPY data.
Identifies the message as <command>COPY</command> data.
</Para>
</ListItem>
</VarListEntry>
@ -2153,7 +2163,7 @@ CopyDone (F &amp; B)
</Term>
<ListItem>
<Para>
Identifies the message as a COPY-complete indicator.
Identifies the message as a <command>COPY</command>-complete indicator.
</Para>
</ListItem>
</VarListEntry>
@ -2188,7 +2198,7 @@ CopyFail (F)
</Term>
<ListItem>
<Para>
Identifies the message as a COPY-failure indicator.
Identifies the message as a <command>COPY</command>-failure indicator.
</Para>
</ListItem>
</VarListEntry>
@ -2255,7 +2265,7 @@ CopyInResponse (B)
</Term>
<ListItem>
<Para>
0 indicates the overall copy format is textual (rows
0 indicates the overall <command>COPY</command> format is textual (rows
separated by newlines, columns separated by separator
characters, etc).
1 indicates the overall copy format is binary (similar
@ -2330,13 +2340,12 @@ CopyOutResponse (B)
</Term>
<ListItem>
<Para>
0 indicates the overall copy format is textual (rows
separated by newlines, columns separated by separator
characters, etc).
1 indicates the overall copy format is binary (similar
to DataRow format).
See <xref linkend="sql-copy" endterm="sql-copy-title">
for more information.
0 indicates the overall <command>COPY</command> format
is textual (rows separated by newlines, columns
separated by separator characters, etc). 1 indicates
the overall copy format is binary (similar to DataRow
format). See <xref linkend="sql-copy"
endterm="sql-copy-title"> for more information.
</Para>
</ListItem>
</VarListEntry>
@ -3602,7 +3611,7 @@ SSLRequest (F)
</Term>
<ListItem>
<Para>
The SSL request code. The value is chosen to contain
The <acronym>SSL</acronym> request code. The value is chosen to contain
<literal>1234</> in the most significant 16 bits, and <literal>5679</> in the
least 16 significant bits. (To avoid confusion, this code
must not be the same as any protocol version number.)
@ -3899,8 +3908,9 @@ message.
<ListItem>
<Para>
Where: an indication of the context in which the error occurred.
Presently this includes a call stack traceback of active PL functions.
The trace is one entry per line, most recent first.
Presently this includes a call stack traceback of active
procedural language functions. The trace is one entry per line,
most recent first.
</Para>
</ListItem>
</VarListEntry>
@ -4006,12 +4016,12 @@ may allow improvements in performance or functionality.
</para>
<para>
COPY data is now encapsulated into CopyData and CopyDone messages. There
is a well-defined way to recover from errors during COPY. The special
<command>COPY</command> data is now encapsulated into CopyData and CopyDone messages. There
is a well-defined way to recover from errors during <command>COPY</command>. The special
<quote><literal>\.</></quote> last line is not needed anymore, and is not sent
during COPY OUT.
(It is still recognized as a terminator during COPY IN, but its use is
deprecated and will eventually be removed.) Binary COPY is supported.
during <command>COPY OUT</command>.
(It is still recognized as a terminator during <command>COPY IN</command>, but its use is
deprecated and will eventually be removed.) Binary <command>COPY</command> is supported.
The CopyInResponse and CopyOutResponse messages include fields indicating
the number of columns and the format of each column.
</para>
@ -4046,7 +4056,7 @@ the backend.
<para>
The NotificationResponse ('<literal>A</>') message has an additional string
field, which is presently empty but may someday carry additional data passed
from the NOTIFY event sender.
from the <command>NOTIFY</command> event sender.
</para>
<para>

View file

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/typeconv.sgml,v 1.41 2003/12/02 00:26:59 tgl Exp $
$PostgreSQL: pgsql/doc/src/sgml/typeconv.sgml,v 1.42 2003/12/14 00:10:32 neilc Exp $
-->
<chapter Id="typeconv">
@ -122,9 +122,9 @@ with, and perhaps converted to, the types of the target columns.
</term>
<listitem>
<para>
Since all query results from a unionized <literal>SELECT</literal> statement
Since all query results from a unionized <command>SELECT</command> statement
must appear in a single set of columns, the types of the results of each
<literal>SELECT</> clause must be matched up and converted to a uniform set.
<command>SELECT</> clause must be matched up and converted to a uniform set.
Similarly, the branch expressions of a <literal>CASE</> construct must be
converted to a common type so that the <literal>CASE</> expression as a whole
has a known output type. The same holds for <literal>ARRAY</> constructs.