WAL is automatically enabled from release 7.1
onwards. No action is required from the administrator with the
exception of ensuring that the additional disk-space requirements
of the WAL logs are met, and that any necessary
tuning is done (see Section 12.3).
WAL logs are stored in the directory pg_xlog under $PGDATA, as
a set of segment files, each 16 MB in size. Each segment is
divided into 8 kB pages. The log record headers are described in
access/xlog.h; record content is dependent on
the type of event that is being logged. Segment files are given
ever-increasing numbers as names, starting at
0000000000000000. The numbers do not wrap, at
present, but it should take a very long time to exhaust the
available stock of numbers.
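
The segment and page sizes above imply a simple mapping from a byte position in the log to a segment and a page within it. The following is a minimal sketch of that arithmetic, not code from the server. The function names are hypothetical, and the file-name format (the segment counter written as 16 zero-padded hexadecimal digits) is an assumption made for illustration; the text only states that names are increasing numbers starting at 0000000000000000.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # each segment file is 16 MB
PAGE_SIZE = 8 * 1024             # each segment is divided into 8 kB pages
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE  # 2048 pages per segment


def locate(log_offset):
    """Map a byte offset in the log to (segment, page, offset within page)."""
    segment, within_segment = divmod(log_offset, SEGMENT_SIZE)
    page, within_page = divmod(within_segment, PAGE_SIZE)
    return segment, page, within_page


def segment_name(segment):
    """Hypothetical naming: the segment number as 16 zero-padded hex digits."""
    return format(segment, "016X")
```

Under these assumptions, a byte that lies 5 bytes into the second page of the second segment is reported by locate() as segment 1, page 1, offset 5.
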
The WAL buffers and control structure are in
shared memory, and are handled by the backends; they are protected
by lightweight locks. The demand on shared memory is dependent on the
number of buffers. The default size of the WAL
buffers is 8 buffers of 8 kB each, or 64 kB total.
It is advantageous to locate the log on a different disk from the
main database files. This may be achieved by moving the directory,
pg_xlog, to another location (while the
postmaster is shut down, of course) and creating a symbolic link
from the original location in $PGDATA to
the new location.
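
The relocation just described can be sketched as a short script. This is only an illustration under stated assumptions: the function name relocate_wal_dir and its arguments are hypothetical, a POSIX file system with symbolic links is assumed, and the sketch does not check whether the postmaster is running, so shutting it down first remains the administrator's responsibility.

```python
import os
import shutil


def relocate_wal_dir(pgdata, new_location):
    """Move pgdata/pg_xlog to new_location and leave a symbolic link behind.

    The caller must shut down the postmaster first; this sketch does not
    check for a running server.
    """
    old_location = os.path.join(pgdata, "pg_xlog")
    if os.path.islink(old_location):
        raise RuntimeError("pg_xlog is already a symbolic link")
    if os.path.exists(new_location):
        raise RuntimeError("target location already exists")
    # shutil.move also works across file systems (it copies, then deletes).
    shutil.move(old_location, new_location)
    # Point the original path in $PGDATA at the new directory.
    os.symlink(os.path.abspath(new_location), old_location)
```

Refusing to overwrite an existing target is a deliberate choice in this sketch: it avoids silently merging the log directory into an unrelated one.
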
The aim of WAL, to ensure that the log is
written before database records are altered, may be subverted by
disk drives that falsely report a successful write to the kernel,
when, in fact, they have only cached the data and not yet stored it
on the disk. A power failure in such a situation may still lead to
irrecoverable data corruption. Administrators should try to ensure
that disks holding PostgreSQL's
log files do not make such false reports.
After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file
pg_control. Therefore, when recovery is to be
done, the backend first reads pg_control and
then the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record. Because the entire content of data pages is saved in the log on the
first page modification after a checkpoint, all pages changed since
the checkpoint will be restored to a consistent state.
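
The recovery sequence above can be illustrated with a toy in-memory model. This is a sketch under loudly stated assumptions, not the server's implementation: the log is a Python list whose index stands in for the log position, pages are dictionaries, and the record layout (the "type", "redo_pos", "page", "image", "key" and "value" fields) is invented for the example.

```python
def redo(pg_control, log, pages):
    """Replay a toy log forward from the last checkpoint.

    pg_control: dict holding the position of the checkpoint record.
    log:        list of records; the list index stands in for log position.
    pages:      dict mapping page id -> dict of page contents (mutated).
    """
    # Step 1: pg_control tells us where the checkpoint record is.
    checkpoint = log[pg_control["checkpoint_pos"]]
    assert checkpoint["type"] == "checkpoint"
    # Step 2: the checkpoint record says where REDO has to start.
    for record in log[checkpoint["redo_pos"]:]:
        if record["type"] == "full_page":
            # First change to a page after the checkpoint: the whole page
            # was logged, so whatever is on disk is simply overwritten.
            pages[record["page"]] = dict(record["image"])
        elif record["type"] == "change":
            pages[record["page"]][record["key"]] = record["value"]
    return pages


# Toy example: page 1 is damaged on disk, but the full-page record repairs it.
log = [
    {"type": "change", "page": 1, "key": "a", "value": 0},  # before checkpoint
    {"type": "checkpoint", "redo_pos": 1},
    {"type": "full_page", "page": 1, "image": {"a": 1}},
    {"type": "change", "page": 1, "key": "b", "value": 2},
]
recovered = redo({"checkpoint_pos": 1}, log, {1: {"a": "garbage"}})
```

In this example the record written before the checkpoint is never replayed, and the damaged on-disk content of page 1 does not matter, because the full-page record replaces it before any later change is applied.
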
Using pg_control to get the checkpoint
position speeds up the recovery process. To handle possible
corruption of pg_control, however, recovery should be able to
read the existing log segments in reverse order, newest to
oldest, in order to find the last checkpoint. This has not yet
been implemented.
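
The reverse scan proposed above might look roughly like the following. Since the text states that this has not been implemented, the sketch only illustrates the idea in a toy in-memory model: segments are lists of records, and the function name and the "type" field are hypothetical.

```python
def find_last_checkpoint(segments):
    """Return (segment index, record index) of the newest checkpoint record.

    segments: list of segments, oldest first; each segment is a list of
    records in log order. Returns None if no checkpoint record is found.
    """
    # Walk the segments newest to oldest, and each segment back to front,
    # so the first checkpoint record encountered is the most recent one.
    for seg_index in range(len(segments) - 1, -1, -1):
        records = segments[seg_index]
        for rec_index in range(len(records) - 1, -1, -1):
            if records[rec_index]["type"] == "checkpoint":
                return seg_index, rec_index
    return None
```

A real implementation would also have to validate each record it reads, which this sketch leaves out entirely.
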