The planner/optimizer decides which plans should be generated
based upon the types of indexes defined on the relations appearing in
a query. There is always the possibility of performing a
sequential scan on a relation, so a plan using only
sequential scans is always created. Assume an index is defined on a
relation (for example a B-tree index) and a query contains the
restriction
relation.attribute OPR constant. If
relation.attribute happens to match the key of the B-tree
index and OPR is anything but '<>' another plan is created using
the B-tree index to scan the relation. If there are further indexes
present and the restrictions in the query happen to match a key of an
index further plans will be considered.
After all feasible plans have been found for scanning single
relations, plans for joining relations are created. The
planner/optimizer considers only joins between every two relations for
which there exists a corresponding join clause (i.e. for which a
restriction like where rel1.attr1=rel2.attr2 exists) in the
where qualification. All possible plans are generated for every
join pair considered by the planner/optimizer. The three possible join
strategies are:
nested iteration join: The right relation is scanned
once for every tuple found in the left relation. This strategy
is easy to implement but can be very time consuming.
merge sort join: Each relation is sorted on the join
attributes before the join starts. Then the two relations are
merged together taking into account that both relations are
ordered on the join attributes. This kind of join is more
attractive because every relation has to be scanned only once.
hash join: the right relation is first hashed on its
join attributes. Next the left relation is scanned and the
appropriate values of every tuple found are used as hash keys to
locate the tuples in the right relation.
Here we will give a little description of the nodes appearing in the
plan. Figure \ref{plan} shows the plan produced for the query in
example \ref{simple_select}.
The top node of the plan is a MergeJoin node that has two
successors, one attached to the field lefttree and the second
attached to the field righttree. Each of the subnodes represents
one relation of the join. As mentioned above a merge sort
join requires each relation to be sorted. That's why we find
a Sort node in each subplan. The additional qualification given
in the query (s.sno > 2) is pushed down as far as possible and is
attached to the qpqual field of the leaf SeqScan node of
the corresponding subplan.
The list attached to the field mergeclauses of the
MergeJoin node contains information about the join attributes.
The values 65000 and 65001
for the varno fields in the
VAR nodes appearing in the mergeclauses list (and also in the
targetlist) mean that not the tuples of the current node should be
considered but the tuples of the next "deeper" nodes (i.e. the top
nodes of the subplans) should be used instead.
Note that every Sort and SeqScan node appearing in figure
\ref{plan} has got a targetlist but because there was not enough space
only the one for the MergeJoin node could be drawn.
Another task performed by the planner/optimizer is fixing the
operator ids in the Expr
and Oper nodes. As
mentioned earlier, PostgreSQL supports a variety of different data
types and even user defined types can be used. To be able to maintain
the huge amount of functions and operators it is necessary to store
them in a system table. Each function and operator gets a unique
operator id. According to the types of the attributes used
within the qualifications etc., the appropriate operator ids
have to be used.