Apache Module mod_proxy
Summary
Warning
Do not enable proxying with ProxyRequests
until you have secured your server. Open proxy servers are dangerous both to your
network and to the Internet at large.
This module implements a proxy/gateway for Apache. It implements
proxying capability for FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0,
and HTTP/1.1. The module can be configured to connect to other proxy
modules for these and other protocols.
Apache's proxy features are divided into several modules in
addition to mod_proxy: mod_proxy_http, mod_proxy_ftp and
mod_proxy_connect. Thus, if you want to use one or more of the
particular proxy functions, load mod_proxy and the appropriate
module(s) into the server (either statically at compile-time or
dynamically via the LoadModule directive).
In addition, extended features are provided by other modules.
Caching is provided by mod_cache and related modules. The ability to
contact remote servers using the SSL/TLS protocol is provided by the
SSLProxy* directives of mod_ssl. These additional modules will need
to be loaded and configured to take advantage of these features.
Apache can be configured in both a forward and
reverse proxy mode.
An ordinary forward proxy is an intermediate
server that sits between the client and the origin
server. In order to get content from the origin server,
the client sends a request to the proxy naming the origin server
as the target and the proxy then requests the content from the
origin server and returns it to the client. The client must be
specially configured to use the forward proxy to access other
sites.
A typical usage of a forward proxy is to provide Internet
access to internal clients that are otherwise restricted by a
firewall. The forward proxy can also use caching (as provided
by mod_cache) to reduce network usage.
The forward proxy is activated using the ProxyRequests directive.
Because forward proxies allow clients to access arbitrary sites
through your server and to hide their true origin, it is essential
that you secure your server so that only authorized clients can
access the proxy before activating a forward proxy.
A reverse proxy, by contrast, appears to the
client just like an ordinary web server. No special
configuration on the client is necessary. The client makes
ordinary requests for content in the name-space of the reverse
proxy. The reverse proxy then decides where to send those
requests, and returns the content as if it was itself the
origin.
A typical usage of a reverse proxy is to provide Internet
users access to a server that is behind a firewall. Reverse
proxies can also be used to balance load among several back-end
servers, or to provide caching for a slower back-end server.
In addition, reverse proxies can be used simply to bring
several servers into the same URL space.
A reverse proxy is activated using the ProxyPass directive or the
[P] flag to the RewriteRule directive. It is not necessary to turn
ProxyRequests on in order to configure a reverse proxy.
The examples below are only a basic starting point. Please read
the documentation on the individual directives.
In addition, if you wish to have caching enabled, consult
the documentation for mod_cache.
Forward Proxy
ProxyRequests On
ProxyVia On
<Proxy *>
Order deny,allow
Deny from all
Allow from internal.example.com
</Proxy>
Reverse Proxy
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyPass /foo http://foo.example.com/bar
ProxyPassReverse /foo http://foo.example.com/bar
You can control who can access your proxy via the <Proxy> control
block, as in the following example:
<Proxy *>
Order Deny,Allow
Deny from all
Allow from 192.168.0
</Proxy>
For more information on access control directives, see mod_access.
Strictly limiting access is essential if you are using a
forward proxy (using the ProxyRequests directive).
Otherwise, your server can be used by any client to access
arbitrary hosts while hiding their true identity. This is
dangerous both for your network and for the Internet at large.
When using a reverse proxy (using the ProxyPass directive with
ProxyRequests Off), access control is less critical because clients
can only contact the hosts that you have specifically configured.
You probably don't have that particular file type defined as
application/octet-stream in your proxy's mime.types configuration
file. A useful line might be:
application/octet-stream bin dms lha lzh exe class tgz taz
In the rare situation where you must download a specific file using the
FTP ASCII transfer method (while the default transfer is in binary
mode), you can override mod_proxy's default by suffixing the request
with ;type=a to force an ASCII transfer. (FTP directory listings are
always executed in ASCII mode, however.)
An FTP URI is interpreted relative to the home directory of the user
who is logging in. Alas, to reach higher directory levels you cannot
use /../, as the dots are interpreted by the browser and not actually
sent to the FTP server. To address this problem, the so-called Squid
%2f hack was implemented in the Apache FTP proxy; it is a solution
which is also used by other popular proxy servers like the Squid Proxy
Cache. By prepending /%2f to the path of your request, you can make
such a proxy change the FTP starting directory to / (instead of the
home directory). For example, to retrieve the file /etc/motd, you
would use the URL:
ftp://user@host/%2f/etc/motd
To log in to an FTP server by username and password, Apache uses
different strategies. In the absence of a user name and password in
the URL altogether, Apache sends an anonymous login to the FTP server,
i.e.,
user: anonymous
password: apache_proxy@
This works for all popular FTP servers which are configured for
anonymous access.
For a personal login with a specific username, you can embed the user
name into the URL, like in:
ftp://username@host/myfile
If the FTP server asks for a password when given this username (which
it should), then Apache will reply with a 401 (Authorization required)
response, which causes the browser to pop up the username/password
dialog. Upon entering the password, the connection attempt is retried,
and if successful, the requested resource is presented. The advantage
of this procedure is that your browser does not display the password
in cleartext (which it would if you had used
ftp://username:password@host/myfile in the first place).
Note
The password which is transmitted in such a way is not encrypted on
its way. It travels between your browser and the Apache proxy server in
a base64-encoded cleartext string, and between the Apache proxy and the
FTP server as plaintext. You should therefore think twice before
accessing your FTP server via HTTP (or before accessing your personal
files via FTP at all!). When using insecure channels, an eavesdropper
might intercept your password on its way.
If you're using the ProxyBlock directive, the IP addresses of the
listed hostnames are looked up and cached during startup for later
match tests. This may take a few seconds (or more) depending on the
speed with which the hostname lookups occur.
An Apache proxy server situated in an intranet needs to forward
external requests through the company's firewall (for this, configure
the ProxyRemote directive to forward the respective scheme to the
firewall proxy). However, when it has to access resources within the
intranet, it can bypass the firewall when accessing hosts. The NoProxy
directive is useful for specifying which hosts belong to the intranet
and should be accessed directly.
Users within an intranet tend to omit the local domain name from their
WWW requests, thus requesting http://somehost/ instead of
http://somehost.example.com/. Some commercial proxy servers
let them get away with this and simply serve the request, implying a
configured local domain. When the ProxyDomain directive is used and the
server is configured for proxy service, Apache can return a redirect
response and send the client to the correct, fully qualified, server
address. This is the preferred method since the user's bookmark
files will then contain fully qualified hosts.
For circumstances where you have an application server which doesn't
implement keepalives or HTTP/1.1 properly, there are two environment
variables which, when set, cause the proxy to send an HTTP/1.0 request
with no keepalive. These are the force-proxy-request-1.0 and
proxy-nokeepalive notes, and they are set via the SetEnv directive.
<Location /buggyappserver/>
ProxyPass http://buggyappserver:7001/foo/
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1
</Location>
The AllowCONNECT directive specifies a list of port numbers to which
the proxy CONNECT method may connect. Today's browsers use this method
when an https connection is requested and proxy tunneling over HTTP is
in effect.
By default, only the default https port (443) and the default snews
port (563) are enabled. Use the AllowCONNECT directive to override this
default and allow connections to the listed ports only.
Note that you'll need to have mod_proxy_connect present in the server
in order to get support for the CONNECT method at all.
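As a minimal sketch of overriding the default, the following keeps the
two default ports and adds one more. Port 8443 is an illustrative
assumption, not a value taken from this documentation. Because the
directive replaces the default list rather than extending it, the
default ports must be repeated if they should stay allowed.

```apache
# Permit CONNECT tunnels to the default https and snews ports,
# plus one additional port (8443 is a hypothetical example).
AllowCONNECT 443 563 8443
```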
This directive is only useful for Apache proxy servers within
intranets. The NoProxy directive specifies a list of subnets, IP
addresses, hosts and/or domains, separated by spaces. A request to a
host which matches one or more of these is always served directly,
without forwarding to the configured ProxyRemote proxy server(s).
Example
ProxyRemote * http://firewall.mycompany.com:81
NoProxy .mycompany.com 192.168.112.0/21
Each host argument to the NoProxy directive takes one of the
following forms:
- Domain
A Domain is a partially qualified DNS domain name, preceded
by a period. It represents a list of hosts which logically belong to the
same DNS domain or zone (i.e., the hostnames all end in Domain).
Examples
.com .apache.org.
To distinguish Domains from Hostnames (both syntactically and
semantically; a DNS domain can have a DNS A record, too!), Domains are
always written with a leading period.
Note
Domain name comparisons are done without regard to the case, and
Domains are always assumed to be anchored in the root of the
DNS tree, therefore two domains .MyDomain.com and .mydomain.com.
(note the trailing period) are considered equal. Since a domain
comparison does not involve a DNS lookup, it is much more efficient
than subnet comparison.
- SubNet
A SubNet is a partially qualified internet address in
numeric (dotted quad) form, optionally followed by a slash and the netmask,
specified as the number of significant bits in the SubNet. It is
used to represent a subnet of hosts which can be reached over a common
network interface. In the absence of an explicit netmask it is assumed
that omitted (or zero valued) trailing digits specify the mask. (In this
case, the netmask can only be a multiple of 8 bits wide.) Examples:
192.168 or 192.168.0.0 - the subnet 192.168.0.0 with an implied netmask
of 16 valid bits (sometimes used in the netmask form 255.255.0.0)
192.168.112.0/21 - the subnet 192.168.112.0/21 with a netmask of 21
valid bits (also used in the form 255.255.248.0)
As a degenerate case, a SubNet with 32 valid bits is the
equivalent to an IPAddr, while a SubNet with zero
valid bits (e.g., 0.0.0.0/0) is the same as the constant
_Default_, matching any IP address.
- IPAddr
An IPAddr represents a fully qualified internet address in
numeric (dotted quad) form. Usually, this address represents a host, but
there need not necessarily be a DNS domain name connected with the
address.
Note
An IPAddr does not need to be resolved by the DNS system, so
it can result in more effective Apache performance.
- Hostname
A Hostname is a fully qualified DNS domain name which can
be resolved to one or more IPAddrs via the DNS domain name service.
It represents a logical host (in contrast to Domains, see above) and
must be resolvable to at least one IPAddr (or often to a list
of hosts with different IPAddrs).
Examples
prep.ai.mit.edu
www.apache.org
Note
In many situations, it is more effective to specify an IPAddr in place
of a Hostname since a DNS lookup can be avoided. Name resolution in
Apache can take a considerable amount of time when the connection to
the name server uses a slow PPP link.
Hostname comparisons are done without regard to the case,
and Hostnames are always assumed to be anchored in the root
of the DNS tree, therefore two hosts WWW.MyDomain.com and
www.mydomain.com. (note the trailing period) are considered equal.
Directives placed in <Proxy> sections apply only to matching proxied
content. Shell-style wildcards are allowed.
For example, the following will allow only hosts in
yournetwork.example.com to access content via your proxy server:
<Proxy *>
Order Deny,Allow
Deny from all
Allow from yournetwork.example.com
</Proxy>
The following example will process all files in the foo directory of
example.com through the INCLUDES filter when they are sent through the
proxy server:
<Proxy http://example.com/foo/*>
SetOutputFilter INCLUDES
</Proxy>
The ProxyBadHeader directive determines the behaviour of mod_proxy
if it receives syntactically invalid header lines (i.e. containing no
colon). The following arguments are possible:
IsError - Abort the request and end up with a 502 (Bad Gateway)
response. This is the default behaviour.
Ignore - Treat bad header lines as if they weren't sent.
StartBody - When receiving the first bad header line, finish reading
the headers and treat the remainder as body. This helps to work around
buggy backend servers which forget to insert an empty line between the
headers and the body.
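As a minimal sketch, a server fronting a backend known to omit the
blank line between headers and body might be configured as follows.
Whether this is appropriate depends on the backend; it is shown only to
illustrate the argument form.

```apache
# Treat everything after the first malformed header line as the body
# instead of failing the request with a 502 response.
ProxyBadHeader StartBody
```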
The ProxyBlock directive specifies a list of words, hosts and/or
domains, separated by spaces. HTTP, HTTPS, and FTP document requests to
sites whose names contain matched words, hosts or domains are blocked
by the proxy server. During startup, the proxy module will also attempt
to determine the IP addresses of list items which may be hostnames, and
cache them for match tests as well. This may slow down the startup time
of the server.
Example
ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu
rocky.wotsamattau.edu would also be matched if referenced by IP
address.
Note that wotsamattau would also be sufficient to match
wotsamattau.edu.
Note also that
ProxyBlock *
blocks connections to all sites.
This directive is only useful for Apache proxy servers within
intranets. The ProxyDomain directive specifies the default domain to
which the Apache proxy server will belong. If a request to a host
without a domain name is encountered, a redirection response to the
same host with the configured Domain appended will be generated.
Example
ProxyRemote * http://firewall.mycompany.com:81
NoProxy .mycompany.com 192.168.112.0/21
ProxyDomain .mycompany.com
This directive is useful for reverse-proxy setups, where you want to
have a common look and feel on the error pages seen by the end user.
This also allows for included files (via mod_include's SSI) to get
the error code and act accordingly. By default, the error page of the
proxied server is displayed; turning this on shows the SSI error
message instead.
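The following is a minimal sketch, assuming the directive described
here is ProxyErrorOverride (its name does not appear in the text
above). The backend host and the error document path are hypothetical.

```apache
ProxyPass /app/ http://backend.example.com/
# Replace error pages sent by the backend with locally configured ones
ProxyErrorOverride On
# Local page served for 503 responses; the path is an illustrative
# assumption and lies outside the proxied /app/ space
ErrorDocument 503 /errors/503.shtml
```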
The ProxyIOBufferSize directive adjusts the size of the internal
buffer, which is used as a scratchpad for the data between input and
output. The size must be less than or equal to 8192.
In almost every case there's no reason to change that value.
Description: Container for directives applied to regular-expression-matched proxied resources
Syntax: <ProxyMatch regex> ...</ProxyMatch>
Context: server config, virtual host
Status: Extension
Module: mod_proxy
The <ProxyMatch> directive is identical to the <Proxy> directive,
except it matches URLs using regular expressions.
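As a rough sketch, the access control example shown earlier for <Proxy>
could be limited to a family of hosts with a regular expression. The
pattern and the address prefix below are illustrative assumptions.

```apache
# Apply access control only to proxied URLs whose host name ends in
# example.com (pattern chosen for illustration)
<ProxyMatch "^http://[^/]*example\.com/">
    Order Deny,Allow
    Deny from all
    Allow from 192.168.0
</ProxyMatch>
```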
The ProxyMaxForwards directive specifies the maximum number of proxies
through which a request may pass, if there's no Max-Forwards header
supplied with the request. This is set to prevent infinite proxy loops,
or a DoS attack.
Example
ProxyMaxForwards 15
This directive allows remote servers to be mapped into the space of
the local server; the local server does not act as a proxy in the
conventional sense, but appears to be a mirror of the remote
server. path is the name of a local virtual path; url
is a partial URL for the remote server and cannot include a query
string.
Suppose the local server has address http://example.com/; then
ProxyPass /mirror/foo/ http://backend.example.com/
will cause a local request for http://example.com/mirror/foo/bar
to be internally converted into a proxy request to
http://backend.example.com/bar.
The ! directive is useful in situations where you don't want
to reverse-proxy a subdirectory, e.g.
ProxyPass /mirror/foo/i !
ProxyPass /mirror/foo http://backend.example.com
will proxy all requests to /mirror/foo to backend.example.com
except requests made to /mirror/foo/i.
Note
Order is important. You need to put the exclusions before the
general ProxyPass directive.
When used inside a <Location> section, the first argument is omitted
and the local directory is obtained from the <Location>.
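As a short sketch of the <Location> form, the mapping from
/mirror/foo/ to the backend used in the example above could be written
with the path taken from the enclosing section:

```apache
# Same effect as: ProxyPass /mirror/foo/ http://backend.example.com/
<Location /mirror/foo/>
    ProxyPass http://backend.example.com/
</Location>
```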
The ProxyRequests directive should usually be set off when using
ProxyPass.
If you require a more flexible reverse-proxy configuration, see the
RewriteRule directive with the [P] flag.
This directive lets Apache adjust the URL in the Location,
Content-Location and URI headers on HTTP redirect responses. This is
essential when Apache is used as a reverse proxy, so that HTTP
redirects issued by backend servers which stay behind the reverse
proxy do not lead clients to bypass it.
Only the HTTP response headers specifically mentioned above
will be rewritten. Apache will not rewrite other response
headers, nor will it rewrite URL references inside HTML pages.
This means that if the proxied content contains absolute URL
references, they will by-pass the proxy. A third-party module
that will look inside the HTML and rewrite URL references is Nick
Kew's mod_proxy_html.
path is the name of a local virtual path. url is a partial URL for the
remote server. Both are used in the same way as for the ProxyPass
directive.
For example, suppose the local server has address
http://example.com/; then
ProxyPass /mirror/foo/ http://backend.example.com/
ProxyPassReverse /mirror/foo/ http://backend.example.com/
will not only cause a local request for
http://example.com/mirror/foo/bar to be internally converted
into a proxy request to http://backend.example.com/bar
(the functionality ProxyPass provides here). It also takes care
of redirects which the server backend.example.com sends: when
http://backend.example.com/bar is redirected by it to
http://backend.example.com/quux, Apache adjusts this to
http://example.com/mirror/foo/quux before forwarding the HTTP
redirect response to the client. Note that the hostname used for
constructing the URL is chosen according to the setting of the
UseCanonicalName directive.
Note that this ProxyPassReverse directive can also be used in
conjunction with the proxy pass-through feature (RewriteRule ... [P])
from mod_rewrite because it doesn't depend on a corresponding ProxyPass
directive.
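A minimal sketch of that combination, reusing the hosts from the
example above, might look like this. The rewrite pattern is an
illustrative assumption, and both mod_rewrite and mod_proxy need to be
loaded.

```apache
RewriteEngine On
# Hand matching requests to the backend through the [P] flag
RewriteRule ^/mirror/foo/(.*)$ http://backend.example.com/$1 [P]
# Map redirect headers from the backend back into the local URL space
ProxyPassReverse /mirror/foo/ http://backend.example.com/
```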
When used inside a <Location> section, the first argument is omitted
and the local directory is obtained from the <Location>.
When enabled, this option will pass the Host: line from the incoming
request to the proxied host, instead of the hostname specified in the
ProxyPass line.
This option should normally be turned Off. It is mostly
useful in special configurations like proxied mass name-based virtual
hosting, where the original Host header needs to be evaluated by the
backend server.
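The following is a minimal sketch, assuming the option described here
is ProxyPreserveHost (its name does not appear in the text above). The
backend host name is hypothetical.

```apache
ProxyPass / http://backend.example.com/
# Forward the client's original Host: header to the backend so that
# it can select the matching name-based virtual host
ProxyPreserveHost On
```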
The ProxyReceiveBufferSize directive specifies an explicit (TCP/IP)
network buffer size for proxied HTTP and FTP connections, for increased
throughput. It has to be greater than 512 or set to 0 to indicate that
the system's default buffer size should be used.
Example
ProxyReceiveBufferSize 2048
This defines remote proxies to this proxy. match is either the
name of a URL-scheme that the remote server supports, or a partial URL
for which the remote server should be used, or * to indicate
the server should be contacted for all requests. remote-server is
a partial URL for the remote server. Syntax:
remote-server = scheme://hostname[:port]
scheme is effectively the protocol that should be used to
communicate with the remote server; only http is supported by
this module.
Example
ProxyRemote http://goodguys.com/ http://mirrorguys.com:8000
ProxyRemote * http://cleversite.com
ProxyRemote ftp http://ftpproxy.mydomain.com:8080
In the last example, the proxy will forward FTP requests, encapsulated
as yet another HTTP proxy request, to another proxy which can handle
them.
This option also supports reverse proxy configuration - a backend
webserver can be embedded within a virtualhost URL space even if that
server is hidden by another forward proxy.
The ProxyRemoteMatch directive is identical to the ProxyRemote
directive, except that the first argument is a regular expression
matched against the requested URL.
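As a rough sketch, requests for any host in a subdomain of a given
domain could be sent through a second proxy. Both host names and the
pattern below are placeholders chosen for illustration.

```apache
# Forward requests for hosts under example.org to another proxy.
# The pattern and proxy address are hypothetical.
ProxyRemoteMatch ^http://[^/]*\.example\.org/ http://proxy.example.com:8080
```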
This allows or prevents Apache from functioning as a forward proxy
server. (Setting ProxyRequests to Off does not disable use of
the ProxyPass directive.)
In a typical reverse proxy configuration, this option should be set to
Off.
In order to get the functionality of proxying HTTP or FTP sites, you
also need mod_proxy_http or mod_proxy_ftp (or both) present in the
server.
Warning
Do not enable proxying with ProxyRequests
until you have secured your server. Open proxy servers are dangerous
both to your network and to the Internet at large.
This directive allows a user to specify a timeout on proxy requests.
This is useful when you have a slow/buggy appserver which hangs, and you
would rather just return a timeout and fail gracefully instead of waiting
however long it takes the server to return.
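The following is a minimal sketch, assuming the directive described
here is ProxyTimeout and that its value is given in seconds (the name
does not appear in the text above). The backend address and the value
30 are arbitrary illustrations.

```apache
ProxyPass /app/ http://appserver.example.com:7001/
# Stop waiting for the backend after 30 seconds (illustrative value)
ProxyTimeout 30
```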
This directive controls the use of the Via: HTTP header by the proxy.
Its intended use is to control the flow of proxy requests along a chain
of proxy servers. See RFC 2616 (HTTP/1.1), section 14.45 for an
explanation of Via: header lines.
- If set to Off, which is the default, no special processing
is performed. If a request or reply contains a Via: header,
it is passed through unchanged.
- If set to On, each request and reply will get a Via: header line
added for the current host.
- If set to Full, each generated Via: header line will additionally
have the Apache server version shown as a Via: comment field.
- If set to Block, every proxy request will have all its Via: header
lines removed. No new Via: header will be generated.
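As a brief sketch of one of these settings in a reverse proxy
configuration (the backend host name is a placeholder):

```apache
ProxyRequests Off
ProxyPass /app/ http://backend.example.com/
# Add a Via: header for this host, with the Apache server version
# included as a comment field
ProxyVia Full
```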