
This topic grew out of a question I had when I was asked to configure an Apache web server in front of my Tomcat server. I knew Tomcat itself can act as a web server, so why do we need a separate one? That question led me to find the answer and post it here.

Why web server

When we build an enterprise application, one of the first questions is what the web server and the application server each do. Why not receive requests on the app server directly instead of passing them through a web server? Why not put an external load balancer in front of the app server and run the application without a web server? And when Tomcat itself can act as a web server, why do we pair it with a web server like Apache?

Here we look at the need for web servers in detail. When someone specializes in a job, it makes sense to leave that job to the specialist. The same applies here: it is about protecting the entrance to your app and choosing the right configuration for it.

The basic definition that you will read online is that ‘the primary function of a web server is to deliver web pages on the request to clients’. That’s all well and good, but how is that different than any application server you use? How is that different than Tomcat?

Think of an application server as an ecosystem of many different pieces and parts all used for the functioning of an application: an EJB component, a JMS component, a web container, etc. In addition, the application server provides an API to the developer for interaction with each piece and part. The web server on the other hand can either be the bouncer that opens the door to these pieces and parts or a component itself.

What tomcat really does

Tomcat is often called an application server, but it is really a servlet container that delivers dynamic content in the form of servlets and JSP pages. To be more specific, Catalina is the servlet container, and Coyote is the web-server-esque connector that forwards requests on to Catalina. JSP pages are then rendered dynamically by Jasper, the JSP engine. So, in summary, Tomcat is the sum of Catalina, Coyote, and Jasper.

So, Tomcat itself can serve as a web server. Its Coyote component handles the requests and forwards them to Catalina, which then lets Jasper serve up a JSP. If Tomcat itself can deliver web pages to clients on request, why do we need a separate web server?

Apache Web Server

Because while Tomcat may be OK with handling your requests and delivering up the pages, a proper front-facing web server such as Apache can do so much more for you. It allows you to apply a myriad of different features to the functioning of your application.

Apache is an open-source web server and has for many years been one of the most widely used web servers on the internet. By understanding Apache, and the line it draws between application server and web server, we can better define what a web server is.

Now let’s look at four things that really matter for an application and that the Apache web server is well suited to provide.

1. AJP Connector (connects the web server to the Tomcat app server)

2. Virtual Hosts

3. URL Rewriting

4. Load Balancing

mod_jk and the AJP Connector

A great deal of Apache’s functionality is delivered in the form of modules, which are basically plugins that extend and enhance the core Apache functionality. These modules can be downloaded individually (or built from source) and added to an existing installation, or compiled in at build time.

Even though Tomcat can function as a standalone web server, Apache makes a much better one. To keep it all in the family, Apache has a module, mod_jk, which allows smooth integration with a Tomcat container. Basically, it lets Apache accept requests and then forward them to the appropriate Tomcat instance, taking over the front-door job otherwise done by Coyote (Tomcat’s web-server-esque connector). A main advantage of using Apache, however, is that it lets you run multiple Tomcat instances behind a single instance of Apache.

The basic steps to configure this are:

1. Define an AJP connector port on each Tomcat instance which will accept requests from Apache.

2. Define a worker which ensures requests through Apache are routed to the proper Tomcat instance.

3. Configure the Apache configuration file to invoke the correct workers at the correct time.

Let’s focus on the last two since they are Apache-specific. To define a worker, you create a properties file called workers.properties and inside, define your worker using basic key-value pairs:

worker.list=myworker
worker.myworker.type=ajp13
worker.myworker.host=localhost
worker.myworker.port=8009

In other words: define a worker named myworker, which uses the AJP 1.3 protocol and forwards requests to port 8009, which is where my Tomcat instance is listening for requests from Apache.

Then, in your Apache configuration file, you hook in the worker:

JkWorkersFile /path/to/workers.properties
JkMount /* myworker

This essentially says: mount all requests to any path (/*) on the worker ‘myworker’ defined in workers.properties.

The result is an integrated Apache and Tomcat setup. Remember that multiple workers can be defined, one per Tomcat instance. Note also that AJP traffic between Apache and Tomcat is not encrypted, so even if the browser talks to Apache over HTTPS, the subsequent hop from Apache to Tomcat travels in the clear. This is not necessarily a huge problem, since that hop is most likely inside your internal network, but it is something to keep in mind.
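
To make the hook-up a little more concrete, here is a slightly fuller sketch of the Apache side. It is only an illustration: the module path, the log file location, and the /app paths are assumptions, not values from the setup described above, and would need to match your own installation.

```apache
# Load mod_jk (the path to mod_jk.so depends on your installation)
LoadModule jk_module modules/mod_jk.so

# Point mod_jk at the worker definitions
JkWorkersFile /path/to/workers.properties

# Optional: keep mod_jk diagnostics in their own log file
JkLogFile logs/mod_jk.log
JkLogLevel info

# Forward only the application paths to Tomcat (hypothetical /app prefix)...
JkMount /app/* myworker
# ...and let Apache serve static files under /app/static/ itself
JkUnMount /app/static/* myworker
```

Mounting a narrower path than /* is one way to let Apache handle static content while Tomcat only sees the requests that actually need a servlet container.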

VirtualHosts

Another helpful feature of Apache is that it allows you to create virtual hosts on your server so that you can give the appearance of many different hosts all operating on the same IP address. Let’s take a real-world example to illustrate the point.

Let’s assume your project is a site that lets your users shop online. All the code containing your precious algorithms is going to be hosted by a third-party hosting provider, which has, say, the IP address 112.145.3.2. You also own the domain name tradersshop.com, which you bought some time ago. In addition, you want to allow your users to create JIRA tickets as they notice problems and bugs on your site. The URL for this will be issues.tradersshop.com.

With Apache, you can create VirtualHosts on your main server that accept requests for designated host names and route them to the correct backends. These VirtualHosts are defined in the configuration file, which is where most of the configuration for Apache is done. Here is an example of how these VirtualHost definitions would look in the configuration file:

# NameVirtualHost is needed on Apache 2.2; on 2.4 and later it has no effect and can be dropped
NameVirtualHost *:80
<VirtualHost *:80>
ServerName www.tradersshop.com
JkMount /* myworker
</VirtualHost>
<VirtualHost *:80>
ServerName issues.tradersshop.com
JkMount /* myjiraworker
</VirtualHost>

Note that what we have done here is combine mod_jk with our VirtualHosts. Since JIRA comes prepackaged with its own Tomcat instance, we will be forwarding requests to multiple servlet containers. As long as each of the above ServerName values is DNS-mapped to our hosted server’s IP address, these VirtualHosts will route requests to the appropriate container.
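
As a small variation on the first VirtualHost above, the sketch below also answers for the bare domain. This is an assumption on my part: the example only names the www host, and the bare domain would need its own DNS record pointing at the same IP address.

```apache
<VirtualHost *:80>
    ServerName www.tradersshop.com
    # Also match requests whose Host header is the bare domain (assumed, not part of the original setup)
    ServerAlias tradersshop.com
    JkMount /* myworker
</VirtualHost>
```

Requests whose host name matches no ServerName or ServerAlias fall through to the first VirtualHost defined for that address and port, so the order of the blocks matters.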

URL Rewriting (mod_rewrite)

One interesting feature of Apache is the ability to rewrite URLs based on regex patterns, which lets you control how navigation appears to the user. One example, which can be extremely useful when securing your application, is the ability to rewrite specific URLs to use HTTPS based on their relative path.

For example, suppose you would like the login pages of tradersshop (our sample project) to be served through HTTPS, but you want the rest of the site to use plain old HTTP. With mod_rewrite, you would add the following to your configuration file:

RewriteEngine on
RewriteCond %{REQUEST_URI} ^/login/.*
RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [L]

In plain English, the above means the following:
Turn the rewrite engine on. If the request URI starts with /login/, then rewrite the URL to use HTTPS, keeping the same server name and request URI.

REQUEST_URI and SERVER_NAME are predefined variables in mod_rewrite land which allow you to act on values at runtime. The [L] in brackets at the end of the third line means ‘this is the last rule’, so no further rules are applied once it matches. In addition, you can string multiple rewrite conditions together. ‘AND’ is implicit between conditions, while ‘OR’ is expressed by appending [OR] to the end of a condition.

All in all, it’s a very easy way to require HTTPS only on certain relative paths. It is much easier than, say, using security-constraints in your deployment descriptor.
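
Here is a hedged sketch of how conditions can be chained. It extends the rule above in two ways that are my own additions: a guard on the HTTPS variable, so that a request already on HTTPS is not redirected again if the same rules also apply to the HTTPS side, and a second path, /account/, which is purely hypothetical.

```apache
RewriteEngine on

# Only act on requests that did not already arrive over HTTPS
RewriteCond %{HTTPS} !=on
# [OR] joins this condition with the next one; the group is ANDed with the condition above
RewriteCond %{REQUEST_URI} ^/login/ [OR]
RewriteCond %{REQUEST_URI} ^/account/
# Redirect to the same host and path over HTTPS, then stop processing rules
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [R,L]
```

Read together, the conditions say: the request is not on HTTPS, and its path starts with /login/ or /account/.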

Load Balancing

Perhaps the single greatest use of a web server is the ability to load balance traffic in a cluster. Apache makes this easy through the use of two modules, mod_proxy and mod_proxy_balancer. Load balancing allows Apache to act as your bouncer, dividing traffic among the members of your cluster. You have your choice of three different algorithms for configuring how loads are balanced:

1. Request Counting

This lets every member of your cluster receive its fair share of work, based on the relative number of requests each member should handle. So, a cluster configured as:

server 1, work effort factor = 10
server 2, work effort factor = 20
server 3, work effort factor = 10
server 4, work effort factor = 10

would result in Server 2 handling twice as many requests as any other member of the cluster. Note that these values are relative, so the above is the same as saying that Server 2 handles two requests for every one handled by each of the others.

2. Weighted Traffic

Weighted Traffic balancing works basically the same as Request Counting, except that you specify a factor representing the relative SIZE of the traffic, in bytes, that each node will handle. For example:

server 1, work effort factor = 10
server 2, work effort factor = 20
server 3, work effort factor = 10
server 4, work effort factor = 10

This means that we want Server 2 to process twice as many bytes of traffic as each of the other three nodes. Remember that this does not necessarily mean more requests, just that Server 2 will handle twice as much I/O as the other nodes. Again, values are relative, as in Request Counting.

3. Pending Request

Pending Request basically works according to who is busiest. Apache will route each new request to the node that has the fewest active requests. This becomes especially useful with nodes that queue requests, since the algorithm helps keep those queues even.
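
To tie the three algorithms to actual configuration, here is a minimal sketch of a four-node cluster using mod_proxy_balancer. The backend host names and the balancer name mycluster are placeholders I made up. On Apache 2.4 each algorithm also lives in its own module (for example mod_lbmethod_byrequests), which has to be loaded alongside mod_proxy, mod_proxy_http and mod_proxy_balancer.

```apache
<Proxy "balancer://mycluster">
    # loadfactor is the relative "work effort factor" from the examples above
    BalancerMember "http://tomcat1.example.com:8080" loadfactor=10
    BalancerMember "http://tomcat2.example.com:8080" loadfactor=20
    BalancerMember "http://tomcat3.example.com:8080" loadfactor=10
    BalancerMember "http://tomcat4.example.com:8080" loadfactor=10

    # byrequests = Request Counting
    # bytraffic  = Weighted Traffic
    # bybusyness = Pending Request
    ProxySet lbmethod=byrequests
</Proxy>

# Send all traffic through the balancer and fix up redirects coming back from the backends
ProxyPass "/" "balancer://mycluster/"
ProxyPassReverse "/" "balancer://mycluster/"
```

Switching algorithms is then a matter of changing the lbmethod value; the loadfactor weights keep their relative meaning for Request Counting and Weighted Traffic.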

So, that’s it: a quick explanation of what Apache does and, further, the real purpose behind a web server. Web servers can provide a ton of functionality to all traffic coming through your doors.
