
HTTP for Java Developers


The World Wide Web had humble beginnings as a research project at CERN, the Swiss research institute.
The primary goal of the project was to allow hypertext documents to be electronically linked, so that selecting a reference in one document would cause the referenced document to be retrieved.
To implement this system, some sort of mechanism was needed to allow a client computer to tell a server to send it a document.
To fill this function, the early developers of the Web created a new TCP/IP application layer protocol: the Hypertext Transfer Protocol (HTTP).


HTTP/0.9

The original version of HTTP was intended only for the transfer of hypertext documents and was designed to be very simple to make implementation of the fledgling Web easier.
In order to perform an HTTP 0.9 request, you had to open a new TCP connection, which was closed by the server after the response had been transmitted. To establish a new connection, TCP uses a three-way handshake, which requires an extra network roundtrip before data can be exchanged.

HTTP 0.9 is a very simple text-based protocol.
General structure of an HTTP 0.9 request and response (as observed through telnet):

Request:
GET /[url]

Response:
[HTML content]
Connection closed by foreign host.
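The exchange above can be sketched in plain Java sockets. This is a minimal, self-contained illustration (an in-process server stands in for a real web server, and the request path and body are made up): the client writes a bare `GET /path` line with no version and no headers, the server writes the raw content back and closes the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch of an HTTP/0.9 exchange over a loopback socket.
// The server accepts one connection, reads the one-line request,
// writes the raw HTML body and closes the connection -- exactly the
// transitory-connection model described above.
public class Http09Demo {

    // Performs one HTTP/0.9-style request against an in-process server
    // and returns the body the client read.
    public static String fetch() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream()));
                    in.readLine(); // e.g. "GET /index.html" -- no version, no headers
                    OutputStream out = s.getOutputStream();
                    out.write("<html>hello</html>".getBytes());
                    out.flush();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                } // socket closed here: "Connection closed by foreign host."
            });
            serverThread.start();

            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.getOutputStream().write("GET /index.html\r\n".getBytes());
                client.getOutputStream().flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));
                String body = in.readLine(); // the response is just the content
                serverThread.join();
                return body;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch()); // prints <html>hello</html>
    }
}
```

Note that there is no status line and no headers anywhere: the protocol carries nothing but the request line and the document itself.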


HTTP/1.0

HTTP/1.0 transformed HTTP from a trivial request/response application to a true messaging protocol.
The HEAD and POST methods were added and the concept of header fields was introduced.

It described a complete message format for HTTP, and explained how it should be used for client requests and server responses. One of the most important changes in HTTP/1.0 was the generalization of the protocol to handle many types of different media, as opposed to strictly hypertext documents.

Request:
GET /[url] HTTP/1.0
[HTTP request headers]

Response:
HTTP/1.0 302 Found
[HTTP response headers]
[HTML content]
Connection closed by foreign host.

The HTTP 0.9 connections were called transitory connections due to their short-lived nature and the same model was maintained in the more widely-deployed HTTP/1.0. The advantage of this connection model is its conceptual simplicity; the problem with it is that it is inefficient when the client needs to make many requests to the same server. This is often the case with modern hypertext documents, which usually carry inline references to images and other media.

So now we have
n requests = n connections = n threads (one thread per connection, from a pool of m threads)

HTTP/1.1

HTTP/1.1 introduces several significant improvements over version 1.0 of the protocol. The most significant one is support for persistent connections.

With transitory connections, each of these requests made by the client requires a new, distinct TCP connection to be set up between the client and server. Every connection takes server resources and network bandwidth (TCP handshake performed every time), so having to establish a new one for each file is woefully inefficient.

The solution to the inefficiency problem of transitory connections came in HTTP/1.1, which allows an HTTP client and server to set up a persistent connection.

The basic operation of HTTP is not changed; the main difference is that by default, the TCP connection is kept open after each request/response set, so that the next request and response can be exchanged immediately.
The connection is only closed when the client is done requesting all the documents it needs.

Persistent connections offer another important performance-enhancing option to HTTP clients: the ability to pipeline requests.
Suppose the client needs to send a request for files A, B and C to a server.
Since the requests for all of these files will be sent in the same TCP session, there is no need for the client to wait for a response to its request for A before sending the request for B. The client can send requests “rapid-fire”, one after the other.
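The "rapid-fire" behaviour can be sketched with raw sockets. In this illustration (the paths `/a` and `/b` and the tiny in-process server are made up for the demo), the client writes both requests before reading any response, and the server answers them in order on the same connection:

```java
import java.io.*;
import java.net.*;
import java.util.*;

// Sketch of request pipelining on a single persistent connection:
// the client writes the requests for A and B back-to-back, without
// waiting for the first response, then reads both responses in order.
public class PipeliningDemo {

    public static List<String> pipeline() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread t = new Thread(() -> {
                try (Socket s = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream()));
                    Writer out = new OutputStreamWriter(s.getOutputStream());
                    for (int i = 0; i < 2; i++) {
                        String requestLine = in.readLine();   // "GET /a HTTP/1.1"
                        while (!in.readLine().isEmpty()) { }  // skip request headers
                        String path = requestLine.split(" ")[1];
                        String body = "content of " + path;
                        out.write("HTTP/1.1 200 OK\r\n"
                                + "Content-Length: " + body.length() + "\r\n"
                                + "\r\n" + body);
                        out.flush();                          // responses go out in request order
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            t.start();

            List<String> bodies = new ArrayList<>();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                Writer out = new OutputStreamWriter(client.getOutputStream());
                // "rapid-fire": both requests are written before any response is read
                out.write("GET /a HTTP/1.1\r\nHost: localhost\r\n\r\n");
                out.write("GET /b HTTP/1.1\r\nHost: localhost\r\n\r\n");
                out.flush();

                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));
                for (int i = 0; i < 2; i++) {
                    in.readLine();                            // status line
                    int length = 0;
                    String line;
                    while (!(line = in.readLine()).isEmpty()) {
                        if (line.startsWith("Content-Length:")) {
                            length = Integer.parseInt(line.split(":")[1].trim());
                        }
                    }
                    char[] buf = new char[length];
                    int read = 0;
                    while (read < length) {
                        int n = in.read(buf, read, length - read);
                        if (n < 0) break;
                        read += n;
                    }
                    bodies.add(new String(buf));
                }
            }
            t.join();
            return bodies;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pipeline()); // [content of /a, content of /b]
    }
}
```

The Content-Length header is what makes this work: it lets the client find the boundary between one response and the next on the shared stream.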

How connection establishment happens in HTTP

1. Like most TCP/IP client/server protocols, the server in HTTP plays the passive role by listening for requests on a particular port number.

2. The default port number for HTTP is well-known TCP port number 80, and is used by Web browsers for most HTTP requests, unless a different port number is specified in the URL. The client initiates an HTTP connection by opening a TCP connection from itself to the server it wishes to contact.

3. Once the TCP connection is active, the client sends its first request message. The request specifies the version of HTTP. In HTTP/1.1, connections are persistent by default, so the client does not even need to send a Connection: Keep-Alive header (that header is the HTTP/1.0 extension). An HTTP/1.1 client can opt out by sending a Connection: close header in its initial request.

4. The flow of requests and responses continues for as long as the client has requests.

5. The connection can be gracefully terminated by the client by including the Connection: close header in the last request it needs to send to the server.
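Steps 1-5 can be sketched end to end with plain sockets. This is a toy illustration (the in-process server and the paths are invented for the demo); the point it makes is that both requests travel over a single accepted connection, and the client ends the conversation with Connection: close:

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of steps 1-5 above: one TCP connection carries several
// request/response exchanges, and the client ends the conversation by
// sending "Connection: close" on its last request.
public class PersistentConnectionDemo {

    public static int connectionsUsedForTwoRequests() throws Exception {
        AtomicInteger accepted = new AtomicInteger();
        try (ServerSocket server = new ServerSocket(0)) {    // step 1: server listens
            Thread t = new Thread(() -> {
                try (Socket s = server.accept()) {
                    accepted.incrementAndGet();
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream()));
                    Writer out = new OutputStreamWriter(s.getOutputStream());
                    boolean close = false;
                    while (!close) {
                        in.readLine();                       // request line
                        String line;
                        while (!(line = in.readLine()).isEmpty()) {
                            if (line.equalsIgnoreCase("Connection: close")) {
                                close = true;                // step 5: client asked us to close
                            }
                        }
                        out.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok");
                        out.flush();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            t.start();

            // step 2: client opens a single TCP connection
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                Writer out = new OutputStreamWriter(client.getOutputStream());
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));

                // step 3: first request; HTTP/1.1 is persistent by default
                out.write("GET /first HTTP/1.1\r\nHost: localhost\r\n\r\n");
                out.flush();
                readResponse(in);

                // steps 4-5: the last request announces Connection: close
                out.write("GET /last HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n");
                out.flush();
                readResponse(in);
            }
            t.join();
        }
        return accepted.get();   // both requests shared one accepted connection
    }

    private static void readResponse(BufferedReader in) throws IOException {
        String line;
        while (!(line = in.readLine()).isEmpty()) { }        // status line + headers
        char[] body = new char[2];
        int read = 0;
        while (read < 2) {                                   // the 2-byte "ok" body
            int n = in.read(body, read, 2 - read);
            if (n < 0) break;
            read += n;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(connectionsUsedForTwoRequests()); // prints 1
    }
}
```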

So now we have

n requests = 1 connection = 1 thread (from a pool of m threads)

Happy ending? No.

Persistent connections brought in a new problem.


HTTP/1.1 and Java NIO

Each thread that is handling a connection stays alive until the connection is closed. That means each thread on the server reserves memory for itself even when it is sitting idle between client requests.

Experiments with high-profile Web servers have yielded numerical results revealing that memory consumption increases almost in direct proportion with the number of HTTP connections. The reason is that threads are relatively expensive in terms of memory use. Servers configured with a fixed number of threads can suffer the thread starvation problem, whereby requests from new clients are rejected once all the threads in the pool are taken.

Java SE 1.4 introduced non-blocking IO in the NIO libraries. These libraries use low-level operating system constructs to allow highly optimized code in the operating system to manage TCP connections. NIO introduces the concept of channels, which can be thought of as non-blocking streams.

Read about NIO:
An awesome NIO tutorial
Another good one

So after NIO, we have

n HTTP requests = 1 connection = m threads (from a pool of threads, m <= n)
Threads can be allocated to connections only when requests are being processed.

When a connection is idle between requests, the thread can be recycled, and the connection is placed in a centralized NIO select set to detect new requests without consuming a separate thread.
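The select-set idea above can be sketched in a few lines of NIO. This is a deliberately minimal, single-connection illustration (the echo payload is made up): one thread sits in a Selector loop, and a connection only occupies that thread when it actually has bytes ready.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the NIO model described above: one thread sits in a
// Selector loop and only services a connection when that connection
// actually has data to read -- idle connections just wait in the select set.
public class NioSelectorDemo {

    public static String echoOnce() throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);                 // channels are non-blocking
        server.register(selector, SelectionKey.OP_ACCEPT);

        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Plain blocking client in another thread, sending one request.
        final String[] reply = new String[1];
        Thread client = new Thread(() -> {
            try (java.net.Socket s = new java.net.Socket("127.0.0.1", port)) {
                s.getOutputStream().write("ping".getBytes(StandardCharsets.UTF_8));
                s.getOutputStream().flush();
                byte[] buf = new byte[16];
                int n = s.getInputStream().read(buf);
                reply[0] = new String(buf, 0, n, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        client.start();

        boolean done = false;
        while (!done) {
            selector.select();                           // blocks until some channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();  // new connection arrived
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(16);
                    ch.read(buf);
                    buf.flip();
                    String req = StandardCharsets.UTF_8.decode(buf).toString();
                    ch.write(StandardCharsets.UTF_8.encode("echo:" + req));
                    ch.close();
                    done = true;
                }
            }
            selector.selectedKeys().clear();
        }
        client.join();
        server.close();
        selector.close();
        return reply[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce()); // prints echo:ping
    }
}
```

A real server would keep the loop running and hand ready channels to a worker pool; the single thread here is enough to show how many idle connections can share one selector thread.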

Around the same time as HTTP 1.x, a need arose to keep track of multiple connections: we needed a way to tell whether a series of requests was coming from the same client.


HTTP Sessions

Read about http sessions here:
Understanding jSessionId
Session sharing
Session attribute sharing

Then came the need for asynchronous client-side requests, met by the XMLHttpRequest API.
Popularly, the technique came to be known as AJAX.



AJAX

Theory behind AJAX: How AJAX works

The HTTP protocol works on the principle that a server sends a response only when there is a request from a client; the client always initiates. For asynchronous (AJAX) applications, the ability for the server to keep sending data after a client request became a real necessity.


Reverse AJAX - Old school

Why did we need reverse AJAX?

Let's consider a scenario - you have a servlet that is doing some processing, say finding prime numbers in ranges of increasing powers of ten (0-10, 10-100, 100-1000, ...) and so on up to 10000000. So your output would be
2, 3, 5, 7
11, 13, 17, 19, ...
101, 103, ...
.....

It's obvious that the execution will take some time, and once it's done we write the whole response, as shown above, to the servlet's output stream.

Any problems with this approach?

Yes.
1. Let's say the complete execution takes 2 minutes, but your server's HTTP timeout setting might be less than this.
Changing the global setting for just one scenario isn't justifiable.
2. Very poor end-user friendliness. The user might just be staring at the page without any clue about what's happening.

Solution

You need 2 things - the connection shouldn't timeout and the user should see some incremental response/output.

>> Handling connection timeout

Remember - HTTP is a request/response model. But we have HTTP sessions, so we can use a session to correlate a series of shorter requests instead of holding one long-running request open.

>> Handling incremental output

You can not do this
List<Integer> primes = calculatePrimes(0, 10);
response.getWriter().write(primes.toString());
response.getWriter().flush();   // hoping the browser renders this chunk immediately
primes = calculatePrimes(10, 100);
response.getWriter().write(primes.toString());
response.getWriter().flush();
HTTP and the servlet technology that sits on top of HTTP don't support this.

There might be a workaround using some server-internal API, but as such, the answer is no.

But wait. We said the client is going to make AJAX requests at intervals, and if that's the case, then we do not need to worry about this.

We then make use of multiple AJAX requests
/calculatePrimes?range=0,10
/calculatePrimes?range=10,100
to send incremental output back.
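The polling idea can be sketched entirely in the JDK. This is a toy illustration, not a servlet: the built-in com.sun.net.httpserver.HttpServer stands in for the servlet container, and the `/calculatePrimes?range=a,b` URL shape mirrors the requests above. Each poll is a short, independent request, so no single request can hit the HTTP timeout, and each response is an incremental chunk of output.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.*;
import java.net.*;
import java.util.*;

// Sketch of the polling approach above: each AJAX call becomes a separate,
// short request for one range. The in-process HttpServer stands in for the
// servlet container.
public class PrimePollingDemo {

    // Naive trial-division prime finder for the half-open range [from, to).
    static List<Integer> calculatePrimes(int from, int to) {
        List<Integer> primes = new ArrayList<>();
        for (int n = Math.max(from, 2); n < to; n++) {
            boolean prime = true;
            for (int d = 2; d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) primes.add(n);
        }
        return primes;
    }

    public static List<String> pollRanges(int[][] ranges) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/calculatePrimes", exchange -> {
            // query looks like "range=10,100"
            String[] bounds = exchange.getRequestURI().getQuery()
                    .substring("range=".length()).split(",");
            String body = calculatePrimes(
                    Integer.parseInt(bounds[0]), Integer.parseInt(bounds[1])).toString();
            exchange.sendResponseHeaders(200, body.length());
            exchange.getResponseBody().write(body.getBytes());
            exchange.close();
        });
        server.start();

        List<String> chunks = new ArrayList<>();
        try {
            for (int[] r : ranges) {
                // each poll is an independent, short-lived request
                URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort()
                        + "/calculatePrimes?range=" + r[0] + "," + r[1]);
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream()))) {
                    chunks.add(in.readLine());   // incremental piece shown to the user
                }
            }
        } finally {
            server.stop(0);
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pollRanges(new int[][] {{0, 10}, {10, 100}}));
    }
}
```

In a browser the loop body would be an XMLHttpRequest callback updating the page; the server side is unchanged.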

This is not a straightforward solution, and there are other approaches like it; collectively these are known as
Reverse AJAX or Comet.

Tutorial on Reverse AJAX and different techniques: Reverse Ajax Techniques - Old school


Async support - Servlet 3.0

Consider the scenario where a request could be blocked by a depleted JDBC connection pool, or a low-throughput Web service endpoint. Until the resource becomes available, the thread could be stuck with the pending request for a long time. It would be better to place the request in a centralized queue waiting for available resources and recycle that thread. This effectively throttles the number of request threads to match the capacity of the slow-running back-end routines. It also suggests that at a certain point during request processing (when the request is stored in the queue), no threads are consumed for the request at all.
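The throttling pattern described here can be sketched with plain java.util.concurrent, independent of any container (Servlet 3.0's AsyncContext lets you apply the same idea inside one). In this illustration the pool size, the sleep that stands in for a slow back end, and the method names are all invented for the demo: request threads only enqueue work and return immediately, while a small worker pool drains the queue at the pace the back end can sustain.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the throttling pattern above: the request thread hands the job
// to a centralized queue and is recycled at once; a small worker pool
// matching the back-end capacity processes the queued jobs.
public class AsyncPatternDemo {

    public static String handleRequests(int requests) throws Exception {
        // the "slow back-end" capacity: 2 workers (the queue lives inside the pool)
        ExecutorService workers = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(requests);

        for (int i = 0; i < requests; i++) {
            // this is all a request thread does: enqueue and return --
            // no thread is consumed while the job waits in the queue
            workers.submit(() -> {
                try {
                    Thread.sleep(10);   // pretend: slow JDBC call / web service
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
        }
        // all "request threads" were recycled immediately; only the
        // 2 workers were ever busy at the same time
        done.await();
        workers.shutdown();
        return requests + " requests completed with 2 worker threads";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleRequests(10));
    }
}
```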

Asynchronous support in Servlet 3.0 is designed to achieve this scenario through a universal and portable approach, whether Ajax is used or not.

Tutorial on this topic: Async Servlets

Reverse AJAX - New school

After the advent of async servlet support, the Comet technique could be implemented more gracefully.
Tutorial on Reverse AJAX and different techniques: Reverse Ajax Techniques - New school

Async and Non-blocking IO - Servlet 3.1

Tutorial on this topic: Non-blocking Servlets


Server Sent Events (SSE)

Server Sent Events enables efficient server-to-client streaming of text-based event data.
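The SSE wire format itself is simple enough to sketch with the JDK alone. In this toy illustration (the `/events` path, the "tick" payloads, and the in-process HttpServer are all invented for the demo), the server keeps a single response open with Content-Type: text/event-stream and pushes one `data:` line per event, each terminated by a blank line; a browser would consume this with EventSource, but here a plain reader shows the raw stream.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.*;
import java.net.*;
import java.util.*;

// Minimal sketch of the SSE wire format: one long-lived response streams
// "data:" lines, each event separated by a blank line.
public class SseDemo {

    public static List<String> readEvents() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/events", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/event-stream");
            exchange.sendResponseHeaders(200, 0);            // 0 = streaming body
            try (Writer out = new OutputStreamWriter(exchange.getResponseBody())) {
                for (int i = 1; i <= 3; i++) {
                    out.write("data: tick " + i + "\n\n");   // one event per iteration
                    out.flush();                             // push it to the client now
                }
            }
        });
        server.start();

        List<String> events = new ArrayList<>();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/events");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("data: ")) {         // ignore the blank separators
                        events.add(line.substring("data: ".length()));
                    }
                }
            }
        } finally {
            server.stop(0);
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readEvents()); // [tick 1, tick 2, tick 3]
    }
}
```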


