Web technologies -- Laboratory 2 -- 2007-2008 -- info.uvt.ro

These pages are somewhat outdated; it is recommended to consult the newer version at Web technologies -- 2009-2010 -- info.uvt.ro (by Marc Frâncu).

URI -- Uniform Resource Identifier

Subclasses

URL (Uniform Resource Locator): Denotes a resource by its exact location, encoding the access method and its parameters.

URN (Uniform Resource Name): Denotes a resource by a unique name that does not depend on its location.

Syntax

Specification: RFC 1630 (original), RFC 3986 (current generic syntax).

Syntax:

<scheme>:<hierarchy>[?<query>][#<fragment>]

Example:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |             |           |        |
scheme     authority        path       query   fragment
   |   ______________________|_
  / \ /                        \
  urn:example:animal:ferret:nose
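
The components in the diagram above can be inspected with the standard java.net.URI class. The following is only a minimal sketch: the class name UriParts and the method describe are illustrative and not part of the laboratory code. Note that java.net.URI treats a URI such as the urn example as "opaque" and exposes everything after the scheme as one scheme-specific part, instead of as a path.

```java
import java.net.URI;

class UriParts
{
    // Returns the components of a URI, one per line (hypothetical helper).
    static String describe (String text)
    {
        URI uri = URI.create (text);
        
        // Opaque URIs (for example urn:...) have no authority, path or query in java.net.URI
        if (uri.isOpaque ())
            return "scheme=" + uri.getScheme () + "\n"
                + "scheme-specific-part=" + uri.getSchemeSpecificPart ();
        
        return "scheme=" + uri.getScheme () + "\n"
            + "authority=" + uri.getAuthority () + "\n"
            + "path=" + uri.getPath () + "\n"
            + "query=" + uri.getQuery () + "\n"
            + "fragment=" + uri.getFragment ();
    }
    
    public static void main (String[] arguments)
    {
        System.out.println (describe ("foo://example.com:8042/over/there?name=ferret#nose"));
        System.out.println (describe ("urn:example:animal:ferret:nose"));
    }
}
```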

Design criteria

(Quoted from RFC 1630)

Extensible: New naming schemes may be added later.

Complete: It is possible to encode any naming scheme.

Printable: It is possible to express any URI using 7-bit ASCII characters so that URIs may, if necessary, be passed using pen and ink.

HTTP

References

URL

Syntax:

http://<host>[:<port>]/[<resource>][?<query>]

Example:

http://www.google.com/
http://www.google.com/search?q=http&hl=en
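
As a rough illustration of the syntax above, the sketch below splits an HTTP URL into the pieces a client needs and fills in defaults for the optional ones (port 80, path "/"). The class HttpUrlParts and its fields are hypothetical names chosen for this example, and it uses java.net.URI for the parsing; this is one possible approach, not the required one.

```java
import java.net.URI;

class HttpUrlParts
{
    // Host name taken from the URL
    final String host;
    // Explicit port, or 80 (the HTTP default) when the URL has none
    final int port;
    // Path plus optional query, as it would appear on the request line
    final String target;
    
    HttpUrlParts (String text)
    {
        URI uri = URI.create (text);
        if (!"http".equalsIgnoreCase (uri.getScheme ()) || uri.getHost () == null)
            throw new IllegalArgumentException ("not an http URL: " + text);
        
        host = uri.getHost ();
        port = (uri.getPort () == -1) ? 80 : uri.getPort ();
        
        // An empty path means the root resource
        String path = uri.getRawPath ();
        if (path == null || path.isEmpty ())
            path = "/";
        String query = uri.getRawQuery ();
        target = (query == null) ? path : path + "?" + query;
    }
    
    public static void main (String[] arguments)
    {
        HttpUrlParts parts = new HttpUrlParts ("http://www.google.com/search?q=http&hl=en");
        System.out.println (parts.host + " " + parts.port + " " + parts.target);
    }
}
```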

MIME Type

http://en.wikipedia.org/wiki/Mime_type

Characteristics

request/response protocol;
stateless protocol;
independent of transport protocol / layer;

Actors

User agent: A client application that contacts a server on behalf of the user. Examples:

download client;
web browser;
web spider;

Server: An application that receives requests and answers them.

Proxy: A server application that receives requests and either serves them itself or passes them on to the real server, possibly through a chain of other servers. It may modify the requests and responses it transfers. Examples:

caching proxy;
anonymizing proxy;
transparent proxy;
reverse proxy;

Protocol

Specifications: RFC 2616.

Request:

[method] [resource] [version]<CRLF>
[header]: [value]<CRLF>
<CRLF>

Example request:

GET /index.html HTTP/1.1<CRLF>
Host: www.example.com<CRLF>
<CRLF>
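
The example request above can be assembled with plain string concatenation, as in the following sketch. The names RequestFormat and get are made up for this illustration; the only point is that every line, including the final empty one, ends with CR LF ("\r\n").

```java
import java.nio.charset.StandardCharsets;

class RequestFormat
{
    static final String CRLF = "\r\n";
    
    // Builds a GET request with a single Host header (hypothetical helper).
    static String get (String host, String resource)
    {
        return "GET " + resource + " HTTP/1.1" + CRLF
            + "Host: " + host + CRLF
            + CRLF;
    }
    
    public static void main (String[] arguments)
    {
        String request = get ("www.example.com", "/index.html");
        // The request line and header names are plain ASCII on the wire
        byte[] wire = request.getBytes (StandardCharsets.US_ASCII);
        System.out.print (request);
        System.err.println (wire.length + " bytes");
    }
}
```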

Response:

[version] [status] [message]<CRLF>
[header]: [value]<CRLF>
<CRLF>
[body]...

Example response:

HTTP/1.1 200 OK<CRLF>
Date: Mon, 23 May 2005 22:38:34 GMT<CRLF>
Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)<CRLF>
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT<CRLF>
Etag: "3f80f-1b6-3e1cb03b"<CRLF>
Accept-Ranges: bytes<CRLF>
Content-Length: 438<CRLF>
Connection: close<CRLF>
Content-Type: text/html; charset=UTF-8<CRLF>
<CRLF>
<Content ...>
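
One possible way to take apart the first line of a response is a regular expression, sketched below. The class StatusLine and the exact pattern are assumptions made for this example (the pattern tolerates a missing message); other approaches, such as splitting on spaces, work just as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatusLine
{
    // Groups: version, three-digit status, optional free-form message
    static final Pattern PATTERN = Pattern.compile ("(HTTP/\\d+\\.\\d+) (\\d{3})(?: (.*))?");
    
    final String version;
    final int status;
    final String message;
    
    // Expects the line without its trailing CR LF
    StatusLine (String line)
    {
        Matcher matcher = PATTERN.matcher (line);
        if (!matcher.matches ())
            throw new IllegalArgumentException ("malformed status line: " + line);
        version = matcher.group (1);
        status = Integer.parseInt (matcher.group (2));
        message = (matcher.group (3) == null) ? "" : matcher.group (3);
    }
    
    public static void main (String[] arguments)
    {
        StatusLine line = new StatusLine ("HTTP/1.1 200 OK");
        System.out.println (line.version + " | " + line.status + " | " + line.message);
    }
}
```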

Request methods

HEAD: Used to retrieve only the header of the response. Useful for requesting the meta-information without the actual content.

GET: Used to retrieve both the meta-information and the content of the resource. It is the most used method. It should have no side-effects. (It should be a safe method.)

POST: Used to send some data to be processed. (For example as result of filling and sending some user forms.)

PUT: Used to replace a resource.

DELETE: Used to remove a resource.

TRACE: Used to debug or diagnose a request. The receiving server should echo the request back to the client.

OPTIONS: Used to identify the capabilities of the server.

CONNECT: Used to ask a proxy to open a tunnel to another server (for example for SSL connections).

HTTP versions

HTTP/1.0 -- May 1996
HTTP/1.1 -- June 1999

HTTP/1.1 features:

Status codes

List of status codes -- Wikipedia

2xx -- Success
- 200 -- OK
- 201 -- Created
- 202 -- Accepted
3xx -- Redirection
- 301 -- Moved permanently
4xx -- Client error
- 400 -- Bad request
- 401 -- Unauthorized
- 403 -- Forbidden
- 404 -- Not found
- 405 -- Method not allowed
5xx -- Server error
- 500 -- Internal server error
- 501 -- Not implemented
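
Because the first digit of the status determines its class, a client can react to unknown codes by looking at the class alone. The sketch below shows the idea; StatusClass and describe are illustrative names, and the 1xx class is not listed above but is also defined by RFC 2616.

```java
class StatusClass
{
    // Maps a status code to the name of its class (hypothetical helper).
    static String describe (int status)
    {
        switch (status / 100) {
            case 1: return "Informational"; // defined by RFC 2616, not listed above
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }
    
    public static void main (String[] arguments)
    {
        int[] samples = { 200, 301, 404, 501 };
        for (int status : samples)
            System.out.println (status + " -- " + describe (status));
    }
}
```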

Headers

List of headers -- Wikipedia

Headers carry the meta-information of an HTTP exchange: they describe characteristics of the connection and of the data being sent or received. Some common headers, each with an example:

Accept

Accept: text/plain

Accept-Charset

Accept-Charset: iso-8859-5

Accept-Encoding

Accept-Encoding: compress, gzip

Accept-Language

Accept-Language: da

Content-Encoding

Content-Encoding: gzip

Content-Language

Content-Language: da

Content-Length

Content-Length: 348

Content-Type

Content-Type: text/html; charset=utf-8

Host

Host: www.w3.org

If-Modified-Since

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

Last-Modified

Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

Server

Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)

User-Agent

User-Agent: Mozilla/5.0 (Linux; X11; UTF-8)
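
Each header line has the form name, colon, value, and header names are case-insensitive. A minimal sketch of parsing such lines into a map follows; the class HeaderLines is hypothetical, and the sketch deliberately ignores repeated headers and values folded over several lines.

```java
import java.util.Map;
import java.util.TreeMap;

class HeaderLines
{
    // Parses "Name: value" lines (without CR LF) into a case-insensitive map.
    static Map<String, String> parse (String... lines)
    {
        Map<String, String> headers = new TreeMap<> (String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            // Only the first colon separates name and value; dates contain more colons
            int colon = line.indexOf (':');
            if (colon <= 0)
                throw new IllegalArgumentException ("malformed header line: " + line);
            headers.put (line.substring (0, colon).trim (), line.substring (colon + 1).trim ());
        }
        return headers;
    }
    
    public static void main (String[] arguments)
    {
        Map<String, String> headers = parse (
                "Content-Type: text/html; charset=utf-8",
                "Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT");
        System.out.println (headers.get ("content-type"));
        System.out.println (headers.get ("LAST-MODIFIED"));
    }
}
```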

HTTP client steps

Parsing the requested URL;
Creating a client socket and connecting it to the server address and port;
Sending the request line and the header lines, followed by an empty line;
Receiving the status line and the header lines;
If necessary, receiving the content;
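
The steps above can be sketched as follows. This is only one possible arrangement under some simplifying assumptions: the names ClientSteps, exchange, statusLineOf and bodyOf are invented for the example, the request uses HTTP/1.0 so that the server closes the connection after the body and does not use chunked encoding, and the whole response is buffered in memory before it is split. Error handling and status checking are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ClientSteps
{
    // Steps 3 to 5: sends the request, then reads the response until the stream ends.
    static byte[] exchange (InputStream input, OutputStream output, String host, String target) throws IOException
    {
        String request = "GET " + target + " HTTP/1.0\r\n" + "Host: " + host + "\r\n" + "\r\n";
        output.write (request.getBytes (StandardCharsets.US_ASCII));
        output.flush ();
        
        ByteArrayOutputStream response = new ByteArrayOutputStream ();
        byte[] buffer = new byte [1024];
        int read = input.read (buffer);
        while (read != -1) {
            response.write (buffer, 0, read);
            read = input.read (buffer);
        }
        return response.toByteArray ();
    }
    
    // Returns the index of the first occurrence of mark in bytes, or -1.
    static int indexOf (byte[] bytes, String mark)
    {
        byte[] needle = mark.getBytes (StandardCharsets.US_ASCII);
        for (int start = 0; start + needle.length <= bytes.length; start++)
            if (Arrays.equals (Arrays.copyOfRange (bytes, start, start + needle.length), needle))
                return start;
        return -1;
    }
    
    // The status line is everything before the first CR LF.
    static String statusLineOf (byte[] response) throws IOException
    {
        int end = indexOf (response, "\r\n");
        if (end == -1)
            throw new IOException ("no status line");
        return new String (response, 0, end, StandardCharsets.US_ASCII);
    }
    
    // The body is everything after the first empty line.
    static byte[] bodyOf (byte[] response) throws IOException
    {
        int end = indexOf (response, "\r\n\r\n");
        if (end == -1)
            throw new IOException ("no empty line after the headers");
        return Arrays.copyOfRange (response, end + 4, response.length);
    }
    
    public static void main (String[] arguments) throws IOException
    {
        if (arguments.length != 1) {
            System.err.println ("usage: ClientSteps <url>");
            return;
        }
        
        // Step 1: parsing the URL and applying the defaults
        URI uri = URI.create (arguments[0]);
        int port = (uri.getPort () == -1) ? 80 : uri.getPort ();
        String target = uri.getRawPath ();
        if (target == null || target.isEmpty ())
            target = "/";
        if (uri.getRawQuery () != null)
            target = target + "?" + uri.getRawQuery ();
        
        // Step 2: connecting; the socket is closed when the block ends
        try (Socket socket = new Socket (uri.getHost (), port)) {
            byte[] response = exchange (socket.getInputStream (), socket.getOutputStream (), uri.getHost (), target);
            System.err.println (statusLineOf (response));
            byte[] body = bodyOf (response);
            System.out.write (body, 0, body.length);
            System.out.flush ();
        }
    }
}
```

Keeping the network-independent part in exchange makes it possible to try the logic on in-memory streams before pointing it at a real server.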

HTTP server steps

Creating a server socket and binding it to the right address and port -- these will be used by clients;
Accepting connections from clients, and for each one applying the following steps in parallel;
Receiving the request line and the header lines, until an empty line is encountered;
Analyzing the request line to determine the requested method and resource;
Preparing the content;
Sending the status line and the header lines;
Sending the content;
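
A minimal sketch of the server steps above is given below. Everything specific in it is an assumption made for illustration: the class name ServerSteps, the port 8080, the plain-text body that merely names the requested resource, and the choice to answer anything other than GET with 501. It starts one thread per connection and does no real resource lookup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ServerSteps
{
    // Handles one request: reads it from input and writes the response to output.
    static void handle (InputStream input, OutputStream output) throws IOException
    {
        BufferedReader reader = new BufferedReader (new InputStreamReader (input, StandardCharsets.US_ASCII));
        
        // Receiving the request line, then skipping header lines until the empty line
        String requestLine = reader.readLine ();
        if (requestLine == null)
            return;
        String line = reader.readLine ();
        while (line != null && !line.isEmpty ())
            line = reader.readLine ();
        
        // Analyzing the request line and preparing the content
        String[] parts = requestLine.split (" ");
        String status;
        String content;
        if (parts.length != 3) {
            status = "400 Bad Request";
            content = "Bad request\n";
        } else if (!parts[0].equals ("GET")) {
            status = "501 Not Implemented";
            content = "Method not implemented\n";
        } else {
            status = "200 OK";
            content = "You requested " + parts[1] + "\n";
        }
        byte[] body = content.getBytes (StandardCharsets.UTF_8);
        
        // Sending the status line, the header lines and the content
        String head = "HTTP/1.1 " + status + "\r\n"
            + "Content-Type: text/plain; charset=UTF-8\r\n"
            + "Content-Length: " + body.length + "\r\n"
            + "Connection: close\r\n"
            + "\r\n";
        output.write (head.getBytes (StandardCharsets.US_ASCII));
        output.write (body);
        output.flush ();
    }
    
    public static void main (String[] arguments) throws IOException
    {
        // Creating the server socket; 8080 is an arbitrary port chosen for this sketch
        try (ServerSocket server = new ServerSocket (8080)) {
            while (true) {
                Socket client = server.accept ();
                // One thread per connection; the socket is closed when the handler ends
                new Thread (() -> {
                    try (Socket socket = client) {
                        handle (socket.getInputStream (), socket.getOutputStream ());
                    } catch (IOException error) {
                        System.err.println (error);
                    }
                }).start ();
            }
        }
    }
}
```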

Java API

java.net
- URL
- URLConnection

Client code

import java.io.*;
import java.net.*;


public class HttpClient
{
    
    public static void main (String[] arguments) throws Throwable
    {
        URL url = new URL (arguments[0]);
        
        URLConnection connection = url.openConnection ();
        connection.connect ();
        
        String contentType = connection.getContentType ();
        String contentEncoding = connection.getContentEncoding ();
        int contentLength = connection.getContentLength ();
        
        System.err.println ("Content-Type: " + contentType);
        System.err.println ("Content-Encoding: " + contentEncoding);
        System.err.println ("Content-Length: " + contentLength);
        
        InputStream input = connection.getInputStream ();
        
        // Copying the body to standard output; read returns -1 at the end of the stream
        byte[] buffer = new byte [1024];
        int read = input.read (buffer);
        while (read != -1) {
            System.out.write (buffer, 0, read);
            read = input.read (buffer);
        }
        System.out.flush ();
        
        input.close ();
    }
}

Client skeleton

import java.io.*;
import java.net.*;
import java.util.regex.*;


public class HttpClient
{
    
    public static void main (String[] arguments)
    {
        // Checking the arguments
        if (arguments.length != 1)
            ...
        
        // Parsing the URL
        URL url;
        try {
            ...
        } catch (Exception error) {
            ...
        }
        
        HttpClient.get (url, System.out);
    }
    
    
    public static boolean get (URL url, OutputStream sink)
    {
        // Obtaining the needed URL parts, handling the cases in which the parts are missing
        String protocol = ...
        String host = ...
        int port = ...
        String path = ...
        String query = ...
        
        return (HttpClient.get (host, port, path, query, sink));
    }
    
    
    public static boolean get (String host, int port, String path, String query, OutputStream sink)
    {
        // Resolving the remote server IP address
        InetAddress hostAddress;
        try {
            ...
        } catch (IOException error) {
            ...
        }
        
        // Creating, binding and connecting the socket
        Socket socket;
        try {
            ...
        } catch (IOException error) {
            ...
        }
        
        boolean succeeded = HttpClient.get (socket, path, query, sink);
        
        // Closing the socket
        try {
            ...
        } catch (IOException error) {
            ...
        }
        
        return (succeeded);
    }
    
    
    public static boolean get (Socket socket, String path, String query, OutputStream sink)
    {
        try {
            
            // Creating the buffered input and output streams
            final BufferedInputStream input = ...
            final BufferedOutputStream output = ...
            
            // Creating the request line
            String requestLine = ...
            
            // Writing the request line and an empty header
            HttpTools.writeLine (output, ...);
            ...
            
            // Receiving the status line
            String statusLine = HttpTools.readLine (input);
            
            // Checking to see if we received a status line, and that the status is 200
            if (statusLine == null)
                ...
            
            // We could use regular expressions to parse the status line
            Matcher statusLineMatcher = HttpClient.statusLinePattern.matcher (statusLine);
            ...
            
            String status = ...
            
            if (!status.equals ("200"))
                ...
            
            // Reading and ignoring the header lines
            String headerLine = ...
            ...
            
            // Transferring the body to the sink output stream
            HttpTools.transfer (input, sink);
            
            return (true);
            
        } catch (IOException error) {
            ...
        }
    }
    
    
    public static Pattern statusLinePattern = Pattern.compile (...);
}

Tools skeleton

import java.io.*;


public class HttpTools
{
    
    public static String readLine (InputStream stream) throws IOException
    {
        ...
    }
    
    
    public static void writeLine (OutputStream stream, String line) throws IOException
    {
        ...
    }
    
    
    public static void transfer (InputStream source, OutputStream sink) throws IOException
    {
        ...
    } 
}

Assignment

Implement a simple HTTP client application which:

takes a URL as its command-line argument;
parses the given URL to obtain all the needed information, or uses the default values for the missing information;
contacts the specified web server;
requests the resource;
interprets the received status line;
prints the response body;
handles the most common errors that can be encountered.

It is forbidden to use an existing HTTP library or class; you should implement the HTTP protocol yourself. You may use the URL class for parsing the argument. You may also use the client skeleton presented above, or write the client from scratch.

The assignments should be sent to me by email by next Tuesday. The subject of the email should be [WebTech] FirstName LastName -- A1.

Ciprian Dorin Craciun, 2007-10-10, ccraciun@info.uvt.ro