Web technologies -- Laboratory 2 -- 2007-2008 -- info.uvt.ro

From Wikiversity
Important! These pages are somewhat outdated; it is recommended to consult the newer version at Web technologies -- 2009-2010 -- info.uvt.ro (by Marc Frâncu).


URI -- Uniform Resource Identifier

Subclasses

URL
It denotes a resource by its exact location, encoding the access method and its parameters.
URN
It denotes a resource by a unique name, independent of its location.


Syntax

Specification: RFC 1630. The current generic syntax, from which the example below is taken, is defined in RFC 3986.

Syntax:

<scheme>:<hierarchy>[?<query>][#<fragment>]

Example:

  foo://example.com:8042/over/there?name=ferret#nose
  \ /   \______________/\_________/ \_________/ \__/
   |           |             |           |        |
scheme     authority        path       query   fragment
   |   ______________________|_
  / \ /                        \
  urn:example:animal:ferret:nose
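
The decomposition shown above can be checked with the standard java.net.URI class. The following is only a minimal sketch: the class name UriParts and the method describe are hypothetical names chosen for this illustration.

```java
import java.net.URI;

final class UriParts
{
    // Formats the five generic components of a URI, one per line.
    // Components that are absent are reported by java.net.URI as null.
    static String describe (String text)
    {
        URI uri = URI.create (text);
        return "scheme=" + uri.getScheme () + "\n"
            + "authority=" + uri.getAuthority () + "\n"
            + "path=" + uri.getPath () + "\n"
            + "query=" + uri.getQuery () + "\n"
            + "fragment=" + uri.getFragment ();
    }

    public static void main (String[] arguments)
    {
        // The hierarchical example from the diagram above
        System.out.println (describe ("foo://example.com:8042/over/there?name=ferret#nose"));

        // The URN example has no authority; everything after the scheme
        // is available as the scheme-specific part.
        URI urn = URI.create ("urn:example:animal:ferret:nose");
        System.out.println (urn.getScheme () + " -> " + urn.getSchemeSpecificPart ());
    }
}
```

For the URN, java.net.URI treats the identifier as opaque, so the path, query and fragment accessors return null and only the scheme and the scheme-specific part are available.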


Design criteria

(Quoted from RFC 1630)

Extensible
New naming schemes may be added later.
Complete
It is possible to encode any naming scheme.
Printable
It is possible to express any URI using 7-bit ASCII characters so that URIs may, if necessary, be passed using pen and ink.

HTTP

URL

Syntax:

http://<host>[:<port>]/[<resource>][?<query>]

Example:

http://www.google.com/
http://www.google.com/search?q=http&hl=en
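
Since the port, the resource and the query are all optional in this syntax, a client has to substitute defaults for the missing parts. A minimal sketch of that step follows, assuming the standard java.net.URL accessors; the class name HttpUrlParts and the method normalize are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class HttpUrlParts
{
    // Returns "<host>:<port> <resource>[?<query>]" with defaults applied
    // for the parts that are missing from the given URL.
    static String normalize (String text)
    {
        URL url;
        try {
            url = URI.create (text).toURL ();
        } catch (MalformedURLException error) {
            throw new IllegalArgumentException ("not a valid URL: " + text, error);
        }

        // getPort () returns -1 when no port is written in the URL;
        // getDefaultPort () returns 80 for the http scheme.
        int port = (url.getPort () != -1) ? url.getPort () : url.getDefaultPort ();

        // An empty resource is requested as "/".
        String path = url.getPath ().isEmpty () ? "/" : url.getPath ();

        // getQuery () returns null when the URL has no query part.
        String query = url.getQuery ();
        String target = (query != null) ? path + "?" + query : path;

        return url.getHost () + ":" + port + " " + target;
    }

    public static void main (String[] arguments)
    {
        System.out.println (normalize ("http://www.google.com/"));
        System.out.println (normalize ("http://www.google.com/search?q=http&hl=en"));
    }
}
```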


MIME Type

http://en.wikipedia.org/wiki/Mime_type

Characteristics


Actors

User agent
Is a client application which contacts a server on behalf of the user.
  • download client;
  • web browser;
  • web spider;
Server
Is a server application which receives requests and answers them.
Proxy
Is a server application that receives requests and either serves them itself or passes them on to the real server, possibly through a chain of other servers. It may modify the requests and responses it transfers.
  • caching proxy;
  • anonymizing proxy;
  • transparent proxy;
  • reverse proxy;


Protocol

Specifications: RFC 2616.

Request:

[method] [resource] [version]<CRLF>
[header]: [value]<CRLF>
<CRLF>

Example request:

GET /index.html HTTP/1.1<CRLF>
Host: www.example.com<CRLF>
<CRLF>
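
As a rough illustration of this format, the example request can be assembled as a plain string. This is only a sketch; the class name RequestBuilder and the method buildGet are hypothetical, and only the Host header from the example is included.

```java
import java.nio.charset.StandardCharsets;

final class RequestBuilder
{
    // Every line of the request ends with carriage return + line feed.
    static final String CRLF = "\r\n";

    // Builds "[method] [resource] [version]", one header line,
    // and the empty line that ends the header section.
    static String buildGet (String host, String resource)
    {
        return "GET " + resource + " HTTP/1.1" + CRLF
            + "Host: " + host + CRLF
            + CRLF;
    }

    public static void main (String[] arguments)
    {
        String request = buildGet ("www.example.com", "/index.html");
        // The request line and the header lines are plain ASCII text.
        byte[] bytes = request.getBytes (StandardCharsets.US_ASCII);
        System.out.print (request);
        System.err.println (bytes.length + " bytes");
    }
}
```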

Response:

[version] [status] [message]<CRLF>
[header]: [value]<CRLF>
<CRLF>
[body]...

Example response:

HTTP/1.1 200 OK<CRLF>
Date: Mon, 23 May 2005 22:38:34 GMT<CRLF>
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)<CRLF>
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT<CRLF>
Etag: "3f80f-1b6-3e1cb03b"<CRLF>
Accept-Ranges: bytes<CRLF>
Content-Length: 438<CRLF>
Connection: close<CRLF>
Content-Type: text/html; charset=UTF-8<CRLF>
<CRLF>
<Content ...>
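
The first line of the response can be taken apart by splitting on spaces. The sketch below is one possible approach under that assumption, not the only one; the class StatusLine and its fields are hypothetical names.

```java
final class StatusLine
{
    final String version;
    final int status;
    final String message;

    StatusLine (String version, int status, String message)
    {
        this.version = version;
        this.status = status;
        this.message = message;
    }

    // Splits "[version] [status] [message]" on the first two spaces only,
    // because the message itself may contain spaces (as in "Not Found").
    static StatusLine parse (String line)
    {
        String[] parts = line.split (" ", 3);
        if (parts.length < 2)
            throw new IllegalArgumentException ("malformed status line: " + line);
        String message = (parts.length == 3) ? parts[2] : "";
        // Integer.parseInt rejects a non-numeric status with a NumberFormatException.
        return new StatusLine (parts[0], Integer.parseInt (parts[1]), message);
    }

    public static void main (String[] arguments)
    {
        StatusLine line = parse ("HTTP/1.1 200 OK");
        System.out.println (line.version + " / " + line.status + " / " + line.message);
    }
}
```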


Request methods

HEAD
Used to retrieve only the header of the response. Useful for requesting the meta-information without the actual content.
GET
Used to retrieve both the meta-information and the content of the resource. It is the most used method. It should have no side-effects. (It should be a safe method.)
POST
Used to send some data to be processed. (For example as result of filling and sending some user forms.)
PUT
Used to replace a resource.
DELETE
Used to remove a resource.
TRACE
Used to debug or diagnose a request. The server should echo the received request back to the client.
OPTIONS
Used to identify the capabilities of the server.
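CONNECT
Used to ask a proxy to open a tunnel to the destination server (for example, for HTTPS connections).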


HTTP versions

  • HTTP/1.0 -- May 1996
  • HTTP/1.1 -- June 1999

HTTP/1.1 features:


Status codes

List of status codes -- Wikipedia

  • 2xx -- Success
    • 200 -- OK
    • 201 -- Created
    • 202 -- Accepted
  • 3xx -- Redirection
    • 301 -- Moved permanently
  • 4xx -- Client error
    • 400 -- Bad request
    • 401 -- Unauthorized
    • 403 -- Forbidden
    • 404 -- Not found
    • 405 -- Method not allowed
  • 5xx -- Server error
    • 500 -- Internal server error
    • 501 -- Not implemented
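
The first digit of the status code gives its class, so a client can react to a whole class even when it does not know the exact code. A small sketch follows; the class name StatusClass is hypothetical, and the 1xx (informational) class is not in the list above but is added here for completeness.

```java
final class StatusClass
{
    // Maps a three-digit status code to the name of its class,
    // using only the first digit.
    static String describe (int status)
    {
        switch (status / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    public static void main (String[] arguments)
    {
        int[] samples = { 200, 301, 404, 501 };
        for (int status : samples)
            System.out.println (status + " -- " + describe (status));
    }
}
```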


Headers

List of headers -- Wikipedia

Headers carry the meta-information of a request or response: they describe the connection and the data being sent or received.

  • Accept
Accept: text/plain
  • Accept-Charset
Accept-Charset: iso-8859-5
  • Accept-Encoding
Accept-Encoding: compress, gzip
  • Accept-Language
Accept-Language: da
  • Content-Encoding
Content-Encoding: gzip
  • Content-Language
Content-Language: da
  • Content-Length
Content-Length: 348
  • Content-Type
Content-Type: text/html; charset=utf-8
  • Host
Host: www.w3.org
  • If-Modified-Since
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
  • Last-Modified
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
  • Server
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
  • User-Agent
User-Agent: Mozilla/5.0 (Linux; X11; UTF-8)
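
Each header line has the form [header]: [value], and header names are case-insensitive. The following sketch collects such lines into a map; the class name HeaderParser is hypothetical, and storing the names in lower case is just one way to handle the case-insensitivity.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HeaderParser
{
    // Splits each line at the first colon; later colons belong to the value
    // (as in the time of an If-Modified-Since header).
    static Map<String, String> parse (String[] lines)
    {
        Map<String, String> headers = new LinkedHashMap<> ();
        for (String line : lines) {
            int colon = line.indexOf (':');
            if (colon <= 0)
                continue; // not a header line, skipped in this sketch
            String name = line.substring (0, colon).trim ().toLowerCase (Locale.ROOT);
            String value = line.substring (colon + 1).trim ();
            headers.put (name, value);
        }
        return headers;
    }

    public static void main (String[] arguments)
    {
        Map<String, String> headers = parse (new String[] {
            "Content-Type: text/html; charset=utf-8",
            "Content-Length: 348",
            "If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT",
        });
        System.out.println (headers);
    }
}
```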


HTTP client steps

  1. Parsing the requested URL;
  2. Creating a client socket and connecting it to the server address and port;
  3. Sending the request and header lines followed by an empty line;
  4. Receiving the response and header lines;
  5. If necessary receiving the content;


HTTP server steps

  1. Creating a server socket and binding it to the right address and port -- these will be used by clients;
  2. Accepting connections from clients and, for each one, applying the following steps in parallel;
  3. Receiving the request and header lines, until an empty line is encountered;
  4. Analyzing the request line to determine the requested method and resource;
  5. Preparing the content;
  6. Sending the response and header lines;
  7. Sending the content;
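
The server steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a complete server: the class name TinyHttpServer, the port 8080, the plain-text reply and the choice to answer only GET are all made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class TinyHttpServer
{
    // Steps 3 to 7 for a single connection.
    static void handle (InputStream input, OutputStream output) throws IOException
    {
        BufferedReader reader = new BufferedReader (
                new InputStreamReader (input, StandardCharsets.US_ASCII));

        // Step 3: the request line, then header lines until an empty line
        String requestLine = reader.readLine ();
        String headerLine = reader.readLine ();
        while (headerLine != null && !headerLine.isEmpty ())
            headerLine = reader.readLine ();

        // Step 4: "[method] [resource] [version]"
        String[] parts = (requestLine == null) ? new String[0] : requestLine.split (" ");

        // Step 5: preparing the content (this sketch only answers GET)
        String status;
        String text;
        if (parts.length != 3) {
            status = "400 Bad Request";
            text = "Malformed request line\n";
        } else if (!parts[0].equals ("GET")) {
            status = "501 Not Implemented";
            text = "Only GET is supported\n";
        } else {
            status = "200 OK";
            text = "You asked for " + parts[1] + "\n";
        }
        byte[] body = text.getBytes (StandardCharsets.UTF_8);

        // Step 6: the status line and the header lines, ended by an empty line
        String head = "HTTP/1.1 " + status + "\r\n"
            + "Content-Type: text/plain; charset=UTF-8\r\n"
            + "Content-Length: " + body.length + "\r\n"
            + "Connection: close\r\n"
            + "\r\n";
        output.write (head.getBytes (StandardCharsets.US_ASCII));

        // Step 7: the content
        output.write (body);
        output.flush ();
    }

    public static void main (String[] arguments) throws IOException
    {
        // Step 1: the server socket, bound to all local addresses on port 8080
        try (ServerSocket server = new ServerSocket (8080)) {
            while (true) {
                // Step 2: one thread per accepted connection
                Socket client = server.accept ();
                new Thread (() -> {
                    try (Socket socket = client) {
                        handle (socket.getInputStream (), socket.getOutputStream ());
                    } catch (IOException error) {
                        System.err.println ("connection failed: " + error);
                    }
                }).start ();
            }
        }
    }
}
```

Keeping the per-connection logic in a method that takes plain streams makes it possible to exercise it without opening any socket, for example with a ByteArrayInputStream holding a request.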


Java API


Client code

import java.io.*;
import java.net.*;


public class HttpClient
{
    
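    public static void main (String[] arguments) throws IOException
    {
        // The URL to fetch is the first command line argument
        URL url = new URL (arguments[0]);
        
        // Opening the communication link to the resource
        URLConnection connection = url.openConnection ();
        connection.connect ();
        
        // Meta-information taken from the response headers;
        // missing headers are reported as null (or -1 for the length)
        String contentType = connection.getContentType ();
        String contentEncoding = connection.getContentEncoding ();
        int contentLength = connection.getContentLength ();
        
        // Printed on the standard error, so that the standard output holds only the content
        System.err.println ("Content-Type: " + contentType);
        System.err.println ("Content-Encoding: " + contentEncoding);
        System.err.println ("Content-Length: " + contentLength);
        
        // Copying the content to the standard output; the stream is closed automatically
        try (InputStream input = connection.getInputStream ()) {
            byte[] buffer = new byte [1024];
            int read = input.read (buffer);
            // read returns -1 at the end of the stream
            while (read != -1) {
                System.out.write (buffer, 0, read);
                read = input.read (buffer);
            }
        }
        
        System.out.flush ();
    }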
}


Client skeleton

import java.io.*;
import java.net.*;
import java.util.regex.*;


public class HttpClient
{
    
    public static void main (String[] arguments)
    {
        // Checking the arguments
        if (arguments.length != 1)
            ...
        
        // Parsing the URL
        URL url;
        try {
            ...
        } catch (Exception error) {
            ...
        }
        
        HttpClient.get (url, System.out);
    }
    
    
    public static boolean get (URL url, OutputStream sink)
    {
        // Obtaining the needed URL parts, handling the cases in which the parts are missing
        String protocol = ...
        String host = ...
        int port = ...
        String path = ...
        String query = ...
        
        return (HttpClient.get (host, port, path, query, sink));
    }
    
    
    public static boolean get (String host, int port, String path, String query, OutputStream sink)
    {
        // Resolving the remote server IP address
        InetAddress hostAddress;
        try {
            ...
        } catch (IOException error) {
            ...
        }
        
        // Creating, binding and connecting the socket
        Socket socket;
        try {
            ...
        } catch (IOException error) {
            ...
        }
        
        boolean succeeded = HttpClient.get (socket, path, query, sink);
        
        // Closing the socket
        try {
            ...
        } catch (IOException error) {
            ...
        }
        
        return (succeeded);
    }
    
    
    public static boolean get (Socket socket, String path, String query, OutputStream sink)
    {
        try {
            
            // Creating the buffered input and output streams
            final BufferedInputStream input = ...
            final BufferedOutputStream output = ...
            
            // Creating the request line
            String requestLine = ...
            
            // Writing the request line and an empty header
            HttpTools.writeLine (output, ...);
            ...
            
            // Receiving the status line
            String statusLine = HttpTools.readLine (input);
            
            // Checking to see if we received a status line, and that the status is 200
            if (statusLine != null)
                ...
            
            // We could use regular expressions to parse the status line
            Matcher statusLineMatcher = HttpClient.statusLinePattern.matcher (statusLine);
            ...
            
            String status = ...
            
            if (!status.equals ("200"))
                ...
            
            // Reading and ignoring the header lines
            String headerLine = ...
            ...
            
            // Transferring the body to the sink output stream
            HttpTools.transfer (input, sink);
            
            return (true);
            
        } catch (IOException error) {
            ...
        }
    }
    
    
    public static Pattern statusLinePattern = Pattern.compile (...);
}


Tools skeleton

import java.io.*;


public class HttpTools
{
    
    public static String readLine (InputStream stream) throws IOException
    {
        ...
    }
    
    
    public static void writeLine (OutputStream stream, String line) throws IOException
    {
        ...
    }
    
    
    public static void transfer (InputStream source, OutputStream sink) throws IOException
    {
        ...
    } 
}

Assignment

Implement a simple HTTP client application which:

  • takes a URL as a command line argument;
  • parses the given URL to obtain all the needed information, or uses the default values for the missing information;
  • contacts the specified web server;
  • requests the resource;
  • interprets the received status line;
  • prints the response body;
  • handles the most common errors that can be encountered.

It is forbidden to use an existing HTTP library or class; you should implement the HTTP protocol yourself. You may use the URL class for parsing the argument. You may also start from the client skeleton presented above, or write the client from scratch.

The assignments should be sent to me by email by next Tuesday. The subject of the email should be [WebTech] FirstName LastName -- A1.



Ciprian Dorin Craciun, 2007-10-10, ccraciun@info.uvt.ro