Web technologies -- Laboratory 2 -- 2007-2008 -- info.uvt.ro
Subclasses
- URL
- Denotes a resource by its exact location, encoding the access method and its parameters.
- URN
- Denotes a resource by a unique name, independent of its location.
Syntax
Specification: RFC 1630.
Syntax:
<scheme>:<hierarchy>[?<query>][#<fragment>]
Example:
foo://example.com:8042/over/there?name=ferret#nose
\ /   \______________/\_________/ \_________/ \__/
 |           |             |           |       |
scheme   authority       path        query  fragment
 |   ______________________|_
/ \ /                        \
urn:example:animal:ferret:nose
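The decomposition in the diagram can be reproduced with the standard `java.net.URI` class. The sketch below is only an illustration: the class name `UriParts` and the `describe` helper are hypothetical, and the " | " separator is an arbitrary choice for printing.

```java
import java.net.URI;

final class UriParts {

    /** Returns a one-line summary of the components of the given URI. */
    static String describe(String text) {
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            // Opaque URIs (such as URNs) have no authority, path or query,
            // only a scheme and a scheme-specific part.
            return uri.getScheme() + " | " + uri.getSchemeSpecificPart();
        }
        return uri.getScheme() + " | " + uri.getAuthority() + " | " + uri.getPath()
                + " | " + uri.getQuery() + " | " + uri.getFragment();
    }

    public static void main(String[] arguments) {
        System.out.println(describe("foo://example.com:8042/over/there?name=ferret#nose"));
        System.out.println(describe("urn:example:animal:ferret:nose"));
    }
}
```

For the hierarchical example, the summary lists the scheme, authority, path, query and fragment in that order; for the URN only the scheme and the remainder are available.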
Design criteria
(Quoted from RFC 1630)
- Extensible
- New naming schemes may be added later.
- Complete
- It is possible to encode any naming scheme.
- Printable
- It is possible to express any URI using 7-bit ASCII characters so that URIs may, if necessary, be passed using pen and ink.
References
- HTTP -- Wikipedia
- List of HTTP status codes -- Wikipedia
- Computer networks -- HTTP -- Wikibooks
- RFC 2616 -- HTTP specification
URL
Syntax:
http://<host>[:<port>]/[<resource>][?<query>]
Example:
http://www.google.com/
http://www.google.com/search?q=http&hl=en
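Because the port, the resource and the query are optional in this syntax, a client has to fall back on defaults when they are missing. The following is a minimal sketch of that logic, assuming the usual HTTP default port 80 and "/" as the default resource; the class `HttpUrlDefaults` and its methods are hypothetical names, and `java.net.URI` is used here for parsing (`java.net.URL` offers similar getters).

```java
import java.net.URI;

final class HttpUrlDefaults {

    /** Port to connect to: the explicit one, or 80 when the URL omits it. */
    static int port(URI url) {
        return url.getPort() == -1 ? 80 : url.getPort();
    }

    /** Resource to request: the path plus the optional query; "/" when the path is empty. */
    static String target(URI url) {
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = url.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] arguments) {
        URI url = URI.create("http://www.google.com/search?q=http&hl=en");
        System.out.println(url.getHost() + " " + port(url) + " " + target(url));
    }
}
```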
MIME Type
http://en.wikipedia.org/wiki/Mime_type
Characteristics
- request/response protocol;
- stateless protocol;
- independent of transport protocol / layer;
Actors
- User agent
- Is a client application which contacts a server on behalf of the user.
- download client;
- web browser;
- web spider;
- Server
- Is a server application which receives requests and answers them.
- Proxy
- Is a server application that receives requests and decides either to serve them itself or to pass them to the real server, possibly through a chain of servers. It may modify the requests and responses that it transfers.
- caching proxy;
- anonymizing proxy;
- transparent proxy;
- reverse proxy;
Protocol
Specifications: RFC 2616.
Request:
[method] [resource] [version]<CRLF>
[header]: [value]<CRLF>
<CRLF>
Example request:
GET /index.html HTTP/1.1<CRLF>
Host: www.example.com<CRLF>
<CRLF>
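As an illustration of the request format, the sketch below assembles the text of the example request, with each `<CRLF>` written as the two characters `\r\n`. The class `RequestText` and its method are hypothetical names introduced only for this example.

```java
final class RequestText {

    /** Line terminator used by HTTP: carriage return followed by line feed. */
    static final String CRLF = "\r\n";

    /** Builds a GET request made of the request line, a Host header and the empty line. */
    static String get(String host, String resource) {
        return "GET " + resource + " HTTP/1.1" + CRLF
                + "Host: " + host + CRLF
                + CRLF;
    }

    public static void main(String[] arguments) {
        System.out.print(get("www.example.com", "/index.html"));
    }
}
```

The final empty line matters: it is what tells the server that the header section is complete.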
Response:
[version] [status] [message]<CRLF>
[header]: [value]<CRLF>
<CRLF>
[body]...
Example response:
HTTP/1.1 200 OK<CRLF>
Date: Mon, 23 May 2005 22:38:34 GMT<CRLF>
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)<CRLF>
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT<CRLF>
Etag: "3f80f-1b6-3e1cb03b"<CRLF>
Accept-Ranges: bytes<CRLF>
Content-Length: 438<CRLF>
Connection: close<CRLF>
Content-Type: text/html; charset=UTF-8<CRLF>
<CRLF>
<Content ...>
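The first line of a response (the status line) can be split into its three parts with a regular expression. This is one possible sketch, not the only way to do it: the class `StatusLine` and the exact pattern are assumptions made for illustration, and the line is expected without its trailing `<CRLF>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StatusLine {

    // [version] [status] [message]; the message part may be missing.
    private static final Pattern PATTERN =
            Pattern.compile("^(HTTP/\\d+\\.\\d+) (\\d{3})(?: (.*))?$");

    final String version;
    final int status;
    final String message;

    private StatusLine(String version, int status, String message) {
        this.version = version;
        this.status = status;
        this.message = message;
    }

    /** Parses a status line, or throws IllegalArgumentException if it is malformed. */
    static StatusLine parse(String line) {
        Matcher matcher = PATTERN.matcher(line);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        String message = matcher.group(3) == null ? "" : matcher.group(3);
        return new StatusLine(matcher.group(1), Integer.parseInt(matcher.group(2)), message);
    }

    public static void main(String[] arguments) {
        StatusLine parsed = parse("HTTP/1.1 200 OK");
        System.out.println(parsed.version + " / " + parsed.status + " / " + parsed.message);
    }
}
```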
Request methods
- HEAD
- Used to retrieve only the header of the response. Useful for requesting the meta-information without the actual content.
- GET
- Used to retrieve both the meta-information and the content of the resource. It is the most used method. It should have no side-effects. (It should be a safe method.)
- POST
- Used to send some data to be processed. (For example as result of filling and sending some user forms.)
- PUT
- Used to replace a resource.
- DELETE
- Used to remove a resource.
- TRACE
- Used to debug or diagnose a request. Each server should echo the received request.
- OPTIONS
- Used to identify the capabilities of the server.
- CONNECT
- Reserved for use with a proxy that can switch to being a tunnel.
HTTP versions
- HTTP/1.0 -- May 1996
- HTTP/1.1 -- June 1999
HTTP/1.1 features:
Status codes
List of status codes -- Wikipedia
- 2xx -- Success
- 200 -- OK
- 201 -- Created
- 202 -- Accepted
- 3xx -- Redirection
- 301 -- Moved permanently
- 4xx -- Client error
- 400 -- Bad request
- 401 -- Unauthorized
- 403 -- Forbidden
- 404 -- Not found
- 405 -- Method not allowed
- 5xx -- Server error
- 500 -- Internal server error
- 501 -- Not implemented
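Since the first digit of a status code determines its class, a client can react to a whole class without knowing every individual code. A minimal sketch follows; the class `StatusClass` is a hypothetical name, and the 1xx (informational) class comes from RFC 2616 rather than from the list above.

```java
final class StatusClass {

    /** Maps a three-digit status code to the name of its class. */
    static String of(int status) {
        switch (status / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] arguments) {
        System.out.println(404 + " -- " + of(404));
    }
}
```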
Headers
Headers carry the meta-information of an HTTP message: they describe characteristics of the connection and of the data sent or received.
- Accept
Accept: text/plain
- Accept-Charset
Accept-Charset: iso-8859-5
- Accept-Encoding
Accept-Encoding: compress, gzip
- Accept-Language
Accept-Language: da
- Content-Encoding
Content-Encoding: gzip
- Content-Language
Content-Language: da
- Content-Length
Content-Length: 348
- Content-Type
Content-Type: text/html; charset=utf-8
- Host
Host: www.w3.org
- If-Modified-Since
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
- Last-Modified
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
- Server
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
- User-Agent
User-Agent: Mozilla/5.0 (Linux; X11; UTF-8)
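Each header line has the form `name: value`, and header names are case-insensitive. The sketch below collects header lines into a map that ignores the case of the names. It is a simplification under stated assumptions: the class `HeaderLines` is hypothetical, repeated headers overwrite each other, and folded (multi-line) headers are not handled.

```java
import java.util.Map;
import java.util.TreeMap;

final class HeaderLines {

    /** Parses "name: value" lines into a map with case-insensitive keys. */
    static Map<String, String> parse(String... lines) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            // Only the first colon separates the name from the value;
            // values such as dates contain further colons.
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "name: value" line, skip it
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] arguments) {
        Map<String, String> headers = parse(
                "Content-Type: text/html; charset=utf-8",
                "Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT");
        System.out.println(headers.get("content-type"));
        System.out.println(headers.get("LAST-MODIFIED"));
    }
}
```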
HTTP client steps
- Parsing the requested URL;
- Creating a client socket and connecting it to the server address and port;
- Sending the request and header lines followed by an empty line;
- Receiving the response and header lines;
- If necessary receiving the content;
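The client steps above can be sketched as follows. This is a deliberately simplified illustration, not a reference solution: the class `ClientExchange` and its methods are hypothetical, the URL is assumed to be parsed already, only status 200 is accepted, the headers are skipped, and the body is read until the server closes the connection (so `Content-Length` and chunked transfer coding are ignored).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ClientExchange {

    /** Reads one line terminated by CRLF (or LF); returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
            b = in.read();
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Creates the client socket, connects it, and performs the exchange. */
    static byte[] get(String host, int port, String resource) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            return get(socket.getInputStream(), socket.getOutputStream(), host, resource);
        }
    }

    /** Sends a GET request on out, reads the response from in, and returns the body. */
    static byte[] get(InputStream in, OutputStream out, String host, String resource)
            throws IOException {
        // Sending the request line and the headers, followed by an empty line.
        String request = "GET " + resource + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        out.write(request.getBytes(StandardCharsets.US_ASCII));
        out.flush();

        // Receiving the status line and checking the status.
        String statusLine = readLine(in);
        if (statusLine == null) {
            throw new IOException("Empty response");
        }
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[1].equals("200")) {
            throw new IOException("Unexpected status: " + statusLine);
        }

        // Receiving (and ignoring) the header lines, up to the empty line.
        String header = readLine(in);
        while (header != null && !header.isEmpty()) {
            header = readLine(in);
        }

        // Receiving the content until the end of the stream.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        for (int read = in.read(buffer); read != -1; read = in.read(buffer)) {
            body.write(buffer, 0, read);
        }
        return body.toByteArray();
    }
}
```

Keeping the exchange on plain streams means the same method works with a real socket and with in-memory streams, which makes it easy to try out without a network.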
HTTP server steps
- Creating a server socket and binding it to the right address and port -- these will be used by clients;
- Accepting connections from clients and, for each one, applying the following steps in parallel;
- Receiving the request and header lines, until an empty line is encountered;
- Analyzing the request line to determine the requested method and resource;
- Preparing the content;
- Sending the response and header lines;
- Sending the content;
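The server steps above can be sketched in a similar way. Again this is only an illustration under stated assumptions: the class `TinyServer` is hypothetical, only GET is supported, the headers are read and ignored, and the "content" is a short text that echoes the requested resource instead of a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class TinyServer {

    /** Creates and binds the server socket, then serves each client on its own thread. */
    static void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> {
                    try (Socket socket = client) {
                        handle(socket.getInputStream(), socket.getOutputStream());
                    } catch (IOException error) {
                        System.err.println("Connection failed: " + error.getMessage());
                    }
                }).start();
            }
        }
    }

    /** Handles one request read from in and writes the response to out. */
    static void handle(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1));

        // Receiving the request line and the header lines, until an empty line.
        String requestLine = reader.readLine();
        if (requestLine == null) {
            return; // the client closed the connection without sending anything
        }
        String header = reader.readLine();
        while (header != null && !header.isEmpty()) {
            header = reader.readLine();
        }

        // Analyzing the request line and preparing the content.
        String[] parts = requestLine.split(" ");
        String status;
        String text;
        if (parts.length != 3) {
            status = "400 Bad Request";
            text = "Bad request\n";
        } else if (!parts[0].equals("GET")) {
            status = "405 Method Not Allowed";
            text = "Only GET is supported\n";
        } else {
            status = "200 OK";
            text = "You requested " + parts[1] + "\n";
        }
        byte[] body = text.getBytes(StandardCharsets.UTF_8);

        // Sending the status line and the headers, then the content.
        String head = "HTTP/1.1 " + status + "\r\n"
                + "Content-Type: text/plain; charset=UTF-8\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }
}
```

Calling `TinyServer.serve (8080)` (the port is an arbitrary example) would block and answer requests until the process is stopped; one thread per connection is the simplest way to get the "in parallel" behaviour, though a thread pool would be a reasonable alternative.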
Java API
Client code
import java.io.*;
import java.net.*;

public class HttpClient {

    public static void main (String[] arguments) throws Throwable {

        // Opening a connection to the URL given as the first argument
        URL url = new URL (arguments[0]);
        URLConnection connection = url.openConnection ();
        connection.connect ();

        // Printing some of the response meta-information
        String contentType = connection.getContentType ();
        String contentEncoding = connection.getContentEncoding ();
        int contentLength = connection.getContentLength ();
        System.err.println ("Content-Type: " + contentType);
        System.err.println ("Content-Encoding: " + contentEncoding);
        System.err.println ("Content-Length: " + contentLength);

        // Copying the content to the standard output
        InputStream input = connection.getInputStream ();
        byte[] buffer = new byte [1024];
        int read = input.read (buffer);
        while (read > 0) {
            System.out.write (buffer, 0, read);
            read = input.read (buffer);
        }
        input.close ();
    }
}
Client skeleton
import java.io.*;
import java.net.*;
import java.util.regex.*;

public class HttpClient {

    public static void main (String[] arguments) {

        // Checking the arguments
        if (arguments.length != 1)
            ...

        // Parsing the URL
        URL url;
        try {
            ...
        } catch (Exception error) {
            ...
        }

        HttpClient.get (url, System.out);
    }

    public static boolean get (URL url, OutputStream sink) {

        // Obtaining the needed URL parts, handling the cases in which the parts are missing
        String protocol = ...
        String host = ...
        int port = ...
        String path = ...
        String query = ...

        return (HttpClient.get (host, port, path, query, sink));
    }

    public static boolean get (String host, int port, String path, String query, OutputStream sink) {

        // Resolving the remote server IP address
        InetAddress hostAddress;
        try {
            ...
        } catch (IOException error) {
            ...
        }

        // Creating, binding and connecting the socket
        Socket socket;
        try {
            ...
        } catch (IOException error) {
            ...
        }

        boolean succeeded = HttpClient.get (socket, path, query, sink);

        // Closing the socket
        try {
            ...
        } catch (IOException error) {
            ...
        }

        return (succeeded);
    }

    public static boolean get (Socket socket, String path, String query, OutputStream sink) {
        try {

            // Creating the buffered input and output streams
            final BufferedInputStream input = ...
            final BufferedOutputStream output = ...

            // Creating the request line
            String requestLine = ...

            // Writing the request line and an empty header
            HttpTools.writeLine (output, ...);
            ...

            // Receiving the status line
            String statusLine = HttpTools.readLine (input);

            // Checking to see if we received a status line, and that the status is 200
            if (statusLine != null)
                ...

            // We could use regular expressions to parse the status line
            Matcher statusLineMatcher = HttpClient.statusLinePattern.matcher (statusLine);
            ...
            String status = ...
            if (!status.equals ("200"))
                ...

            // Reading and ignoring the header lines
            String headerLine = ...
            ...

            // Transferring the body to the sink output stream
            HttpTools.transfer (input, sink);

            return (true);

        } catch (IOException error) {
            ...
        }
    }

    public static Pattern statusLinePattern = Pattern.compile (...);
}
Tools skeleton
import java.io.*;

public class HttpTools {

    public static String readLine (InputStream stream) throws IOException {
        ...
    }

    public static void writeLine (OutputStream stream, String line) throws IOException {
        ...
    }

    public static void transfer (InputStream source, OutputStream sink) throws IOException {
        ...
    }
}
Assignment
Implement a simple HTTP client application which:
- takes a URL as a command-line argument;
- parses the given URL to obtain all the needed information, or uses the default values for the missing information;
- contacts the specified web server;
- requests the resource;
- interprets the received status line;
- prints the response body;
- handles the most common errors that can be encountered.
It is forbidden to use an existing HTTP library or class; you should implement the HTTP protocol yourself. You may use the URL class for parsing the argument. You also may use the client skeleton presented above, or implement it yourself.
The assignments should be sent to me by email by next Tuesday. The subject of the email should be [WebTech] FirstName LastName -- A1.
Ciprian Dorin Craciun, 2007-10-10, ccraciun@info.uvt.ro