Assignment title: Information


The French Academy has decided that web servers in France must speak Le Protocol de Transfert Hypertexte (PdTHT), an all-French version of HTTP. Your assignment is to implement an HTTP proxy that will allow you to browse the French web using your normal web browser. Make sure to read this entire document before starting on your implementation.

PdTHT

It's easiest to demonstrate how PdTHT works with a simple exchange.

PdTHT Request

    OBTENIR / PdTHT/1.0
    Hôte: cs438.fr

PdTHT Response

    PdTHT/1.0 200 C'est bon
    Longeur-contenu: 27U
    Dernière-modification: Yb-9-A A:z:p
    Type-de-contenu: text/plain

    ... data ...

Numbers

One thing to watch out for is that the French have decided to use sexagesimal (base 60) numbers, as French doesn't really have proper words for numbers 61–99. The encoding is actually more compact. Continuing the usual hexadecimal practice, A=10, B=11, and so on. O is skipped to avoid confusion with 0, so P=24 and Z=34. After Z, lowercase letters are used, with a=35, b=36, and so on. Lowercase l is skipped because it could be confused with uppercase I, leaving m=46 and z=59. Therefore, 27U is actually 2 × 60² + 7 × 60 + 29 = 7649.

Dates

Dates and times are specified using an ISO 8601-like format, except that numbers are written in sexagesimal, as above. A date is thus written as YY-M-D H:M:S. The server implicitly uses the Paris time zone, which is UTC+2 during daylight saving time and UTC+1 otherwise. (Conveniently for you, DST does not end until after this assignment is over.) So the due date of the assignment, which in HTTP would be written as Wed, 05 Oct 2016 16:30:00 GMT, is written as Yb-A-5 I:V:0: Yb = 33 × 60 + 36 = 2016, and 16:30 GMT is 18:30 in Paris, i.e. I:V. Likewise, the date in the example above is September 10, 2016 at 10:59:49 Paris time.

Proxy Protocol

Your proxy should implement the HTTP proxy protocol to communicate with a browser. You can experiment with a real browser by configuring it to use your HTTP proxy.
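Nearly every header the proxy translates involves the sexagesimal numbers and dates described above, so a small codec is a natural first piece. The Python below is a minimal sketch, not a reference implementation: the helper names (from_sexa, to_sexa, pdtht_to_http_date, http_to_pdtht_date) are hypothetical, and it assumes Paris stays on UTC+2 for the whole assignment, as the Dates section says. It relies on the standard library's email.utils to produce and parse HTTP dates.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Sexagesimal digits 0-59: 0-9, then A-Z without O, then a-z without l.
DIGITS = "0123456789" "ABCDEFGHIJKLMNPQRSTUVWXYZ" "abcdefghijkmnopqrstuvwxyz"
VALUE = {ch: i for i, ch in enumerate(DIGITS)}
assert len(DIGITS) == 60

# Assumption: Paris is on daylight saving time (UTC+2) for the whole assignment.
PARIS = timezone(timedelta(hours=2))


def from_sexa(text: str) -> int:
    """Decode a sexagesimal string such as '27U' (= 7649)."""
    if not text:
        raise ValueError("empty sexagesimal number")
    n = 0
    for ch in text:
        if ch not in VALUE:
            raise ValueError(f"invalid sexagesimal digit {ch!r}")
        n = n * 60 + VALUE[ch]
    return n


def to_sexa(n: int) -> str:
    """Encode a non-negative integer in sexagesimal."""
    if n < 0:
        raise ValueError("negative numbers are not supported")
    digits = []
    while True:
        n, rem = divmod(n, 60)
        digits.append(DIGITS[rem])
        if n == 0:
            return "".join(reversed(digits))


def pdtht_to_http_date(text: str) -> str:
    """'Yb-A-5 I:V:0' (Paris time) -> 'Wed, 05 Oct 2016 16:30:00 GMT'."""
    date_part, time_part = text.split()
    year, month, day = (from_sexa(f) for f in date_part.split("-"))
    hour, minute, second = (from_sexa(f) for f in time_part.split(":"))
    local = datetime(year, month, day, hour, minute, second, tzinfo=PARIS)
    return format_datetime(local.astimezone(timezone.utc), usegmt=True)


def http_to_pdtht_date(text: str) -> str:
    """'Wed, 05 Oct 2016 16:30:00 GMT' -> 'Yb-A-5 I:V:0' (Paris time)."""
    t = parsedate_to_datetime(text).astimezone(PARIS)
    date_part = "-".join(to_sexa(v) for v in (t.year, t.month, t.day))
    time_part = ":".join(to_sexa(v) for v in (t.hour, t.minute, t.second))
    return f"{date_part} {time_part}"
```

For example, http_to_pdtht_date("Wed, 05 Oct 2016 16:30:00 GMT") returns "Yb-A-5 I:V:0", matching the due-date example, and from_sexa("27U") returns 7649.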
The biggest change is that an HTTP proxy receives an entire URL on the request line:

    GET http://cs438.fr/test/index.html HTTP/1.1
    Host: cs438.fr
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 Safari/601.7.7
    ...

The proxy is then responsible for contacting the site cs438.fr and requesting the file /test/index.html. It should pass along any headers to the web server and then return the result, translating the appropriate headers. (Any headers not listed in the glossary below should be passed on untranslated.) The one exception is the Connection: header: the browser uses it to specify whether the connection to the proxy should be kept open, whereas the Connexion: header that the proxy sends to the web server governs whether the connection from the proxy to the web server is kept alive or closed.

Caching

Your proxy should cache objects that are requested by clients to avoid making unnecessary requests to the server. The Cache-Control: header specifies how the object may be cached by the proxy:

- public means that the object can be cached by the proxy.
- private means that our shared proxy should not cache the result, but the header should be passed along to the browser so that the browser can cache it.
- no-cache means that the object should never be cached.
- max-age specifies how long (in seconds) the object can be cached.

Note that the Cache-Control: header should always be passed on to the browser, since the browser has its own cache that needs to be governed by the same header. Your cache should be in memory, so you should refuse to cache any objects larger than 10 MB. [1]

If a previously requested object is in the cache and has not expired (based on the max-age directive), it should be served from the cache. If the object has expired, you should issue a validation request to the server, sending a Si-Modifié-Depuis (If-Modified-Since) header to see whether the object has changed.
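One way to encode these caching rules is sketched below in Python; the names are hypothetical and several choices are assumptions rather than requirements: only responses explicitly marked public are cached, "10 MB" is read as 10 × 2^20 bytes, a missing âge-max means the entry must be revalidated every time, and the translated Cache-Control value is joined with commas (the separator standard HTTP uses) even though the example in this handout writes a semicolon. Note that âge-max is sexagesimal like other PdTHT numbers (âge-max=20 is 120 seconds).

```python
import re
from dataclasses import dataclass, field

DIGITS = "0123456789" "ABCDEFGHIJKLMNPQRSTUVWXYZ" "abcdefghijkmnopqrstuvwxyz"
VALUE = {ch: i for i, ch in enumerate(DIGITS)}

# Assumption: "10 MB" is read as 10 * 2**20 bytes.
MAX_CACHEABLE_BYTES = 10 * 1024 * 1024
FRENCH_DIRECTIVES = {"public": "public", "privé": "private", "pas-de-cache": "no-cache"}


def from_sexa(text: str) -> int:
    n = 0
    for ch in text:
        n = n * 60 + VALUE[ch]  # KeyError on a non-sexagesimal digit
    return n


def directives(value: str) -> list:
    """Split a Contrôle-de-Cache value; accepts ';' (as in the example) or ','."""
    return [d.strip() for d in re.split(r"[;,]", value) if d.strip()]


def to_cache_control(value: str) -> str:
    """Translate e.g. 'public; âge-max=20' into 'public, max-age=120' for the browser."""
    out = []
    for d in directives(value):
        if d.startswith("âge-max="):
            out.append(f"max-age={from_sexa(d[len('âge-max='):])}")
        else:
            out.append(FRENCH_DIRECTIVES.get(d, d))
    return ", ".join(out)  # assumption: standard comma separator


def may_cache(value: str, body_length: int) -> bool:
    """Cache only 'public' responses that are not privé/pas-de-cache and fit in 10 MB."""
    ds = set(directives(value))
    if "privé" in ds or "pas-de-cache" in ds:
        return False
    return "public" in ds and body_length <= MAX_CACHEABLE_BYTES


def max_age_seconds(value: str) -> int:
    for d in directives(value):
        if d.startswith("âge-max="):
            return from_sexa(d[len("âge-max="):])
    return 0  # assumption: no âge-max means "revalidate every time"


@dataclass
class CacheEntry:
    body: bytes
    max_age: int            # seconds, already decoded from sexagesimal
    stored_at: float        # clock reading when stored or last revalidated
    initial_age: int = 0    # Âge received from an upstream proxy, if any
    headers: dict = field(default_factory=dict)

    def age(self, now: float) -> int:
        """Value for the Age: header sent to the browser."""
        return self.initial_age + int(now - self.stored_at)

    def is_fresh(self, now: float, client_max_age=None) -> bool:
        age = self.age(now)
        # A client max-age=0 forces revalidation; otherwise it caps the acceptable age.
        if client_max_age is not None and (client_max_age == 0 or age > client_max_age):
            return False
        return age < self.max_age

    def revalidated(self, now: float, max_age: int) -> None:
        """After a 304: the entry is fresh again and its age restarts at 0."""
        self.stored_at = now
        self.initial_age = 0
        self.max_age = max_age
```

Tracking the age as initial_age plus elapsed time also covers the case, described in the glossary, where an upstream proxy has already sent an Âge: header.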
In response to such a validation request, the server either sends a new version of the object or returns a 304 (Not Modified) response, in which case you should return the cached object.

Example

The exchange below shows the browser (client) talking to the proxy and the proxy talking to the server, with notes on each step.

1. Client -> Proxy:
       GET http://serveur.fr/ HTTP/1.1
       Host: serveur.fr

2. Proxy -> Server:
       OBTENIR / PdTHT/1.0
       Hôte: serveur.fr
       Via: 1.1 myproxy

3. Server -> Proxy:
       PdTHT/1.0 200 C'est Bon
       Serveur: Le Serveur 0.1
       Longeur-Contenu: Gf
       Contrôle-de-Cache: public; âge-max=20
       Dernière-Modification: Yb-9-E 0:D:S
       ...
   Note: the response is stored in the cache; it will be considered fresh for 2 minutes (âge-max=20 is 2 × 60 = 120 seconds).

4. Proxy -> Client:
       HTTP/1.1 200 OK
       Server: Le Serveur 0.1
       Content-Length: 1000
       Cache-Control: public; max-age=120
       Via: 1.1 myproxy
       Age: 0
       ...
   Note: Content-Length: 1000 is the translation of Longeur-Contenu: Gf (16 × 60 + 40). The Age: header lets the proxy indicate that this is a fresh response.

5. A second request, 1 minute later. Client -> Proxy:
       GET http://serveur.fr/ HTTP/1.1
       Host: serveur.fr

6. Proxy -> Client:
       HTTP/1.1 200 OK
       Server: Le Serveur 0.1
       Content-Length: 1000
       Cache-Control: public; max-age=120
       Via: 1.1 myproxy
       Age: 60
   Note: the response is served from the cache. Here, Age shows that the item has been in the cache for 60 seconds.

7. A third request, 3 minutes later. Client -> Proxy:
       GET http://serveur.fr/ HTTP/1.1
       Host: serveur.fr

8. Proxy -> Server:
       OBTENIR / PdTHT/1.0
       Hôte: serveur.fr
       Si-Modifié-Depuis: Yb-9-E 0:D:S
       Via: 1.1 myproxy
   Note: the proxy checks whether the object has been modified since the cached version.

9. Server -> Proxy:
       PdTHT/1.0 304 Pas Modifié
       Contrôle-de-Cache: public; âge-max=20
       Serveur: Le Serveur 0.1
       Via: 1.1 myproxy
   Note: the object has not been modified. Another 120 seconds of caching is allowed.

10. Proxy -> Client:
       HTTP/1.1 200 OK
       Server: Le Serveur 0.1
       Content-Length: 1000
       Cache-Control: public; max-age=120
       Via: 1.1 myproxy
       Age: 0
       ...
   Note: the object is returned from the cache, but Age: 0 indicates that it is fresh again because it has been revalidated.

Cut-through Forwarding

Your proxy should start forwarding the response as soon as it starts receiving it from the server. This is called cut-through forwarding, in contrast to the store-and-forward approach we discussed in class.
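A minimal Python sketch of cut-through forwarding, assuming the server and client connections are wrapped as binary file objects (for example with socket.makefile("rb") and socket.makefile("wb")). relay_fixed forwards a body of known length in pieces of at most 4 KB; relay_chunked does the same for an en-morceaux body (see the glossary below) while rewriting each sexagesimal chunk size as the hexadecimal size HTTP/1.1 expects. The function names and the 4 KB buffer are illustrative choices, the caller is assumed to have already decoded Longeur-Contenu from sexagesimal, and trailer lines are assumed rare and copied untranslated.

```python
import io

DIGITS = "0123456789" "ABCDEFGHIJKLMNPQRSTUVWXYZ" "abcdefghijkmnopqrstuvwxyz"
VALUE = {ch: i for i, ch in enumerate(DIGITS)}
BUF_SIZE = 4096  # "a few KB" of buffering


def from_sexa(text: str) -> int:
    n = 0
    for ch in text:
        n = n * 60 + VALUE[ch]  # KeyError on a non-sexagesimal digit
    return n


def relay_fixed(src, dst, length: int) -> None:
    """Forward exactly `length` body bytes, writing each piece as soon as it arrives."""
    remaining = length
    while remaining > 0:
        # read1() returns what is already available (at most BUF_SIZE bytes)
        # instead of waiting for the whole amount.
        piece = src.read1(min(BUF_SIZE, remaining))
        if not piece:
            raise ConnectionError("server closed the connection mid-body")
        dst.write(piece)
        dst.flush()
        remaining -= len(piece)


def relay_chunked(src, dst) -> None:
    """Re-encode PdTHT chunks (sexagesimal sizes) as HTTP/1.1 chunks (hex sizes)."""
    while True:
        line = src.readline()
        if not line:
            raise ConnectionError("server closed the connection mid-body")
        size = from_sexa(line.strip().decode("ascii"))
        dst.write(f"{size:x}\r\n".encode("ascii"))
        if size == 0:
            break
        relay_fixed(src, dst, size)
        src.readline()  # discard the CRLF that ends each chunk's data
        dst.write(b"\r\n")
        dst.flush()
    # Optional trailer lines, then the final blank line.
    while True:
        line = src.readline()
        if line.strip() == b"":
            dst.write(b"\r\n")
            break
        dst.write(line)
    dst.flush()


# Usage with in-memory streams; a real proxy would pass socket file objects.
src = io.BytesIO(b"5\r\nHello\r\nD\r\n438 students!\r\n0\r\n\r\n")
dst = io.BytesIO()
relay_chunked(src, dst)
```

After the usage lines run, dst holds the same body re-chunked for HTTP/1.1, with the 13-byte chunk labeled d instead of D. Because each piece is flushed as soon as it is read, the client starts receiving data long before a large object has fully arrived.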
If the server returns a 10 MB object, for example, you should not wait until you have received the entire object before starting to send it to the client. Some buffering (a few KB) is fine.

Implementation

You may implement this assignment in C, C++, or Python. We highly recommend Python, because it makes many of the message parsing and translation tasks much easier than C/C++ does. For C/C++, you should create a Makefile that builds an executable called proxy. For Python, write your code in proxy.py. In both cases, your proxy should take a single command-line argument specifying the port number on which it should run.

Your submission should include a README that specifies who is in your group (if you are using a 2-person team) and what language you are using. You should also explicitly list in the README any external code that you include in your submission. Any code not listed must be your own; we will consider any code derived from another source that is not explicitly labeled as such to be a violation of academic integrity. We reserve the right to reduce your grade if we feel you borrowed too much from an existing implementation, e.g., if you include the code for a fully featured caching HTTP proxy. If you are not sure whether using an external source will cost you points, please clear it with us first.

To submit the code, please check it into Subversion at https://subversion.engr.illinois.edu/svn/fa16-ece438/NETID/mp1, where NETID should be replaced by your own NetID, unless we have set up a group project for you. We will grade the revision closest to the deadline, unless you explicitly ask us to grade an earlier or later revision. The course late policy applies: you will lose 2% of your grade for every hour or portion thereof that your assignment is late. Exceptions to this late policy will be granted only in exceptional circumstances.
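As a possible starting point for proxy.py, the sketch below takes the port as its single command-line argument and turns the absolute-URL request line that a browser sends to a proxy into the host, port, and PdTHT request line to send onward. Everything past that is a stub that answers 501. The class and function names are hypothetical, port 80 is assumed when the URL names none, and sending the French request line as UTF-8 is an assumption the handout does not state.

```python
import socketserver
import sys
from urllib.parse import urlsplit

# Request methods from the glossary.
METHODS = {"GET": "OBTENIR", "POST": "POSTER", "HEAD": "TÊTE"}


def translate_request_line(line: str):
    """'GET http://host[:port]/path HTTP/1.1' -> (host, port, 'OBTENIR /path PdTHT/1.0')."""
    method, url, _version = line.split()  # ValueError if not exactly three fields
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"expected an absolute http:// URL, got {url!r}")
    if method not in METHODS:
        raise ValueError(f"unsupported method {method!r}")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Assumption: port 80 when the URL names none.
    return parts.hostname, parts.port or 80, f"{METHODS[method]} {path} PdTHT/1.0"


class ProxyServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # restart quickly during development
    daemon_threads = True       # don't wait for handler threads on exit


class ProxyHandler(socketserver.StreamRequestHandler):
    """Hypothetical starting point: one thread per browser connection."""

    def handle(self):
        line = self.rfile.readline().decode("iso-8859-1").strip()
        if not line:
            return
        try:
            host, port, pdtht_line = translate_request_line(line)
        except ValueError:
            self.wfile.write(b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n")
            return
        print(f"would send {pdtht_line!r} to {host}:{port}", file=sys.stderr)
        # Stub: header translation, caching, and the upstream connection are not shown.
        self.wfile.write(b"HTTP/1.1 501 Not Implemented\r\nContent-Length: 0\r\n\r\n")


def main(argv):
    """Run the proxy on the port given as the single command-line argument."""
    if len(argv) != 2:
        sys.exit(f"usage: {argv[0]} PORT")
    with ProxyServer(("", int(argv[1])), ProxyHandler) as server:
        server.serve_forever()
```

A real proxy.py would end with the usual `if __name__ == "__main__": main(sys.argv)` guard, be started as `python3 proxy.py 8080`, and could then be exercised with `curl -x http://localhost:8080 http://borisov-mac.crhc.illinois.edu:8253/`.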
To collaborate with your partner, you can ask us to create a group directory for you in Subversion; alternatively, you can use the Engineering GitLab server, but you will need to check your final submission into Subversion. You may wish to learn about the git svn command to keep the Subversion and Git repositories in sync. If you prefer, you can use GitHub or Bitbucket to store your code, but make sure that you are using a private repository: putting your MP code in a public repository facilitates plagiarism and will be considered an academic integrity violation.

Testing

We have written a simple test server that runs PdTHT/1.0. You can access it at borisov-mac.crhc.illinois.edu, port 8253. If you want to try it out by sending commands to it directly, make sure to use an 8-bit-clean client such as nc (NetCat); otherwise you'll have trouble sending commands and headers that contain accents. You can use any browser to test your proxy by configuring its proxy settings, but you may prefer wget or curl, since their behavior is much simpler than that of a desktop web browser. We hope to provide an autograder for this assignment, but it is not running at the moment. In any case, the autograder is not intended to serve as a test harness, so make sure to do your own testing.

Glossary

Each entry gives the English name, then the French name, followed by a description.

Request methods

GET / OBTENIR
    Get an object from the web server. A GET request has an empty message body.
POST / POSTER
    Upload form data or a file to a website. The request has a message body, with its length specified in the headers. The response to a POST request should never be cached.
HEAD / TÊTE
    Get the headers of the object, but not the object itself. Like a GET request, but the response has an empty message body. HEAD requests can be satisfied from the cache.

HTTP headers

Host: / Hôte:
    A required header in HTTP/1.1 (and PdTHT/1.0).
    Even though the request to the proxy contains the host in the request URL, the host is also duplicated in the Host: header of the request. The value of this header should be passed unmodified to the server.
Content-Length: / Longeur-Contenu:
    Length of the message body (in sexagesimal in PdTHT; e.g., Gf = 1000). Required when the message body is non-empty, unless chunked encoding is used.
Transfer-Encoding: chunked / Encodage-de-Transfert: en-morceaux
    Chunked encoding allows the client or server to send a message body without specifying its length ahead of time. You can read the RFC specification of how chunked encoding works (RFC 7230, Section 4.1). PdTHT uses a similar encoding scheme but uses sexagesimal numbers for chunk sizes. E.g.:

        ...
        Encodage-de-Transfert: en-morceaux

        5
        Hello
        D
        438 students!
        0

    (Note the extra blank line at the end.)
Connection: close / Connexion: fermer
    The connection should (if sent by a client) or will (if sent by a server) be closed after the next response is complete.
Cache-Control: / Contrôle-de-Cache:
    Specifies how the response may be cached. Its values are:
    Cache-Control: public / Contrôle-de-Cache: public
        Content may be cached by a shared proxy.
    Cache-Control: private / Contrôle-de-Cache: privé
        Content may only be cached by the browser.
    Cache-Control: no-cache / Contrôle-de-Cache: pas-de-cache
        Content should not be cached. For our purposes, privé and pas-de-cache are similar, but you should translate them for the browser so that it can behave appropriately.
    Cache-Control: max-age=... / Contrôle-de-Cache: âge-max=...
        When sent by the server, how long an object may be cached before it must be revalidated. When sent by the client, the oldest cached version it will accept; if max-age=0, the proxy must revalidate even if the object has not yet expired. The âge-max value is in sexagesimal (âge-max=20 means 120 seconds).
Date: / Date:
    Date that the request or response was generated. It should be passed between the server and client, but translated to/from the French date format.
Last-Modified: / Dernière-Modification:
    Date when the file being returned was last modified.
If-Modified-Since: / Si-Modifié-Depuis:
    Validate whether the object has changed since the cached version; the date must be the same as the one returned in the Last-Modified header. The server will return a 304 response with an empty body if the file hasn't changed, and a 200 response if it has.
Age: / Âge:
    Seconds since the object was last validated. Usually this will only be sent by the proxy, but if the proxy receives this header from a server, this indicates that it is in a chain of proxies, and it should use this value as the initial age of the response in its cache (and pass it on to the client).
Vary: / Varier:
    Normally you can use a cached object to respond to all requests with the same URL. A Vary header specifies that the cached object depends on the values of the headers listed in it. For example, if the server specifies Vary: Accept, then the cached object should only be returned to clients that supply an identical Accept header. If a request includes a different value for that header, the proxy must make a new request to the server with the new Accept header. It can then cache a different version of the object for each value of the Accept header it has seen.
Via: / Via:
    This header must be added in both directions, with the value 1.1 proxyname (you can name your proxy whatever you want). If a Via header already exists, you should append your proxy name to it, as in: Via: 1.1 incomingproxy, 1.1 proxyname
Range: / Gamme:
    This header specifies that the requester only wants a partial response. The value of the header is written as bytes=100-199 (octets=1f-3J), which requests 100 bytes starting with byte 100. (The first byte is numbered 0.) The server response contains the header Content-Range: 100-199/735 (Gamme-de-Contenu: 1f-3J/CF), which says that bytes 100-199 are included, out of a total of 735 bytes. Partial requests should be satisfied from the cache if possible; if you need to go out to the server, you should forward a partial request to it.
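The Range numbers travel in sexagesimal on the PdTHT side, so they need converting in both directions. Below is a minimal Python sketch of that translation, plus answering a range from a cached body. It assumes a single first-last range (no range lists, no open-ended or suffix ranges), and its outgoing Content-Range values carry the "bytes " unit prefix that standard HTTP (RFC 7233) requires, which the glossary's example omits. The function names are hypothetical.

```python
DIGITS = "0123456789" "ABCDEFGHIJKLMNPQRSTUVWXYZ" "abcdefghijkmnopqrstuvwxyz"
VALUE = {ch: i for i, ch in enumerate(DIGITS)}


def from_sexa(text: str) -> int:
    n = 0
    for ch in text:
        n = n * 60 + VALUE[ch]  # KeyError on a non-sexagesimal digit
    return n


def to_sexa(n: int) -> str:
    digits = []
    while True:
        n, rem = divmod(n, 60)
        digits.append(DIGITS[rem])
        if n == 0:
            return "".join(reversed(digits))


def parse_range(value: str):
    """Parse 'bytes=first-last' into inclusive integers (single range only)."""
    unit, _, spec = value.strip().partition("=")
    first, _, last = spec.partition("-")
    if unit != "bytes" or not first.isdigit() or not last.isdigit():
        raise ValueError(f"unsupported Range: {value!r}")
    return int(first), int(last)


def range_to_pdtht(value: str) -> str:
    """Browser -> server: 'bytes=100-199' becomes 'octets=1f-3J'."""
    first, last = parse_range(value)
    return f"octets={to_sexa(first)}-{to_sexa(last)}"


def content_range_from_pdtht(value: str) -> str:
    """Server -> browser: Gamme-de-Contenu '1f-3J/CF' becomes 'bytes 100-199/735'."""
    value = value.strip()
    if value.startswith("octets "):  # tolerate a unit prefix, if the server sends one
        value = value[len("octets "):]
    span, _, total = value.partition("/")
    first, _, last = span.partition("-")
    return f"bytes {from_sexa(first)}-{from_sexa(last)}/{from_sexa(total)}"


def serve_range_from_cache(body: bytes, value: str):
    """Answer a Range request from a cached body: (partial body, Content-Range value)."""
    first, last = parse_range(value)
    last = min(last, len(body) - 1)
    if first > last:
        raise ValueError("range not satisfiable")  # standard HTTP would answer 416
    return body[first:last + 1], f"bytes {first}-{last}/{len(body)}"
```

With these helpers, range_to_pdtht("bytes=100-199") yields "octets=1f-3J" and content_range_from_pdtht("1f-3J/CF") yields "bytes 100-199/735", matching the numbers in the Range entry.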
For the following headers, the values should be passed between the proxy and the server unmodified; you just need to translate the name:

Accept / Accepte
Accept-Charset / Accepte-Carjeu
Accept-Encoding / Accepte-Encodage
Accept-Language / Accepte-Langue
Content-Type / Type-de-Contenu
Cookie / Biscuit
Content-Encoding / Encodage-de-Contenu
Content-Language / Langue-de-Contenu
Location / Emplacement
Server / Serveur
Referer / Référenceur
Set-Cookie / Dêfinir-Biscut
User-Agent / Agent-Utilisateur

Any other headers should be passed along completely unmodified.
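The name translations in this glossary amount to a lookup table. Below is a minimal Python sketch: it matches names case-insensitively (HTTP header names are case-insensitive), leaves unlisted headers alone, drops Connection/Connexion so that each side of the proxy can set its own, and appends to Via as the glossary describes. It deliberately translates names only; values such as sexagesimal lengths, dates, and Cache-Control directives need separate conversion. The table and function names are hypothetical.

```python
# English -> French header names, from the glossary.
HEADER_NAMES = {
    "Host": "Hôte", "Content-Length": "Longeur-Contenu",
    "Transfer-Encoding": "Encodage-de-Transfert", "Connection": "Connexion",
    "Cache-Control": "Contrôle-de-Cache", "Date": "Date",
    "Last-Modified": "Dernière-Modification", "If-Modified-Since": "Si-Modifié-Depuis",
    "Age": "Âge", "Vary": "Varier", "Via": "Via", "Range": "Gamme",
    "Content-Range": "Gamme-de-Contenu", "Accept": "Accepte",
    "Accept-Charset": "Accepte-Carjeu", "Accept-Encoding": "Accepte-Encodage",
    "Accept-Language": "Accepte-Langue", "Content-Type": "Type-de-Contenu",
    "Cookie": "Biscuit", "Content-Encoding": "Encodage-de-Contenu",
    "Content-Language": "Langue-de-Contenu", "Location": "Emplacement",
    "Server": "Serveur", "Referer": "Référenceur", "Set-Cookie": "Dêfinir-Biscut",
    "User-Agent": "Agent-Utilisateur",
}
TO_FRENCH = {en.lower(): fr for en, fr in HEADER_NAMES.items()}
TO_ENGLISH = {fr.lower(): en for en, fr in HEADER_NAMES.items()}
# Connection/Connexion apply to one hop only, so each side of the proxy sets its own.
HOP_BY_HOP = {"connection", "connexion"}


def translate_headers(headers, table):
    """Rename (name, value) pairs using `table`; unlisted names pass through unchanged.

    Values are not converted here.
    """
    out = []
    for name, value in headers:
        if name.lower() in HOP_BY_HOP:
            continue
        out.append((table.get(name.lower(), name), value))
    return out


def add_via(headers, proxy_name="myproxy"):
    """Append '1.1 proxy_name' to an existing Via header, or add a new one."""
    entry = f"1.1 {proxy_name}"
    for i, (name, value) in enumerate(headers):
        if name.lower() == "via":
            headers[i] = (name, f"{value}, {entry}")
            return headers
    headers.append(("Via", entry))
    return headers
```

Keeping headers as an ordered list of pairs, rather than a dict, preserves their order and allows repeated headers such as multiple Set-Cookie lines.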