Introduction

A Web proxy is a program that acts as a middleman between a Web browser and an end server. Instead of contacting the end server directly to get a Web page, the browser contacts the proxy, which forwards the request on to the end server. When the end server replies to the proxy, the proxy sends the reply on to the browser.

Proxies are used for many purposes. Sometimes proxies are used in firewalls, such that the proxy is the only way for a browser inside the firewall to contact an end server outside. The proxy may do translation on the page, for instance, to make it viewable on a Web-enabled cell phone. Proxies are also used as anonymizers. By stripping a request of all identifying information, a proxy can make the browser anonymous to the end server. Proxies can even be used to cache Web objects, by storing a copy of, say, an image when a request for it is first made, and then serving that image in response to future requests rather than going to the end server.

In this lab, you will write a concurrent Web proxy that logs requests. In the first part of the lab, you will write a simple sequential proxy that repeatedly waits for a request, forwards the request to the end server, and returns the result back to the browser, keeping a log of such requests in a disk file. This part will help you understand basics about network programming and the HTTP protocol.

In the second part of the lab, you will upgrade your proxy so that it uses threads to deal with multiple clients concurrently. This part will give you some experience with concurrency and synchronization, which are crucial computer systems concepts.

Logistics

As always, you may work in a group of up to two people. The only handin will be electronic.
Any clarifications and revisions to the assignment will be posted on the course Web page.

Hand Out Instructions

SITE-SPECIFIC: Insert a paragraph here that explains how the instructor will hand out the proxylab-handout.tar file to the students.

Start by copying proxylab-handout.tar to a (protected) directory in which you plan to do your work. Then give the command tar xvf proxylab-handout.tar. This will cause a number of files to be unpacked in the directory:

• proxy.c: This is the only file you will be modifying and handing in. It contains the bulk of the logic for your proxy.
• csapp.c: This is the file of the same name that is described in the CS:APP textbook. It contains error handling wrappers and helper functions such as the RIO (Robust I/O) package (CS:APP 11.4), open_clientfd (CS:APP 12.4.4), and open_listenfd (CS:APP 12.4.7).
• csapp.h: This file contains a few manifest constants, type definitions, and prototypes for the functions in csapp.c.
• Makefile: Compiles and links proxy.c and csapp.c into the executable proxy.

Your proxy.c file may call any function in the csapp.c file. However, since you are only handing in a single proxy.c file, please don't modify the csapp.c file. If you want different versions of functions in csapp.c (see the Hints section), write new functions in the proxy.c file.

Part I: Implementing a Sequential Web Proxy

In this part you will implement a sequential logging proxy. Your proxy should open a socket and listen for a connection request. When it receives a connection request, it should accept the connection, read the HTTP request, and parse it to determine the name of the end server. It should then open a connection to the end server, send it the request, receive the reply, and forward the reply to the browser if the request is not blocked.

Since your proxy is a middleman between client and end server, it will have elements of both. It will act as a server to the web browser, and as a client to the end server.
Thus you will get experience with both client and server programming.

Logging

Your proxy should keep track of all requests in a log file named proxy.log. Each log file entry should be a line of the form:

    Date: browserIP URL size

where browserIP is the IP address of the browser, URL is the URL asked for, and size is the size in bytes of the object that was returned. For instance:

    Sun 27 Oct 2002 02:51:02 EST: 128.2.111.38 http://www.cs.cmu.edu/ 34314

Note that size is essentially the number of bytes received from the end server, from the time the connection is opened to the time it is closed. Only requests that are met by a response from an end server should be logged. We have provided the function format_log_entry in csapp.c to create a log entry in the required format.

Port Numbers

Your proxy should listen for its connection requests on the port number passed in on the command line:

    unix> ./proxy 15213

You may use any port number p, where 1024 ≤ p < 65536, and where p is not currently being used by any other system or user services (including other students' proxies). See /etc/services for a list of the port numbers reserved by other system services.

Part II: Dealing with multiple requests concurrently

Real proxies do not process requests sequentially. They deal with multiple requests concurrently. Once you have a working sequential logging proxy, you should alter it to handle multiple requests concurrently. The simplest approach is to create a new thread to deal with each new connection request that arrives (CS:APP 13.3.8).

With this approach, it is possible for multiple peer threads to access the log file concurrently. Thus, you will need to use a semaphore to synchronize access to the file such that only one peer thread can modify it at a time. If you do not synchronize the threads, the log file might be corrupted. For instance, one line in the file might begin in the middle of another.

Evaluation

Each group will be evaluated on the basis of a demo to your instructors.
See the course Web page for instructions on how to sign up for your demos.

• Basic proxy functionality (30 points). Your sequential proxy should correctly accept connections, forward the requests to the end server, and pass the response back to the browser, making a log entry for each request. Your program should be able to proxy browser requests to the following Web sites and correctly log the requests:
    • http://www.yahoo.com
    • http://www.aol.com
    • http://www.nfl.com
• Handling concurrent requests (20 points). Your proxy should be able to handle multiple concurrent connections. We will determine this using the following test: (1) Open a connection to your proxy using telnet, and then leave it open without typing in any data. (2) Use a Web browser (pointed at your proxy) to request content from some end server. Furthermore, your proxy should be thread-safe, protecting all updates of the log file and protecting calls to any thread-unsafe functions such as gethostbyaddr. We will determine this by inspection during the demo.
• Style (10 points). Up to 10 points will be awarded for code that is readable and well commented. Your code should begin with a comment block that describes in a general way how your proxy works, and each function should have a comment block describing what that function does. In addition, your threads should run detached, and your code should not have any memory leaks. We will determine this by inspection during the demo.

Hints

• The best way to get going on your proxy is to start with the basic echo server (CS:APP 12.4.9) and then gradually add functionality that turns the server into a proxy.
• Initially, you should debug your proxy using telnet as the client (CS:APP 12.5.3).
• Later, test your proxy with a real browser. Explore the browser settings until you find proxies, then enter the host and port where you're running yours. With Netscape, choose Edit, then Preferences, then Advanced, then Proxies, then Manual Proxy Configuration.
  In Internet Explorer, choose Tools, then Options, then Connections, then LAN Settings. Check Use proxy server, and click Advanced. Just set your HTTP proxy, because that's all your code is going to be able to handle.
• Since we want you to focus on network programming issues for this lab, we have provided you with two additional helper routines: parse_uri, which extracts the hostname, path, and port components from a URI, and format_log_entry, which constructs an entry for the log file in the proper format.
• Be careful about memory leaks. When the processing for an HTTP request fails for any reason, the thread must close all open socket descriptors and free all memory resources before terminating.
• You will find it very useful to assign each thread a small unique integer ID (such as the current request number) and then pass this ID as one of the arguments to the thread routine. If you display this ID in each of your debugging output statements, then you can accurately track the activity of each thread.
• To avoid a potentially fatal memory leak, your threads should run as detached, not joinable (CS:APP 13.3.6).
• Since the log file is being written to by multiple threads, you must protect it with mutual exclusion semaphores whenever you write to it (CS:APP 13.5.2 and 13.5.3).
• Be very careful about calling thread-unsafe functions such as inet_ntoa, gethostbyname, and gethostbyaddr inside a thread. In particular, the open_clientfd function in csapp.c is thread-unsafe because it calls gethostbyaddr, a Class-3 thread-unsafe function (CS:APP 13.7.1). You will need to write a thread-safe version of open_clientfd, called open_clientfd_ts, that uses the lock-and-copy technique (CS:APP 13.7.1) when it calls gethostbyaddr.
• Use the RIO (Robust I/O) package (CS:APP 11.4) for all I/O on sockets. Do not use standard I/O on sockets. You will quickly run into problems if you do.
  However, standard I/O calls such as fopen and fwrite are fine for I/O on the log file.
• The Rio_readn, Rio_readlineb, and Rio_writen error checking wrappers in csapp.c are not appropriate for a realistic proxy because they terminate the process when they encounter an error. Instead, you should write new wrappers called Rio_readn_w, Rio_readlineb_w, and Rio_writen_w that simply return after printing a warning message when I/O fails. When either of the read wrappers detects an error, it should return 0, as though it encountered EOF on the socket.
• Reads and writes can fail for a variety of reasons. The most common read failure is an errno = ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an errno = EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur, for example, when a user hits their browser's Stop button during a long transfer.
• Writing to a connection that has been closed by the peer elicits a SIGPIPE signal, whose default action is to terminate the process. This typically happens on the second such write, since the first one usually returns normally. To keep your proxy from crashing, you can use the SIG_IGN argument to the signal function (CS:APP 8.5.3) to explicitly ignore these SIGPIPE signals. The failed write then returns an error with errno set to EPIPE.

Handin Instructions

SITE-SPECIFIC: Insert a paragraph here that tells each team how to hand in their proxy.c solution file. For example, here are the handin instructions we use at CMU.

• Remove any extraneous print statements.
• Make sure that you have included your identifying information in proxy.c.
• Create a team name of the form: ID where ID is your Andrew ID.
• To hand in your proxy.c file, type:
      make handin TEAM=teamname
  where teamname is the team name described above.
• After the handin, you can submit a revised copy by typing
      make handin TEAM=teamname VERSION=2
• You can verify your handin by looking at
      /afs/cs.cmu.edu/academic/class/15213-f02/L7/handin
  You have list and insert permissions in this directory, but no read or write permissions.
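The Hints section asks for I/O wrappers that return instead of terminating the process, and for ignoring SIGPIPE. The write side of that advice might be sketched as follows. This is only an illustration under stated assumptions: writen_w and ignore_sigpipe are hypothetical names, and writen_w calls write directly rather than the RIO functions in csapp.c so that the fragment compiles on its own. Your own Rio_writen_w will likely look different.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE so that a write to a closed
 * connection fails with errno == EPIPE instead of killing the process. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* writen_w: hypothetical stand-in for the Rio_writen_w wrapper described
 * in the Hints. It writes exactly n bytes from buf to fd, retrying after
 * short writes and interrupted calls. On failure it prints a warning and
 * returns -1 rather than exiting, so only the current request is lost. */
ssize_t writen_w(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    assert(buf != NULL || n == 0);
    while (left > 0) {
        ssize_t written = write(fd, p, left);
        if (written <= 0) {
            if (written < 0 && errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            int saved = errno;      /* fprintf may overwrite errno */
            fprintf(stderr, "warning: write failed: %s\n", strerror(saved));
            errno = saved;          /* let the caller inspect the cause */
            return -1;
        }
        p += written;
        left -= (size_t)written;
    }
    return (ssize_t)n;
}
```

One way to use it is to call ignore_sigpipe() once at startup, before accepting any connections. When writen_w returns -1, the thread handling that request can close its descriptors, free its memory, and return, while the rest of the proxy keeps running.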