A framework for Internet Content Adaptation


Version 1.2.1
October 30, 2002



This open source project provides a complete framework for Internet Content Adaptation based on the ICAP 1.0 protocol. In short, the framework offers a proxy-based solution for online adaptation of the content that a user browses over the Internet. Users can write simple filters (proxylets) that modify the content as desired and easily plug them into the framework. Users can also specify a set of rules that determine when the different filters/content adaptors are invoked.

In particular, this framework consists of:

  • an ICAP-enabled proxy server (Squid, a very popular open source proxy server),
  • an ICAP server (written in Python),
  • a set of proxylets (filters),
  • a proxylet API (for authoring new proxylets) and
  • a rule parser (based on IRML, an XML specification).


    SEE ALSO
     

  • Squid ICAP Client
  • IRML authoring


    FURTHER CONTENTS

    1. Technical Background
         The ICAP Protocol
         Proxylet API
         IRML, The Rule Language
    2. About this implementation
         ICAP-server
         ICAP client
         Download Instructions
           Access to our CVS repository
    3. System Requirements
    4. Installation
         ICAP Server
         ICAP Client
    5. Running the ICAP server
    6. References
    7. Contact Info
    8. Acknowledgements

    This project is sponsored by Hewlett Packard Labs.


    1. Technical Background

    This section provides some basic technical background required to use this software effectively.

    The ICAP Protocol

    ICAP (Internet Content Adaptation Protocol) is a protocol aimed at providing simple object-based content vectoring for HTTP services. It allows ICAP clients to pass HTTP messages to ICAP servers for transformation. The server executes its transformation service on the messages and sends responses back to the client, usually containing modified messages. The adapted messages may be either HTTP requests or HTTP responses.

    ICAP works in two modes:
            o Request Modification (reqmod)
            o Response Modification (respmod)

    In Request Modification, the HTTP request is encapsulated and sent to the ICAP server. The server responds with either a modified request or the complete response itself.
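    To make the REQMOD case concrete, here is a minimal Python sketch of how a client could wrap a body-less HTTP request in an ICAP REQMOD message. It assumes the ICAP 1.0 message format, in which the Encapsulated header gives the byte offset of each encapsulated section. The function name build_reqmod and the service URI icap://localhost:1344/reqmod are hypothetical and not part of this package.

```python
def build_reqmod(icap_uri: str, icap_host: str, http_request: bytes) -> bytes:
    """Wrap a complete, body-less HTTP request header block in an ICAP REQMOD message."""
    if not http_request.endswith(b"\r\n\r\n"):
        raise ValueError("HTTP header block must end with a blank line (CRLF CRLF)")
    # Offsets are relative to the start of the encapsulated part: the HTTP
    # request headers begin at 0, and null-body marks where they end,
    # telling the server that no HTTP body follows.
    encapsulated = f"req-hdr=0, null-body={len(http_request)}"
    icap_head = (
        f"REQMOD {icap_uri} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: {encapsulated}\r\n"
        "\r\n"
    )
    return icap_head.encode("ascii") + http_request


# Example: the proxy forwards a browser's GET request for adaptation.
http_req = (b"GET http://www.example.com/ HTTP/1.1\r\n"
            b"Host: www.example.com\r\n"
            b"\r\n")
message = build_reqmod("icap://localhost:1344/reqmod", "localhost", http_req)
print(message.decode("ascii"))
```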

    In Response Modification, the edge proxy (ICAP client) encapsulates the complete HTTP response and sends it to the ICAP server, which modifies it and sends back an HTTP response.
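    A RESPMOD message differs mainly in that it usually carries an HTTP body, which ICAP transfers using HTTP/1.1 chunked encoding. The sketch below (hypothetical names, same assumptions about the ICAP 1.0 format as for REQMOD) sends the whole body as one chunk followed by the terminating zero-length chunk.

```python
def chunked(body: bytes) -> bytes:
    """Encode a body as a single HTTP/1.1 chunk plus the terminating zero-length chunk."""
    return b"%x\r\n" % len(body) + body + b"\r\n0\r\n\r\n"


def build_respmod(icap_uri: str, icap_host: str,
                  http_response_head: bytes, body: bytes) -> bytes:
    """Wrap an HTTP response (headers plus body) in an ICAP RESPMOD message."""
    if not http_response_head.endswith(b"\r\n\r\n"):
        raise ValueError("HTTP header block must end with a blank line (CRLF CRLF)")
    if body:
        # The chunked body starts right after the HTTP response headers.
        encapsulated = f"res-hdr=0, res-body={len(http_response_head)}"
        payload = http_response_head + chunked(body)
    else:
        # No body at all: signal it with null-body instead of an empty chunk.
        encapsulated = f"res-hdr=0, null-body={len(http_response_head)}"
        payload = http_response_head
    icap_head = (
        f"RESPMOD {icap_uri} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: {encapsulated}\r\n"
        "\r\n"
    )
    return icap_head.encode("ascii") + payload


hdrs = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
msg = build_respmod("icap://localhost:1344/respmod", "localhost",
                    hdrs, b"<html>Hi</html>")
print(msg.decode("ascii"))
```

    Sending the body as a single chunk keeps the sketch short; a real client would typically stream large bodies in several chunks.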

    The protocol is usually used between an edge proxy server and an ICAP server. The proxy server sends all the HTTP requests it receives to the ICAP server for possible adaptation. If the ICAP server modifies a request, the proxy uses the new URL to fetch the content for the client. Similarly, before sending a response to the client device, the proxy sends the response, encapsulated as an ICAP message, to the ICAP server and receives back a possibly modified HTTP response, which is then delivered to the client.
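    The proxy-side decision described above can be sketched as follows. This is an illustration, not the Squid client code: it assumes that status 204 means "no modification" (in ICAP 1.0 this is only sent when the client advertised "Allow: 204"), that status 200 carries the adapted message, and, as a simplification, that the encapsulated HTTP message starts right after the ICAP headers instead of parsing the Encapsulated offsets. The function names are hypothetical.

```python
from typing import Optional


def icap_status(response: bytes) -> int:
    """Return the numeric status code from an ICAP response's status line."""
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    if len(parts) < 2 or parts[0] != b"ICAP/1.0":
        raise ValueError(f"not an ICAP/1.0 status line: {status_line!r}")
    return int(parts[1])


def url_to_fetch(original_url: str, icap_response: bytes) -> Optional[str]:
    """Decide which URL the proxy should fetch after a REQMOD exchange.

    Returns the URL to fetch, or None when the ICAP server answered with a
    complete HTTP response that the proxy should deliver directly.
    """
    code = icap_status(icap_response)
    if code == 204:
        # 204 No Content: no modification was needed.
        return original_url
    if code != 200:
        raise RuntimeError(f"ICAP server returned status {code}")
    # Simplification: the encapsulated message follows the ICAP headers.
    _icap_head, _, encapsulated = icap_response.partition(b"\r\n\r\n")
    first_line = encapsulated.split(b"\r\n", 1)[0]
    if first_line.startswith(b"HTTP/"):
        return None  # the server sent the complete HTTP response itself
    _method, url, _version = first_line.split(b" ", 2)
    return url.decode("ascii")
```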

    We provide an ICAP server implementation in Python and have enhanced the Squid Web Proxy to include ICAP client capability.

    For more information on the protocol please refer to [1] and [2].
     

    Proxylet API

    The service that transforms an HTTP request or HTTP response (called a content adaptation service here) can take two forms:
            (a) Callout services
            (b) Local services called proxylets
    A callout service is a remote service that is used to perform the transformation using an appropriate callout protocol.
    A local service, on the other hand, is executed by the ICAP server itself. A Java API for developing such local services (proxylets) has been proposed in [3]. Our implementation tries to provide a similar API in Python.
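    As a rough illustration of a local service, the sketch below defines a proxylet as an object with a single method that receives an HTTP message and returns the (possibly) transformed one, and runs a chain of them in-process. The class and method names (Proxylet, service, AddHeader, run_local_services) and the header X-Adapted-By are hypothetical; the package's actual Python API follows [3] and is best learned from the sample proxylets.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class Proxylet(ABC):
    """Hypothetical local-service interface; not the package's real proxylet API."""

    @abstractmethod
    def service(self, message: bytes) -> bytes:
        """Return the (possibly) transformed HTTP message."""


class AddHeader(Proxylet):
    """Example proxylet that appends one header to an HTTP message."""

    def __init__(self, name: str, value: str) -> None:
        self.header = f"{name}: {value}".encode("latin-1")

    def service(self, message: bytes) -> bytes:
        head, sep, body = message.partition(b"\r\n\r\n")
        if not sep:
            return message  # incomplete header block: leave the message alone
        # Place the new header after the last existing one.
        return head + b"\r\n" + self.header + sep + body


def run_local_services(message: bytes, proxylets: Iterable[Proxylet]) -> bytes:
    """Apply local services in order, inside the ICAP server process."""
    for proxylet in proxylets:
        message = proxylet.service(message)
    return message


request = b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
print(run_local_services(request, [AddHeader("X-Adapted-By", "icap-server")]))
```

    A callout service would instead forward the message over a network protocol to a remote server; that path is not shown here.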
     

    IRML, The Rule Language

    Once we have a client-server pair that understands ICAP and some services that can perform certain transformations, we need to specify the conditions under which the transformations should be performed. A rule-based language for this purpose is proposed in [4]. It is an application of XML called IRML (Intermediary Rule Markup Language).

    These rules can be installed by the following three parties:
       1. Client
       2. Access provider (ISP, CDN etc.)
       3. Content provider
    Each party can express the conditions under which they wish to run a service.

    We have provided an implementation of a parser for this IRML language that integrates into our ICAP server.
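    The idea of rules owned by different parties can be illustrated with a small evaluator. The XML below is a deliberately simplified, hypothetical format (owner, property and service elements with a regular-expression "matches" attribute); it is not the actual IRML schema from [4], and select_services is not the package's parser. A rule fires when all of its property conditions match the request's properties.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Simplified, hypothetical rule format; the real IRML schema in [4] differs.
RULES_XML = r"""<rules>
  <rule owner="client">
    <property name="url" matches="^http://www\.example\.com/"/>
    <service name="translate"/>
  </rule>
  <rule owner="content-provider">
    <property name="content-type" matches="^text/html"/>
    <service name="insert-banner"/>
  </rule>
  <rule owner="access-provider">
    <property name="url" matches="\.gif$"/>
    <service name="compress-image"/>
  </rule>
</rules>"""


def select_services(rules_xml: str, props: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (owner, service) pairs for every rule whose conditions all match."""
    selected = []
    for rule in ET.fromstring(rules_xml).iter("rule"):
        conditions = rule.findall("property")
        if all(re.search(c.get("matches", ""), props.get(c.get("name", ""), ""))
               for c in conditions):
            owner = rule.get("owner", "unknown")
            selected.extend((owner, s.get("name", "")) for s in rule.findall("service"))
    return selected


print(select_services(RULES_XML, {"url": "http://www.example.com/index.html",
                                  "content-type": "text/html"}))
```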


    2. About this implementation

    ICAP-server:

    This package contains a Python-based implementation of an ICAP server, an IRML parser and the proxylet API. Simple example proxylets (to translate from English to French, to insert banner ads, etc.) are also included.

    The ICAP server is implemented on top of the Medusa Python web server. It is available in both single-threaded and multi-threaded versions. The multi-threaded version, however, is not completely tested. The default is single-threaded; change icap_services.py to make it multi-threaded.

    The IRML parser is integrated with the ICAP server in an unusual way: it is an XML parser that parses a given IRML file and generates Python code. The generated Python class is then used by the ICAP server for rule processing, so the rules are configured statically. Since Python is a dynamic language, an extension that allows dynamic rule insertion could also be tried. See this document to understand how to edit the rules for this ICAP server.
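    The "parse once, generate a Python class" approach can be sketched as follows. The rule tuples, the generated class name GeneratedRules and its method services_for are hypothetical; the real parser's output is not reproduced here. The sketch only shows the technique: emit Python source from the rules, compile it, and use the resulting class at request time.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule shape: (property name, regular expression, service name).
Rule = Tuple[str, str, str]


def generate_rule_class(rules: List[Rule]) -> str:
    """Emit Python source for a class that evaluates the given rules."""
    lines = [
        "import re",
        "",
        "class GeneratedRules:",
        "    def services_for(self, props):",
        "        selected = []",
    ]
    for prop, pattern, service in rules:
        # repr() produces valid string literals, escaping backslashes and quotes.
        lines.append(f"        if re.search({pattern!r}, props.get({prop!r}, '')):")
        lines.append(f"            selected.append({service!r})")
    lines.append("        return selected")
    return "\n".join(lines) + "\n"


def load_rule_class(source: str) -> type:
    """Compile the generated source and return the GeneratedRules class."""
    namespace: Dict[str, object] = {}
    exec(compile(source, "<generated_rules>", "exec"), namespace)
    return namespace["GeneratedRules"]


source = generate_rule_class([("url", r"^http://", "translate"),
                              ("content-type", r"^text/html", "insert-banner")])
Rules = load_rule_class(source)
print(Rules().services_for({"url": "http://x/", "content-type": "image/gif"}))
```

    Because the rules become ordinary Python code, evaluation at request time is cheap; regenerating and reloading the class is one way the dynamic rule insertion mentioned above could be experimented with.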

    The rule language implemented is as specified in draft-beck-opes-irml-00.txt (expires Aug 2001). A newer version of the IRML language (draft-beck-opes-irml-02.txt) has since been released; its changes are yet to be implemented.

    The Proxylet API is a Python version of the Java API proposed in [3]. Only the most important classes have been implemented so far. Also, a one-to-one correspondence with the Java API is maintained only at the method level: the arguments to the methods may differ, especially for the constructors. Studying the code of the sample proxylets before writing new proxylets is recommended.

    The following sample proxylets are available:
    Request Modification (REQMOD):
    (a) Language Translation: When this proxylet is enabled, accessing a URL that matches the rules (currently ALL) results in the content being delivered in French! So, if you set your proxy to an ICAP-enabled proxy server while this proxylet is ON, you end up viewing the whole Web in French!
    (b) A URL redirection service: Currently, this proxylet redirects URLs to specific ports based on a prefix match of the requested URL; it can easily be enhanced for generic redirection.
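    The prefix-to-port redirection idea can be sketched in a few lines. The table entries and the function name redirect are hypothetical examples, not the proxylet's actual configuration. The longest matching prefix wins, so more specific entries take priority over general ones.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

# Hypothetical prefix-to-port table; the real proxylet's table is not shown here.
PREFIX_PORTS: Dict[str, int] = {
    "http://www.example.com/app/": 8080,
    "http://www.example.com/": 8000,
}


def redirect(url: str, table: Dict[str, int] = PREFIX_PORTS) -> str:
    """Rewrite the URL's port according to the longest matching prefix."""
    for prefix in sorted(table, key=len, reverse=True):
        if url.startswith(prefix):
            parts = urlsplit(url)
            # hostname drops any existing port (and user info) before we add ours.
            netloc = f"{parts.hostname}:{table[prefix]}"
            return urlunsplit(parts._replace(netloc=netloc))
    return url  # no rule matched: leave the request untouched


print(redirect("http://www.example.com/app/x?y=1"))
```

    In REQMOD, the rewritten URL would go into the request line of the adapted HTTP request, which the proxy then uses to fetch the content.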

    Response Modification (RESPMOD):
    (a) Banner insertion: This proxylet modifies all HTML pages to include a particular GIF file in their banner. You may have to modify the name and location of the inserted GIF file.
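    A minimal sketch of banner insertion, assuming the banner should appear right after the opening body tag. BANNER_GIF and insert_banner are hypothetical names, and the GIF URL is a placeholder you would change, as noted above.

```python
import re

# Hypothetical banner location; change it to your own GIF file.
BANNER_GIF = "http://www.example.com/banner.gif"

# Matches the opening <body ...> tag in any letter case.
_BODY_TAG = re.compile(rb"<body\b[^>]*>", re.IGNORECASE)


def insert_banner(html: bytes, gif_url: str = BANNER_GIF) -> bytes:
    """Insert an <img> banner immediately after the page's opening <body> tag."""
    banner = b'<div><img src="' + gif_url.encode("ascii") + b'" alt="banner"></div>'
    match = _BODY_TAG.search(html)
    if match is None:
        return html  # no <body> tag found: leave the page unchanged
    pos = match.end()
    return html[:pos] + banner + html[pos:]


print(insert_banner(b"<html><BODY class='x'>hi</BODY></html>", "http://e/b.gif"))
```

    A real RESPMOD proxylet would apply this only to text/html responses and would also have to update or remove the Content-Length header, since the body grows.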

    See the pydoc for some of the modules here.
     

    ICAP client:

    The Squid Proxy Server was enhanced to include ICAP client capability as part of this project. It is available as a patch to squid-2.4-STABLE2 and squid-HEAD. A binary installation (gzipped tar) is also available for Red Hat Linux 7.1. More info about the Squid ICAP client is available here.
     

    Download Instructions:

    You can download the ICAP server and ICAP client from:
    http://www.sourceforge.net/projects/icap-server
     

    Access to our CVS repository:

    Complete instructions for anonymous CVS access are available at http://sf.net/cvs/?group_id=47737. In CVS, the sources for specific releases on the SourceForge site above are tagged SF_release_1_0, SF_release_1_1 and so on.

    To check out the latest stable snapshot from the CVS, please use the release tag "stable".

    export CVSROOT=:pserver:anonymous@cvs.icap-server.sourceforge.net:/cvsroot/icap-server
    cvs login
    cvs co -r stable icap-server squid-icap-client



    3. System Requirements

    I have tested this code only on Red Hat Linux release 7.1 (Seawolf), kernel 2.4.2-2, on an i586. Since it is Python and Squid, I do not expect problems on other compatible platforms.

    You will need the following packages to run this software:


    4. Installation

    ICAP Server:

    1. The ICAP server is available as a gzipped tar file, so just do
        tar xvfz icap_server.tgz
        This extracts all the files into a directory named icap_server.
    2. Modify the file named "setup" to indicate your domain's proxy server (if you use one). Check that the directories where you installed PyXML are properly included in PYTHONPATH.
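    The PYTHONPATH check in step 2 can be automated with a small helper. The function name and the example directory /usr/local/lib/pyxml are hypothetical; substitute the directories where you actually installed PyXML.

```python
import os
from typing import Iterable, List, Optional


def missing_from_pythonpath(required: Iterable[str],
                            pythonpath: Optional[str] = None) -> List[str]:
    """Return the directories in `required` that are not listed in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # abspath also normalizes, so "/opt/x/" and "/opt/x" compare equal.
    entries = {os.path.abspath(p) for p in pythonpath.split(os.pathsep) if p}
    return [d for d in required if os.path.abspath(d) not in entries]


if __name__ == "__main__":
    # Hypothetical install location; replace it with your own directory.
    print(missing_from_pythonpath(["/usr/local/lib/pyxml"]))
```

    An empty list means every required directory is on PYTHONPATH. Run it after sourcing the "setup" file so that the check sees the same environment as the server.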
     

    ICAP Client:

    1. Please follow the instructions here for installing Squid.
    2. Run the following command to set up the working directories:
        /usr/local/squid/bin/squid -z

    That's it!

    BTW, you can skip installing the ICAP client completely if you want to try another ICAP client. Good luck!


    5. Running the ICAP server

    1. Change to the ICAP server's installation directory.
    2. Set up the paths by sourcing the "setup" file:
            . ./setup
    3. Start the ICAP server:
             ./start_icap.py
    4. Start the Squid proxy server (if you haven't done so through init.d). Ensure that you have set the right ICAP configuration options in the squid.conf file.
             squid
    5. Set your browser to use Squid as its proxy (localhost:3128). Browse the Web in French!
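    Before pointing a browser at Squid, it can help to confirm that the ICAP server answers at all. The sketch below sends an ICAP OPTIONS request and returns the status line. The port 1344 (the usual ICAP default) and the service name "options" are assumptions; check the ICAP server's configuration for the address it actually listens on and the service names it exposes.

```python
import socket


def build_options(host: str, port: int, service: str) -> bytes:
    """Build a minimal ICAP OPTIONS request for the given service."""
    return (
        f"OPTIONS icap://{host}:{port}/{service} ICAP/1.0\r\n"
        f"Host: {host}\r\n"
        "Encapsulated: null-body=0\r\n"
        "\r\n"
    ).encode("ascii")


def probe(host: str = "localhost", port: int = 1344,
          service: str = "options", timeout: float = 5.0) -> str:
    """Send OPTIONS and return the ICAP status line, e.g. 'ICAP/1.0 200 OK'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_options(host, port, service))
        data = b""
        # Read until the end of the status line (or until the server closes).
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")


if __name__ == "__main__":
    print(probe())
```

    A connection error means the server is not listening on that address; a status line other than 200 points to a wrong service name or configuration.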


    6. References

    [1] The ICAP forum at http://www.i-cap.org/
    [2] The OPES forum at http://www.ietf-opes.org/
    [3] "Proxylet Local Execution Environment Local Binding", IETF Draft, http://www.ietf-opes.org/documents/draft-walker-opes-proxylet-java-binding-01.txt
    [4] "IRML: A Rule Specification Language for Intermediary Services", IETF Draft, http://www.ietf-opes.org/documents/draft-beck-opes-irml-00.txt
    [5] Squid Web Proxy, http://www.squid-cache.org/
    [6] Medusa Server, http://www.nightmare.com/medusa/
    [7] A Python-based ICAP server, http://icap-server.sourceforge.net
     


    7. Contact Info

    For any comments or feedback on this project, please feel free to write to:
    geetham@india.hp.com
    Geetha Manjunath
    Technical Architect,
    Hewlett Packard India Software Operations Ltd.
    INDIA.


    8.  Acknowledgements

    Special thanks to the co-developers of this project, Ralf and Basile, for taking it forward. Ralf has contributed a great deal to the ICAP client side, making the Squid ICAP client nicely configurable and compatible with ICAP 1.0.

    I would like to thank Mr. Venkatesh Krishnan, Project Manager, HP Labs, for allowing me to take up this activity and for giving me the right advice at various points in the project. My colleague Devaraj Das helped me a lot in integrating the code with Squid and getting the whole thing up and running. My manager Anantharaman PN has provided me the support and facilities required to do my job.
    I am also grateful to SourceForge.net for hosting this project.
    Above all, I thank the management of Hewlett Packard Company for having agreed to open this project for public usage and participation.

