A Framework for Internet Content Adaptation
Version 1.2.1
October 30, 2002
This open source project provides a complete framework for Internet Content
Adaptation based on the ICAP 1.0 protocol. In short, the framework gives
you a proxy-based solution for adapting, on the fly, the content that a
user browses over the Internet. The user can write simple filters (proxylets)
that modify the content as desired and plug them into the framework.
The user can also specify a set of rules that determine when the different
filters (content adaptors) are invoked.
In particular, this framework consists of:
o an ICAP-enabled proxy server (Squid, a very popular open source proxy
server),
o an ICAP server (written in Python),
o a set of proxylets (filters),
o a proxylet API (for authoring new proxylets), and
o a rule parser (based on IRML, an XML specification).
SEE ALSO
Squid ICAP Client
IRML authoring
FURTHER CONTENTS
1. Technical Background
The ICAP Protocol
Proxylet API
IRML, The Rule Language
2. About this implementation
ICAP-server
ICAP client
Download Instructions
Access to our CVS repository
3. System Requirements
4. Installation
ICAP Server
ICAP Client
5. Running the ICAP server
6. References
7. Contact Info
8. Acknowledgements
This project is sponsored by Hewlett Packard Labs.
1. Technical Background
This section provides some basic technical background required to use this
software effectively.
The ICAP Protocol
ICAP (Internet Content Adaptation Protocol) is a protocol aimed at
providing simple object-based content vectoring for HTTP services. It
allows ICAP clients to pass HTTP messages to ICAP servers for transformation.
The server executes its transformation service on the messages and sends
responses back to the client, usually with modified messages. The
adapted messages may be either HTTP requests or HTTP responses.
There are two modes in which ICAP works:
o Request Modification (reqmod)
o Response Modification (respmod)
In Request Modification, the HTTP request is encapsulated and sent to
the ICAP server. The server responds with either a modified request
or a complete HTTP response.
In Response Modification, the complete HTTP response is encapsulated
and sent by the edge proxy (ICAP client) to the ICAP server, which then
modifies it and sends back an HTTP response.
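As a rough illustration of the encapsulation described above, the sketch
below builds one REQMOD and one RESPMOD message using the ICAP/1.0 message
layout (an ICAP header with an "Encapsulated" offset list, followed by the
HTTP headers and a chunked body). The function names, service URIs and
sample HTTP messages are made up for this example and are not taken from
this project's code.

```python
# Illustrative sketch only: function names and URIs are assumptions,
# not part of this project's ICAP server or the Squid ICAP client.

def chunk(body: bytes) -> bytes:
    """Encode a body with chunked transfer coding (one chunk + terminator)."""
    if not body:
        return b"0\r\n\r\n"
    size = format(len(body), "x").encode("ascii")
    return size + b"\r\n" + body + b"\r\n0\r\n\r\n"


def build_reqmod(service_uri: str, icap_host: str, http_req_hdr: bytes) -> bytes:
    """Encapsulate a body-less HTTP request in an ICAP REQMOD message."""
    # Offsets in "Encapsulated" are counted from the start of the
    # encapsulated section; null-body marks where a body would begin.
    encapsulated = f"req-hdr=0, null-body={len(http_req_hdr)}"
    icap_hdr = (
        f"REQMOD {service_uri} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: {encapsulated}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap_hdr + http_req_hdr


def build_respmod(service_uri: str, icap_host: str, http_req_hdr: bytes,
                  http_res_hdr: bytes, http_res_body: bytes) -> bytes:
    """Encapsulate an HTTP request header and full response in RESPMOD."""
    res_hdr_offset = len(http_req_hdr)
    res_body_offset = res_hdr_offset + len(http_res_hdr)
    encapsulated = (
        f"req-hdr=0, res-hdr={res_hdr_offset}, res-body={res_body_offset}"
    )
    icap_hdr = (
        f"RESPMOD {service_uri} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: {encapsulated}\r\n"
        "\r\n"
    ).encode("ascii")
    # Only the body is chunked; the HTTP headers are sent as-is.
    return icap_hdr + http_req_hdr + http_res_hdr + chunk(http_res_body)


# Example HTTP messages (each header block ends with an empty line).
REQ = b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
RES = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"

reqmod_msg = build_reqmod("icap://icap.example.net/reqmod",
                          "icap.example.net", REQ)
respmod_msg = build_respmod("icap://icap.example.net/respmod",
                            "icap.example.net", REQ, RES, b"<html></html>")
```

The offsets are computed from the header lengths rather than hard-coded,
so the same helpers work for any header block that ends with a blank line.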
The protocol is usually used between an edge proxy server and an ICAP
server. The proxy server sends every HTTP request that it
receives to the ICAP server for possible adaptation. If the request was
modified by the ICAP server, the new URL is used for fetching the contents
for the client. Similarly, before sending the response to the client
device, the proxy sends the response, encapsulated as an ICAP message,
to the ICAP server and receives back a possibly modified HTTP response,
which is then delivered to the client.
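The sequence above can be sketched as a single function. This is only a
simplified model: the three callables are stand-ins for the two ICAP round
trips and the origin fetch, their names are invented for the example, and
the case where REQMOD answers with a complete HTTP response is left out.

```python
from typing import Callable, Optional

# Simplified model of the proxy-side flow; all names are hypothetical.


def proxy_one_request(
    url: str,
    reqmod: Callable[[str], Optional[str]],
    fetch: Callable[[str], bytes],
    respmod: Callable[[bytes], Optional[bytes]],
) -> bytes:
    """Walk one request through the REQMOD, fetch, RESPMOD sequence."""
    # Step 1: offer the request to the ICAP server; None means "unchanged".
    adapted_url = reqmod(url)
    target = url if adapted_url is None else adapted_url
    # Step 2: fetch from the origin using the (possibly rewritten) URL.
    response = fetch(target)
    # Step 3: offer the response to the ICAP server before delivery.
    adapted_response = respmod(response)
    return response if adapted_response is None else adapted_response


# Stand-ins for the ICAP server and the origin server.
result = proxy_one_request(
    "http://www.example.com/",
    reqmod=lambda u: u.replace("www.", "fr."),
    fetch=lambda u: b"page from " + u.encode("ascii"),
    respmod=lambda body: body.upper(),
)
```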
We provide an ICAP server implementation in Python and
have enhanced the Squid Web Proxy to include ICAP client capability.
For more information on the protocol, please refer to [1] and [2].
Proxylet API
The services that transform an HTTP request or HTTP
response (called content adaptation services here) can take
two forms:
(a) Callout services
(b) Local services, called proxylets
A callout service is a remote service that is used to perform the transformation
using an appropriate callout protocol.
A local service, on the other hand, is executed by the ICAP server itself.
A Java API for developing a local service (a proxylet) has been
proposed in [3]. Our implementation tries to provide a similar API
in Python.
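To give a feel for what a local service looks like, here is a minimal,
hypothetical proxylet written in Python. The class and method names
(HttpMessage, Proxylet, modify_request, modify_response) are invented for
this sketch; they are not the names used by the API in [3] or by this
package, so please study the sample proxylets for the real interface.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for illustration; not this package's proxylet API.


@dataclass
class HttpMessage:
    """A very small stand-in for an HTTP request or response."""
    start_line: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


class Proxylet:
    """Base class: by default a proxylet passes messages through unchanged."""

    def modify_request(self, request: HttpMessage) -> HttpMessage:
        return request

    def modify_response(self, response: HttpMessage) -> HttpMessage:
        return response


class TagResponseProxylet(Proxylet):
    """Example local service that marks every response it has seen."""

    def modify_response(self, response: HttpMessage) -> HttpMessage:
        response.headers["X-Adapted-By"] = "example-proxylet"
        return response


proxylet = TagResponseProxylet()
adapted = proxylet.modify_response(
    HttpMessage("HTTP/1.1 200 OK", {"Content-Type": "text/html"}, b"<html/>")
)
```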
IRML, The Rule Language
Once we have a client-server pair that understands ICAP, and some services
that can perform certain transformations, we need a way
to specify the conditions under which the transformations are to be performed.
A rule-based language is proposed for this purpose in [4]. It is an application
of XML and is called IRML (Intermediary Rule Markup Language).
These rules can be installed by the following three parties:
1. Client
2. Access provider (ISP, CDN etc.)
3. Content provider
Each party can express the conditions under which they wish to run
a service.
We provide an implementation of an IRML parser
that integrates into our ICAP server.
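The idea of several parties each attaching conditions to services can be
modelled in a few lines of Python. The sketch below is not IRML and does
not use this package's parser; the Rule fields, party names and service
names are assumptions chosen only to show how a condition on a message
property selects the services to run.

```python
import re
from dataclasses import dataclass

# Simplified stand-in for rule processing; this is not IRML syntax.


@dataclass(frozen=True)
class Rule:
    owner: str          # "client", "access-provider" or "content-provider"
    property_name: str  # HTTP header that the condition inspects
    pattern: str        # regular expression the header value must match
    service: str        # name of the service to run when the rule matches


def matching_services(rules, headers):
    """Return the services whose conditions hold for these headers."""
    return [
        rule.service
        for rule in rules
        if re.search(rule.pattern, headers.get(rule.property_name, ""))
    ]


RULES = [
    Rule("client", "Content-Type", r"^text/html", "translate"),
    Rule("access-provider", "Content-Type", r"^image/", "compress-images"),
    Rule("content-provider", "Host", r"example\.com$", "insert-banner"),
]

selected = matching_services(
    RULES, {"Host": "www.example.com", "Content-Type": "text/html"}
)
```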
2. About this implementation
ICAP-server:
This package contains a Python-based implementation of an ICAP server,
an IRML parser and a proxylet API. Simple example proxylets (to
translate from English to French, to insert banner ads, etc.) are also
available as a part of it.
The ICAP server is implemented
over the Medusa Python web server. It is available in both
single-threaded and multi-threaded versions. The multi-threaded version is,
however, not completely tested, so the default is single-threaded. Change
icap_services.py to make it multi-threaded.
The IRML parser is integrated with the
ICAP server in a different form here: it is an XML parser that parses a
given IRML file and generates Python code. The generated Python class
is used by the ICAP server for rule processing, so the rules are statically
configured. Since Python is a dynamic language, an extension that
allows dynamic rule insertion could also be tried. See the IRML authoring
document to understand how to edit the rules for this ICAP server.
The language used is as specified in draft-beck-opes-irml-00.txt
(expires Aug 2001). A newer version of IRML has since been released
(draft-beck-opes-irml-02.txt); the changes for that version are yet to be
implemented.
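The "parse XML, generate Python" approach described above can be sketched
as follows. Note the assumptions: the XML vocabulary here is a made-up,
much simplified rule format (not IRML), and the generated function is far
simpler than what this package's parser produces. The sketch only shows the
general technique of turning rules into Python source that is loaded once.

```python
import xml.etree.ElementTree as ET

# Made-up rule format for illustration; real IRML looks different.
RULES_XML = """
<rules>
  <rule header="Content-Type" contains="text/html" service="banner"/>
  <rule header="Host" contains="example.com" service="redirect"/>
</rules>
"""


def generate_rule_source(xml_text: str) -> str:
    """Translate the rule file into the source of a Python function."""
    lines = ["def select_services(headers):", "    services = []"]
    for rule in ET.fromstring(xml_text).iter("rule"):
        header = rule.get("header")
        needle = rule.get("contains")
        service = rule.get("service")
        # repr() keeps the generated string literals correctly quoted.
        lines.append(f"    if {needle!r} in headers.get({header!r}, ''):")
        lines.append(f"        services.append({service!r})")
    lines.append("    return services")
    return "\n".join(lines) + "\n"


# Load the generated code once, the way a statically configured
# server would at start-up.
namespace = {}
exec(generate_rule_source(RULES_XML), namespace)
select_services = namespace["select_services"]

chosen = select_services({"Content-Type": "text/html; charset=utf-8"})
```

Because the rules are compiled into code ahead of time, changing them
means regenerating and reloading the function, which matches the
"statically configured" behaviour described above.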
The Proxylet API is a Python version
of the Java API proposed in [3]. Only the most important classes are implemented
so far. Also, one-to-one correspondence is maintained only at the method level;
the arguments to the methods may differ, especially for the constructors. It
is suggested that the code for the sample proxylets be studied before coding
new proxylets.
The following sample proxylets are available:
Request Modification, REQMOD:
(a) Language Translation: When this proxylet is enabled, accessing
a URL that matches the rules (currently ALL) results in the contents
being delivered in French! So, if you set your proxy to an ICAP-enabled
proxy server while this proxylet is ON, you end up viewing the whole Web
in French!
(b) A URL redirection service: Currently, this proxylet redirects
URLs to specific ports based on a prefix match of the requested URL. It
can easily be enhanced for generic redirection.
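A prefix-based redirection of the kind described in (b) might look like the
sketch below. The prefix table, port numbers and function name are invented
for the example; the actual proxylet in this package may be organised
differently.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical prefix-to-port table; the real proxylet's table may differ.
PREFIX_TO_PORT = {
    "http://www.example.com/app/": 8080,
    "http://www.example.com/static/": 8081,
}


def redirect(url: str, table=PREFIX_TO_PORT) -> str:
    """Rewrite the port of a URL whose prefix appears in the table."""
    # Try longer prefixes first so the most specific entry wins.
    for prefix in sorted(table, key=len, reverse=True):
        if url.startswith(prefix):
            parts = urlsplit(url)
            # Simplification: any user info in the URL is dropped.
            netloc = f"{parts.hostname}:{table[prefix]}"
            return urlunsplit(parts._replace(netloc=netloc))
    # No prefix matched: leave the request untouched.
    return url


redirected = redirect("http://www.example.com/app/index.html?x=1")
```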
Response Modification, RESPMOD:
(a) Banner insertion: This proxylet modifies all HTML pages to include
a particular GIF file in their banner. You may have to modify the name and
location of the inserted GIF file.
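The banner insertion described above can be approximated in a few lines.
This sketch assumes the banner goes right after the opening body tag and
uses a placeholder GIF path; neither detail is taken from the actual
proxylet, which should be consulted for the real behaviour.

```python
import re

# Matches the opening <body ...> tag, in any letter case.
BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def insert_banner(html: str, gif_url: str = "/images/banner.gif") -> str:
    """Insert a banner image directly after the opening <body> tag."""
    # gif_url is a placeholder; point it at your own GIF file.
    banner = f'<div><img src="{gif_url}" alt="banner"></div>'
    match = BODY_TAG.search(html)
    if match is None:
        # Not a complete HTML page: leave the content untouched.
        return html
    return html[:match.end()] + banner + html[match.end():]


page = insert_banner("<html><BODY class='x'><p>hi</p></body></html>")
```

Since the body grows, a real response modifier would also have to update
or remove the Content-Length header of the HTTP response.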
See the pydoc for some of the modules here.
ICAP client:
The Squid Proxy Server was enhanced to include ICAP client
capability as a part of this project. It is available as a patch to
squid-2.4-STABLE2 and squid-HEAD. A binary installation (gzipped tar) is also
available for Red Hat Linux 7.1. More information about the Squid ICAP client
is available in the Squid ICAP Client document.
Download Instructions:
You can download the ICAP server and ICAP client from:
http://www.sourceforge.net/projects/icap-server
Access to our CVS repository:
Complete instructions for anonymous CVS access are available
at http://sf.net/cvs/?group_id=47737.
The sources in CVS are tagged as SF_release_1_0, SF_release_1_1
and so on, for the specific releases at the above SourceForge site.
To check out the latest stable snapshot from CVS, please use the
release tag "stable".
export CVSROOT=:pserver:anonymous@cvs.icap-server.sourceforge.net:/cvsroot/icap-server
cvs co -r stable icap-server squid-icap-client
3. System Requirements
I have tested this code only on Red Hat Linux release 7.1 (Seawolf), kernel
2.4.2-2, on an i586. Since it is Python and Squid, I do not expect
problems on other compatible platforms.
You will require the following packages for executing this.
4. Installation
ICAP Server:
1. The ICAP server is available as a gzipped tar file, so just run
tar xvfz icap_server.tgz
It will extract all the files under the directory
named icap_server.
2. Modify the file named "setup" to indicate your domain's proxy
server (if you are using one). Check if the directories where you installed
pyxml are all properly set in PYTHONPATH.
ICAP Client:
1. Please follow the instructions here
to install Squid.
2. Run the following command to set up the working directories.
/usr/local/squid/bin/squid -z
That's it!
BTW, you can skip the installation of the ICAP client completely if
you want to try with another ICAP client. Good luck!
5. Running the ICAP server
1. Change directory to the installation directory of the ICAP server.
2. Set up the path by executing the "setup" file.
. ./setup
3. Start the ICAP server as
./start_icap.py
4. Start the Squid proxy server (if you haven't already done so through
init.d). Ensure that you have set the right ICAP configuration options
in the squid.conf file.
squid
5. Set your browser to point to Squid as proxy
(localhost:3128). Browse the Web in French!
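If you prefer to check the setup from a script instead of a browser, a
Python client can be pointed at the same proxy. The sketch below only
relies on the standard library; the helper name is made up, and the host
and port are the localhost:3128 values from step 5.

```python
import urllib.request


def make_proxy_opener(host: str = "localhost", port: int = 3128):
    """Build a urllib opener that sends HTTP requests through the proxy."""
    # Assumes Squid listens on host:port, as in step 5 above.
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener()
# With Squid and the ICAP server running, a page could then be fetched as:
#     html = opener.open("http://www.example.com/").read()
```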
6. References
[1] The ICAP forum at http://www.i-cap.org/
[2] The OPES forum at http://www.ietf-opes.org/
[3] "Proxylet Local Execution Environment Local Binding", IETF
Draft. http://www.ietf-opes.org/documents/draft-walker-opes-proxylet-java-binding-01.txt
[4] IRML: A Rule Specification Language for Intermediary Services,
IETF Draft, http://www.ietf-opes.org/documents/draft-beck-opes-irml-00.txt
[5] Squid Web Proxy , http://www.squid-cache.org/
[6] Medusa Server, http://www.nightmare.com/medusa/
[7] A python based ICAP server , http://icap-server.sourceforge.net
[8] Project page, http://sf.net/projects/icap-server
7. Contact Info
For any comments or feedback on this project, please feel free to write
to:
geetham@india.hp.com
Geetha Manjunath
Technical Architect,
Hewlett Packard India Software Operations Ltd.
INDIA.
8. Acknowledgements
Special thanks to the co-developers of this project, Ralf and Basile, for
taking this project forward. Ralf has contributed quite a lot to the ICAP
client side by making the Squid ICAP client nicely configurable
and also compatible with ICAP 1.0.
I would like to thank Mr. Venkatesh Krishnan, Project Manager, HP Labs,
for allowing me to take up this activity and for giving me the right
advice at various points in the project. My colleague Devaraj
Das has helped me a lot in integrating the code with Squid and getting
the whole thing up and running. My manager Anantharaman PN has provided
me the support and facilities required to do my job.
I am also grateful to SourceForge.net for hosting this
project.
Above all, I thank the management of Hewlett Packard Company for having
agreed to open this project for public usage and participation.