The Semantic Web:
Architectural Patterns for Evolution

IS 3957: Doctoral Seminar: Systems and Technology
IS 2937: Advanced Topics: Systems and Technology

IS 3957/IS 2937
Spring 2003 (03-2)
CRN: 46626/38151
Tuesday, 3:00-6:00
Room 406

Michael B. Spring
Department of Information Science and Telecommunications
University of Pittsburgh
spring@imap.pitt.edu

This doctoral seminar, like most doctoral seminars, is intended as a directed but flexible exploration of a topic.  What is found below is the beginning of that exploration, not the whole of it.  This course is open as an Advanced Topics course for MSIS students who have had document processing and/or client-server systems.  Both PhD and MSIS students should anticipate working with the instructor to define additional readings and work related to the course.

Introduction

This seminar will address the development of the semantic web.  The term “semantic web” appears to have been coined by Tim Berners-Lee.  The first reference I can find is from his book “Weaving the Web”.  Since that time, both the concept he put forward and the term have been used in a variety of different ways. At the very least, the semantic web alludes to the fact that new methods of organizing the resources on the web will be required, that agents as well as humans will be browsing the web, and that the resources on the web will increasingly be dynamic programs rather than static files.

This seminar will explore the technologies, concepts, and current research on building systems that offer a more semantic web.  The seminar will be shaped in part by the interests of the students taking it, and it will be open to advanced master's students as well as PhD students.  What is clear is that the semantic web will somehow contain more “information” and that this information will be in a form that enables algorithmic as well as human processing.  Whatever eventually evolves, some things about the web are already clear:

1.     HTML documents and HTTP have been widely accepted and represent an important base upon which anything new will be built.  It is most likely that the semantic web will use successors to the current array of clients and servers, able to handle the web as currently configured as well as new forms of resources that add functionality.  Put more simply, the Semantic Web will supplement, not supplant, the Simple Web.

2.     Business will play a greater and greater role in shaping the web and its capabilities.  While the web was initially dominated by academics and altruistic information sharing, the web of the future will be increasingly devoted to commerce of a wide variety of forms.

3.     The amount of information on the web continues to grow.  The reality is that individual link traversal is a lousy way to find information.  Classificatory libraries (Yahoo) and full text indexing (search engines) have emerged as the dominant mechanisms for locating resources.  Both of these approaches, expert classification and text indexing, have significant limitations.  Much of the semantic web effort is directed at designing systems that will not be subject to these limitations.

4.     The resources that are available on the web are increasingly opaque and/or transitory.  Opaque resources are programs that generate output based on some input.  It is meaningless to full text index such a resource – the real content is opaque to search engines.  Similarly, some data that is generated as a part of a resource is highly transitory, so even if it is indexed, it is not likely to be the same when accessed at a later time.  It is believed that “semanticizing” these resources would help to overcome this limitation.

5.     Distributed computing in its many forms is growing in popularity and it is only natural to expect that some form of distributed computing also makes sense for the future of the web.  There is a desire to create information stores on the web that are standard enough to be processed by programs.

6.     The development of the XML family of standards has resulted in a fabulously rich conceptual infrastructure for the development of new technologies, tools, and capabilities.
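
As a minimal sketch of the contrast drawn in points 3 through 6, the following Python fragment compares a keyword match against the visible text of a page with a query over an XML description of the same resource. The element names (resource, title, topic, price), the sample data, and the two helper functions are hypothetical and chosen only for illustration; they are not drawn from any semantic web standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical visible text of an "opaque" resource: a lookup form whose
# real content is only produced in response to some input.
PAGE_TEXT = "Welcome! Enter a title in the form below to look up our current price."

# Hypothetical machine-readable description of the same resource.
DESCRIPTION = """
<resource>
  <title>Weaving the Web</title>
  <topic>semantic web</topic>
  <price currency="USD">15.00</price>
</resource>
"""


def keyword_match(text, term):
    """Full-text style lookup: does the term occur in the visible text?"""
    return term.lower() in text.lower()


def price_of(xml_text, topic):
    """Structured lookup: return (currency, amount) if the resource is about the topic."""
    root = ET.fromstring(xml_text)
    if root.findtext("topic") != topic:
        return None
    price = root.find("price")
    return price.get("currency"), float(price.text)


print(keyword_match(PAGE_TEXT, "semantic web"))  # the form text never mentions the topic
print(price_of(DESCRIPTION, "semantic web"))     # a program can read topic and price directly
```

The keyword match finds nothing because the form's text says nothing about what the program behind it can return, while the structured description lets a program answer a question about topic and price without a human reading the page.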

The seminar will explore how these various goals and trends might be realized.  Within this context, the participants will be expected to develop well-reasoned position papers and prototype implementations.

Goals

The goals of the seminar are:

  1. to provide an introduction to the literature on the topic.
  2. to provide an introduction to the relevant technologies and languages.
  3. to provide an opportunity for the participants to engage in a discussion.
  4. to establish a framework for future research in this area.
  5. to develop a meaningful set of implementations that demonstrate a more semantic web.

Organization

The seminar will be broken down into three parts:

  1. Orientation and Proposal (weeks 1-4)
    1. The participants will read “Weaving the Web”, “What Will Be” and a series of articles on various aspects of the Semantic Web.  By the end of the third week, the participants in the seminar should have formed some more operational definitions and some requirements for the next generation web.
    2. During this same period, the seminar leader will introduce the notion of an information marketplace and will lead an effort to refine an existing proposal to NSF to develop a collaboration infrastructure.
  2. Technologies and Methodologies (weeks 4-8)
    1. Key technologies that will underlie the next generation web will be explored.  This will focus on a thorough review of XML, certificates, and distributed computing.  The seminar leader will be responsible for presenting this material, and the participants will be expected to wade through the various standards and specifications.
    2. The participants will work with the instructor to identify the nature of the design problem for the next generation or semantic web.  Is it an enterprise application, a business framework, an information marketplace, an API? How do these various systems differ?  What are the specifics of the design suggested by a collaboration infrastructure?
  3. Positions, Prototypes, Plans, and Problems (weeks 9-15)
    1. The remainder of the seminar will be devoted to development of prototype designs and participant-led discussions related to these prototypes. It is expected that each participating PhD student will have selected a narrow focus within the area for investigation and presentation.
    2. Advanced master's students taking the special topics course will work with the PhD student of their choice in implementation of a prototype.
    3. PhD students will be responsible for collecting and distributing additional readings related to the focus of their work and for guiding a discussion.
    4. The participants will use this period to demonstrate and discuss the software modules developed. These sessions will focus on walkthroughs of the projects and suggestions for improvement and testing.

Outcomes

There are four expected outcomes of this seminar for PhD students:

  1. Each participant is expected to develop a contract that defines their goals for the seminar.  This will include some preliminary statement about the specifics of the next three points in their case.  It will also serve as a preliminary statement by them about how in general they will contribute to the seminar, what they expect the seminar to accomplish for them, and what kinds of expertise they are able to offer the other participants.
  2. Each participant is expected to develop an overview of the literature in the area to the point where they will be able to identify several additional papers in the area, read and digest those papers, and guide a class discussion of the papers.
  3. Each participant is expected to write a review of the literature that begins with the papers discussed in the class and continues on to other relevant papers in some specialized area of the participant’s choice. The focus of the review should be to raise an issue or make a point about what should be possible or might be done by way of further research in this area.
  4. Each participant is expected to develop a design for a software system.

There are two expected outcomes of this course for advanced master’s students:

  1. Each participant is expected to contribute to a website that will be developed during the seminar.  Two contributions will be required.  The first will be a condensation of the readings and the discussion in class based on those readings.  These will constitute 5-15 pages and will both condense the readings and discussion and expand upon them.  It is anticipated that students will explain how the readings are related, give examples, and develop simulations as necessary to explain the ideas.
  2. Each participant is expected to contribute to the development of a software system that provides a prototype example of some aspect of the next generation web.

 

Preliminary Reading List

·       Berners-Lee, Tim, James Hendler and Ora Lassila, The Semantic Web. Scientific American, May, 2001, 35-44.

·       Tim Berners-Lee, Weaving the Web. Harper, 1999.

·       Michael Dertouzos, What Will Be. Harper, 1997.

·       Steven M. Cherry, "Weaving A Web of Ideas. Engines that search for meaning rather than words will make the Web more manageable." In IEEE Spectrum Online (September 2002).

·       Uche Ogbuji, "The Languages of the Semantic Web." In New Architect Volume 7, Issue 6 (June 2002), pages 30-33.

·       Jim Hendler and Brian Parsia, "XML and the Semantic Web. It's Time to Stop Squabbling -- They're Not Incompatible." In XML Journal Volume 03, Issue 10 (October 2002).

·       Sandro Hawke, How the Semantic Web Works http://www.w3.org/2002/03/semweb/

·       Aaron Swartz, The Semantic Web In Breadth http://logicerror.com/semanticWeb-long

·       The Semantic Web: An Introduction http://infomesh.net/2001/swintro/