Aggregating Annotation Information
(Social Tagging)

Doctoral Seminar

Fall 2007

Michael B. Spring

Monday, 12:00-3:00

IS 502

Introduction

There is increased interest in social tagging (flickr.com) and social bookmarking systems (del.icio.us). These systems may be viewed in at least three contexts. First, they have been described as a complement to the semantic web, providing a distributed form of ontology development which might supplement more formal ontology development and classification. We are already beginning to see some convergence in tagging options as more popular tags are recommended. Second, these systems can also be viewed as experiments in the development of social software. Several of the systems that have been introduced have evidenced growth that can be described as viral. What makes some social software so attractive? While the mechanics of viral software are becoming clearer (exposure, invitations, and word of mouth), what people choose to define as valuable enough to invest in is less clear. For example, what is it that moves LinkedIn or del.icio.us from inception to critical mass? Third, these systems may be examined in the context of annotation. The phenomenon of annotation has been observed for centuries, but it was little studied until the 1980s when Marshall and others began to explore the full potential of digital versus physical annotation. This seminar will explore all these contexts. It is my intent to focus most intensely on the third context, annotation, with a secondary emphasis on the second context, “virality”. Seminar participants will be welcome to refine and expand the contexts.

Annotation

One taxonomy divides annotations into graphical and textual. Graphical annotations include highlighting, exclamation points, and asterisks. Textual annotations can be as simple as numbers in the margin, single words, or summary statements. It is not unreasonable to extend this taxonomy to include actions, like dog-earing a page or creating a link to a passage. Whenever someone creates a link to a particular web page, they are making a “statement” about its importance. (We accept the fact that the first link to a page may be an exception to this hypothesis in that its role may be defined as simply positioning the page in the web.) Similarly, whenever an individual bookmarks a page, they are making an assessment of the value or utility of the page. These activities may be classed as a coarse form of annotation, both in granularity and in information. Highlighting a section of a page is more granular, but equally uninformative. Typing a link (assigning it a type) or tagging a page would be more informative, but at a coarse level of granularity. Finally, a comment associated with a particular fragment of text would be both informative and highly granular.
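
The two dimensions used above, granularity and informativeness, can be written down as a small data structure. What follows is only a sketch of that classification; the class and field names (Granularity, AnnotationForm, forms_in) are illustrative assumptions and are not drawn from any of the systems discussed.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    PAGE = "coarse"      # the annotation applies to a whole page
    FRAGMENT = "fine"    # the annotation applies to a passage within a page


@dataclass(frozen=True)
class AnnotationForm:
    name: str
    granularity: Granularity
    informative: bool    # does the annotation say anything beyond "this matters"?


# Classification as stated in the paragraph above (names are illustrative).
FORMS = [
    AnnotationForm("link", Granularity.PAGE, False),
    AnnotationForm("bookmark", Granularity.PAGE, False),
    AnnotationForm("highlight", Granularity.FRAGMENT, False),
    AnnotationForm("typed link", Granularity.PAGE, True),
    AnnotationForm("page tag", Granularity.PAGE, True),
    AnnotationForm("fragment comment", Granularity.FRAGMENT, True),
]


def forms_in(granularity, informative):
    """Return the names of the forms that fall in one cell of the 2x2 grid."""
    return [f.name for f in FORMS
            if f.granularity is granularity and f.informative == informative]
```

Calling forms_in(Granularity.PAGE, False) lists the coarsest forms (links and bookmarks), while forms_in(Granularity.FRAGMENT, True) lists the fragment comment. Whether two dimensions are enough is one of the questions the seminar leaves open.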

Google uses incoming links to pages to aid in ranking pages retrieved from a query. The success of Google is testament to the fact that even this coarse form of annotation can be mined. A number of systems have emerged over the last few years that provide access to the aggregate annotation activity of individuals. Flickr uses tags to provide better access to image collections. Del.icio.us uses user bookmarks and the tags users assign to assist in defining sets of bookmarks based on queries.
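
To make "aggregate annotation activity" concrete, here is a minimal sketch of how bookmarks and tags from many users might be pooled to answer a tag query. It does not describe how del.icio.us or Flickr actually work; the records, the function names (aggregate, search), and the ranking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical bookmark records: (user, url, tags). All values are made up.
BOOKMARKS = [
    ("alice", "http://example.org/a", ["folksonomy", "tagging"]),
    ("bob", "http://example.org/a", ["tagging", "web"]),
    ("carol", "http://example.org/b", ["tagging"]),
    ("alice", "http://example.org/b", ["ontology"]),
    ("bob", "http://example.org/c", ["web"]),
]


def aggregate(bookmarks):
    """Pool individual bookmarks into per-URL save counts and tag counts.

    Assumes each user bookmarks a given URL at most once.
    """
    saves = Counter()                    # url -> number of bookmarks
    tags_by_url = defaultdict(Counter)   # url -> tag -> number of users applying it
    for _user, url, tags in bookmarks:
        saves[url] += 1
        tags_by_url[url].update(set(tags))  # count a tag once per bookmark
    return saves, tags_by_url


def search(bookmarks, tag):
    """Rank URLs carrying `tag` by users who applied it, then by total saves."""
    saves, tags_by_url = aggregate(bookmarks)
    hits = [(url, counts[tag], saves[url])
            for url, counts in tags_by_url.items()
            if counts[tag] > 0]
    # Assumed ranking rule: more taggers first, then more saves, then URL order.
    return sorted(hits, key=lambda h: (-h[1], -h[2], h[0]))
```

With the sample data, search(BOOKMARKS, "tagging") places http://example.org/a ahead of http://example.org/b because two users applied the tag to the former and only one to the latter. The point is only that very coarse individual acts, once counted across users, yield a usable ordering.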

Given that a system provides the capability to annotate resources at any particular level, we can ask whether such a system would be enhanced by different forms of annotation. I have suggested that links, bookmarks and tags are forms of annotation. Is that the correct superset? Is there a better model? What kind of tagging does the system allow? How easy is it to annotate? How informative and accurate is the annotation? Does it make sense to introduce more refined forms? Are there forms of annotation other than incoming links, bookmarks, page tags, fragment annotation, highlighting, and notes?

Given a critical mass of annotations, what are the ways they can be used? What is the taxonomy of actions on the various forms of annotation? What are the benefits to the user? What can the owner gain from the information? Are there secondary users of the information? How can the information be repurposed for query refinement, social analysis, collaborative filtering, knowledge management, team building, data mining, or ontology development?

Social Systems

Some systems grow virally; others wither and die. What is the initial user motivation to use a system? What makes some web applications viral? The literature provides some guidance. There have been several analyses of groupware that have tried to explain the popularity of email and the recurring failure of group calendaring software, etc. Does the democratic nature of the software explain what works? Is it missing functionality? Both web mail and social bookmarking provide ubiquitous computing solutions for mail and browsing applications. As an IMAP mail user for almost a decade, I found web mail both cumbersome and without significant advantage, but for POP mail users, it provided a significant multi-machine advantage. For many users without extensive organizational infrastructures, it provided a valuable free service. Similarly, unless a single laptop is your only machine, web-based bookmarks allow the same bookmarks used at work to be accessed on your home machine. We will examine a collection of systems with an eye to identifying the characteristics of applications that have significant appeal to users both as personal systems and as social systems. We will attempt to identify how the systems move from initial offering to aggregate systems. Of particular interest will be the secondary information streams that allow for the development of additional features that solidify their appeal to users.

Some Initial Readings

Here are a few readings that will serve as a springboard for our early discussions. The Knowledge and Data Engineering Group at the University of Kassel has a number of papers worth looking at (see http://www.kde.cs.uni-kassel.de/pub/index.html).

Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Information Retrieval in Folksonomies: Search and Ranking. Proceedings of the 3rd European Semantic Web Conference, 411-426, Springer, Budva, Montenegro, 2006.
(http://www.kde.cs.uni-kassel.de/hotho/pub/2006/seach2006hotho_eswc.pdf)
Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Trend Detection in Folksonomies. In Yannis S. Avrithis, Yiannis Kompatsiaris, Steffen Staab, and Noel E. O'Connor, editors, Proc. First International Conference on Semantics And Digital Media Technology (SAMT), (4306):56-70, Springer, Heidelberg, 2006.
(http://www.kde.cs.uni-kassel.de/stumme/papers/2006/hotho2006trend.pdf)

You might also look at the social computing issue of Queue magazine (Vol. 3, No. 9, November 2005), and in particular the article on social bookmarking for the enterprise.

· http://acmqueue.com/modules.php?name=Content&pa=list_pages_issues&issue_id=28

· http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=344

It would be worthwhile to develop a somewhat better understanding of folksonomies and tagging. The work of the group at Kassel has some good leads, and here are a couple of others:

· http://www.hyperorg.com/backissues/joho-jan28-05.html#leaves

· http://www-kasm.nii.ac.jp/papers/takeda/05/ohmukai05iswc.pdf

Finally, again as just a start, try to get a survey of the various systems that exist for social tagging. The April 2005 issue of D-Lib Magazine provides a good overview.

· http://www.dlib.org/dlib/april05/04contents.html
