Welcome to my weblog. It is an unconventional blog -- more a set of topics I am exploring. I have also abused the name: it is a slog as well as a blog. I have decided it is time to discipline myself to keep a diary of reflections and ideas that you can look at. This is not a traditional Atom-based log where you are free to comment and discuss items. I did put one up and found the results less than satisfactory. I am considering putting another one up with better controls, but for now I am simply using this page manually to keep track of some thoughts.
Should you wish to comment on what I have said, I will be happy to add your comments verbatim so long as they are not spam. Simply send an email to me at Pitt -- see the main page -- and I will insert it, with attribution, under the post it references. Please reference the title and date of the post on which you are commenting.
Social bookmarking systems provide a new source of information about resources. In this post, I try to set out some conceptual views of social bookmarking as a mechanism for asking what might be derived from an analysis of social bookmarks. The delicious system works as follows:
With this in mind, at the very least, a social bookmarking system would include a record that consists of a URLID (the normalized URL), a USERID, a DESCription, optional tags (OPTTag(s)), optional notes (OPTNotes), and a SHARE flag (default TRUE). A conceptual table such as this has the potential to provide the following information:
For users, we can determine the following information:
For URLs, we can determine:
Beyond these measures, we can examine a number of issues:
There are surely many more questions that we might try to answer and there are surely more formal ways of formulating what might be inferred. I will be returning to this entry in the coming months and trying to add more thoughts about this.
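As a rough illustration of the conceptual table described above, here is a minimal Python sketch. The field names mirror the ones in the post, but the `Bookmark` class, the two helper functions, and the sample data are all hypothetical, made up for this example and not taken from delicious itself. It shows only two of the many things one might derive: how many users bookmarked a URL, and which tags they attached to it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One row of the conceptual table (hypothetical layout)."""
    url_id: str          # URLID: the normalized URL
    user_id: str         # USERID
    desc: str            # DESCription
    tags: tuple = ()     # OPTTag(s), optional
    notes: str = ""      # OPTNotes, optional
    share: bool = True   # SHARE, default TRUE


def users_per_url(bookmarks):
    """Count the distinct users who publicly bookmarked each URL."""
    users = defaultdict(set)
    for b in bookmarks:
        if b.share:  # private bookmarks are not visible to an outside analysis
            users[b.url_id].add(b.user_id)
    return {url: len(ids) for url, ids in users.items()}


def tag_counts_for_url(bookmarks, url_id):
    """Aggregate the tags that users attached to one shared URL."""
    counts = Counter()
    for b in bookmarks:
        if b.share and b.url_id == url_id:
            counts.update(b.tags)
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE = [
    Bookmark("http://example.org/a", "u1", "Intro", ("web", "tagging")),
    Bookmark("http://example.org/a", "u2", "Good intro", ("tagging",)),
    Bookmark("http://example.org/b", "u1", "Private note", ("misc",), share=False),
]
```

Calling `users_per_url(SAMPLE)` counts two users for the first URL and ignores the unshared bookmark. Anything more interesting (user similarity, tag agreement over time) would build on the same kind of grouping.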
I just went back to read the report of a workshop on "Building the Infrastructure for CyberScholarship". The workshop was funded by NSF and sets out a roadmap for research for the next decade or so. The work is solid, but it left me feeling like I sometimes do with my students. Good answer, but the wrong question was asked. Let me be a little clearer about what I mean. The findings of the workshop make a lot of sense, but in some ways they are too driven by a shared blurred vision. Ok, still not clear. A number of workshop participants are notable researchers doing great work in particular areas -- and they have been for a decade or so. I get the sense that as the workshop went on, some of the participants were trying to understand the visions of others so as to prepare a plan for what needs to be done next. The problem is that they were talking about different aspects of a big problem and trying to develop solutions that solved all the problems. This is a situation in which I say to my students, "don't just do something, stand there", which is my second most favorite piece of advice. You guessed it, the first is "don't just stand there, do something". The secret is knowing which to do first.
OK, let me try to say a little bit about what I am thinking. First of all, we should be talking about scholarship, not cyberscholarship. I would hold that while some aspects of computational scholarship change in a digital environment, this is far from the top of the list of what people are talking about here. In talking about scholarship, what are the new opportunities provided? My guess is that there are about a dozen, and that by segmenting the problem into its component pieces we have a better chance of building solutions that make sense. Without an effort to be comprehensive, here is my starting list, beginning with the low-hanging fruit:
I am just finishing up a doctoral seminar on aggregating annotation information. The premise for the seminar, which was examining various forms of social networking, was to look at how the information being collected by these sites might be aggregated and used. The motivation for the seminar was some work being done on a dissertation by one of my students that suggests the bookmarks being aggregated by delicious might serve as a better indicator of resource importance than Larry Page's PageRank. I suggested to the twelve PhD students who took the seminar that we begin with the assumption that everything from a link to a bookmark to a note to a tag is a form of annotation. The question then is what benefits or insights can be achieved by aggregating annotations.
The fallout from the course is still occurring, and we have some interesting new projects emerging. One will seek to use tag clouds to develop a new form of topic map of a resource space. The initial ideas are as intriguing as the delicious search ranking project and involve very simple aggregation methods that we are hoping will outperform self-organizing maps!
In the seminar, we returned time and again to the question "what exactly are we talking about" as a means of trying to make sense out of the many different things that are going on. For example, we tried to impose on the systems an old model I have used for many years. That model suggests that document or resource systems focus on one of four main categories of activity -- creation, storage, retrieval, and use. Further, these processes tend to be different for different classes of documents/resources. We have personal, group, organizational, enterprise, and societal resources. When we look at facebook, what are we looking at, or is it not captured in this space?
Last week, we fell into a discussion of various systems and were talking about google versus facebook versus delicious versus flickr. At one point we began to discuss the economics of the new web and the impacts of advertising and clickthrough. As a part of that discussion, we began to see at least three clearly different kinds of spaces. For lack of better terms, we classed flickr as a destination. We classed google as a waypoint, and we classed facebook as a portal. Without asking whether the exemplars were correct, we postulated that a waypoint was a starting point for a search that would take us to a destination. In contrast, a portal was a specialized waypoint that also served as a destination. Some interesting questions can be asked if we do this kind of segmentation. If people use waypoints as the beginning of a purchase process and destinations as places where they can congregate, what is the value of an ad in each space? There were a number of interesting arguments here which I will return to at some later time.
I like to think about the future, and to do so by thinking about the past. Here is one tiny tidbit to start off a discussion here.
There is evidence that “humans” have been around for a million years or so. It is difficult to pinpoint when spoken language developed as a critical means of communication. Ong holds that social interaction has been occurring for the last 30,000 to 50,000 years. Although it is impossible to date the origin of spoken language, it was clearly millennia before written language. Many scholars refer to this period as the period of the oral tradition.
The oldest deciphered written documents are about 6,000 years old. Scholars have identified the development of writing as beginning the literary tradition. The oral tradition, from an information-theoretic perspective, allowed information to be codified by means of language and memory and passed on from generation to generation. However, the oral tradition was subject to information loss in the reproduction process, and the capacity of human memory was a limiting factor in the amount of information that could be passed on. Further, the transfer required that sender and receiver be collocated in time and space. The literary tradition eliminated the need for collocation, vastly expanded the amount of information, and made a significant improvement in transmission errors, although it left coding and decoding errors.
So what, then, is immediacy? Keep in mind, the word “immediacy” is suggested merely as a placeholder. Time may well produce a better or more appropriate term, but the concepts that might be associated with “immediacy” are clear, especially those that provide a contrast to prior traditions. There are five ways in which the new form of communication is more immediate than the oral and literary traditions.
First, in the long tradition of communication via the spoken and written word, the communication of information is via an intermediate party. The information gained is not immediate. The information passed from generation to generation via the oral tradition was rich and structured by the orator or storyteller. The tradition and the techniques are fascinating and well beyond the scope of what can be covered here. For our purposes, the key is that the story that was passed on was about events as interpreted by the storyteller. The literary tradition has a similar characteristic – it is an interpretation of events presented in a symbolic form. Contrast these presentations of information with the broadcasts of the Hindenburg disaster, the Kennedy inauguration speech, the O.J. Simpson car chase, etc. All of these events are presented without intermediation – they are immediate. Of course, it may be argued that there are interpretations provided by how the video or film was shot, or by how the microphones were placed, or in the case of the network news, how the video is edited or what context it is put in. Nonetheless, there is a qualitative difference in the way we may be exposed to information. Take the growing presence of webcams on the Internet for viewing public places or traffic flows as other examples of immediacy in the communication of information. I don’t need to be told, or to read, that it is raining in Pittsburgh; I can simply look at the screen and see the rain. Indeed, it is possible to learn that it was raining yesterday or is raining now by connecting to the live webcam or its archives from anywhere in the world. This aspect of immediacy relates to our presence at the event.
The second aspect of immediacy relates to the speed with which the information may be disseminated. We now have information floats of seconds, where historically the lag between the event and the information about the event was measured in days or weeks. War and space coverage are examples of this. In the 2003 Iraq war, viewers were able to get a picture of an advancing tank column as it occurred. Indeed, two of the events burned into the memory of 50-year-olds in the year 2000 are the funeral of JFK and the landing of a man on the moon. Both of these events received wide and immediate coverage, accepting that there was a 1.32 second delay in the transmission from the moon to earth! This sense of immediacy refers to the temporal nature of the communication.
A third aspect of immediacy has to do with the immediacy of a vast information store to the creator of a message. Many of us are familiar with the process of dragging and dropping information from one place to another on our electronic desktop. Many of the sections of this book have been created by dragging and dropping parts of lecture notes and slide presentations created over the last two decades of teaching and researching in this area. While I grow increasingly concerned about the loss of information created on very early systems or using now defunct information formats, this is, I believe, a temporary phenomenon. With time, we will have immediate access to all of our own information and research so as to more effectively access and convert it into messages. The day is not far away when lectures might be captured as a matter of course. A little, but not much, further away is the time when I will be able to say “That was a great instantiation of the ideas I meant to convey; convert it to written form, insert my diagrams, show the steps I suggested for processes, and animate the two critical development sequences.” This is immediacy in information creation.
A fourth aspect of immediacy relates to the receiver’s access to the message as an evolutionary whole. This concept is a little harder to describe than the others because, while important, it is not one that we have seen in practice very much as yet. Historians have a fascination with drafts of important speeches. An edited copy of an inaugural speech or a speech like Lincoln’s Gettysburg Address provides an opportunity to try to interpret the thought process that underlies the communication. Historically, the process of depicting the evolution of a communication has been very hard. For years now, we have had the capability of easily capturing the version tree of a document in process. With time, it will be more common to have access to the complete record of the development of a message. When that occurs, recipients of a communication over space and time will have the ability to “see” a communication evolve in the mind of the sender in ways that we can barely imagine today. The ultimate implication of this aspect of immediacy is likely to be living documents that capture the creator’s efforts and allow the receiver not only to query the document but also to speculate, with more data, about the thought process behind the words.
A fifth aspect of immediacy has to do with the digital nature of the message. That is to say that this communication can be repaired on the fly. Errors that are normal in transmission can be detected and corrected immediately. The communication has developed a degree of immunity to the noise in the communications channel. Thus, we can now hear a pin drop in a conversation with an individual on the other side of the world. This is not because there is no noise in the communication channel, but because the information in the message can have additional data added that provides a mechanism for correcting the impact of noise. This same digital quality allows the message to be replicated in its bit form at a fraction of the cost of traditional replication. The message can also be encrypted, ensuring an appropriate level of privacy and security in the communication.
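To make the idea of "additional data" that corrects noise a little more concrete, here is a minimal Python sketch of the simplest scheme I know of, a repetition code with majority voting. This is an illustration only: real channels use far more efficient codes, and the function names and the sample message are made up for this example.

```python
def encode(bits, copies=3):
    """Add redundancy by sending every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]


def add_noise(bits, flip_positions):
    """Mimic channel noise by flipping the bits at the given positions."""
    noisy = list(bits)
    for i in flip_positions:
        noisy[i] ^= 1
    return noisy


def decode(bits, copies=3):
    """Recover each original bit by majority vote over its copies."""
    decoded = []
    for start in range(0, len(bits), copies):
        group = bits[start:start + copies]
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded


message = [1, 0, 1, 1]
sent = encode(message)                       # 12 bits on the wire for 4 bits of message
received = add_noise(sent, [0, 4, 11])       # at most one flipped bit per group of three
recovered = decode(received)                 # equals the original message
```

The noise is still there; the receiver simply has enough extra data to vote it away. The scheme breaks down when two of the three copies of the same bit are flipped, which is why practical systems pay for stronger codes.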
Back around the year 2000, Hewlett-Packard introduced an early version of a web services platform called e-speak. It was a little cumbersome to come to grips with at first, but it became clear that it was a marvelous engine with just the right degree of structure. At a time when the semantic web was still just a glimmer, HP had developed a full-blown system to deliver vocabulary development, registration, and discovery. It was a wonderful foray into service-oriented architectures, but alas it was lost in the turmoil at HP and the growth of web services.
HP was struggling with just how much of the service architecture was to be a part of the engine and how much was to be provided by the users. In a classic marketplace gamble, they stripped the engine down to the barest components in the belief that wrapper services could be provided by the market.
At the current time, with the growth of application servers, the withdrawal of the public UDDI servers by IBM and Microsoft, and the slow evolution of the ebXML registries standardized by OASIS, we are in a sort of no man's land where we can't effectively build true marketplaces and compose dynamic applications. It will be back with BPEL and the web service extensions. Less a marketplace approach and more of an application server implementation of standards, but it is coming back. It is just a little frustrating that we were so close seven years ago and now we are in this gray area where we can no longer put the whole package together.
I have been rereading JCR Licklider's 1960 paper on "Man-computer symbiosis." I am also going to read his 1968 paper on "The computer as a communication device" (with Bob Taylor). They are fascinating reads and rank up there with Bush's "As we may think", Engelbart's "A conceptual framework for the augmentation of man's intellect", and Simon's "The sciences of the artificial." I could go on to include several others -- Miller, Zuboff, Kay, Weisser -- but you get the main point. What attracts me to all of these people is the strength and unboundedness of their vision. They all identified significant new opportunities and envisioned what might be -- yes, a tribute to Michael Dertouzos's "What will be".
So the question I am left with in all this reading is whether what is developing today leads us to new distant visions. What will the current systems evolve to in 40 years? What vision would we like to drive them? Do we become servants to machines? Do we become irrelevant? Are we freed to move on to another level of development?
Technology is developing at a rate that makes it difficult to stay abreast of all the possibilities. Two of the most interesting recent developments are in the areas of service-oriented architectures (SOA) and Asynchronous JavaScript and XML (AJAX). There are exciting and frustrating aspects of both of these technologies, and I will be exploring them here over the coming months. Let me begin by talking a little bit about each of them.
AJAX is an exciting new technology that allows us to create an almost seamless interface in the browser, which provides the user with an application-like experience rather than the jarring page renditions of most web experiences. The cost of this facility is a barrage of exchanges back and forth with the web server that puts a tremendous load on it. Further, the technology offers the promise of a push experience but can only deliver it via polling. There are constant rumors about how AJAX might evolve to allow needed updating without polling, but until that develops, it will remain a small niche technique for select kinds of interface updating.
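The cost of faking push with polling can be seen in a small back-of-the-envelope simulation. The Python sketch below is not browser code; it only counts requests. The function name, the update times, and the polling interval are all assumptions chosen for illustration.

```python
def simulate_polling(update_times, poll_interval, duration):
    """Count the requests a fixed-interval poller sends over `duration` seconds.

    update_times: times (in seconds) at which the server has something new.
    Returns (requests_sent, requests_that_carried_news, worst_delay_seen).
    """
    pending = sorted(update_times)
    requests = 0
    useful = 0
    worst_delay = 0
    next_update = 0          # index of the first update not yet delivered
    t = poll_interval        # time of the first poll
    while t <= duration:
        requests += 1
        delivered = False
        # Everything that happened since the previous poll arrives now.
        while next_update < len(pending) and pending[next_update] <= t:
            worst_delay = max(worst_delay, t - pending[next_update])
            next_update += 1
            delivered = True
        if delivered:
            useful += 1
        t += poll_interval
    return requests, useful, worst_delay


# Two server-side updates in one minute, polled every five seconds.
stats = simulate_polling([3, 47], poll_interval=5, duration=60)
```

With these assumed numbers the client sends twelve requests to learn about two updates, and each update still arrives a few seconds late. Shortening the interval reduces the delay but multiplies the load on the server, which is exactly the trade-off described above.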
SOAs are supposed to be the next evolution of distributed systems. Recently we have begun to see commentary that suggests that, like so many other paradigms, SOA will be DOA. I think the final evaluation is still out. In the late 90's, HP's e-speak represented a significant effort in this direction that died with the many problems HP faced in the marketplace. Regrettably, this rather elegant and complete solution has disappeared. The most recent manifestation of SOA as Web Services has made back a lot of ground, but the implementations tend to be weak on the provision of malleable registries that can be used to discover services located on repositories. Until registries begin to appear, the testing and development of web services or SOAs will languish.
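For readers who have not worked with a registry, the following Python sketch shows the bare idea of registration and discovery. It is a toy held in memory, not the API of e-speak, UDDI, or ebXML; the class name, attribute names, and endpoints are all hypothetical.

```python
class ServiceRegistry:
    """A toy in-memory registry: providers register, clients discover."""

    def __init__(self):
        self._services = []

    def register(self, name, endpoint, **attributes):
        """Record a service with free-form descriptive attributes."""
        self._services.append(
            {"name": name, "endpoint": endpoint, "attributes": dict(attributes)}
        )

    def discover(self, **required):
        """Return every service whose attributes match all required values."""
        return [
            s for s in self._services
            if all(s["attributes"].get(k) == v for k, v in required.items())
        ]


registry = ServiceRegistry()
registry.register("QuickPrint", "http://example.org/print",
                  category="printing", color=True)
registry.register("MonoPrint", "http://example.org/mono",
                  category="printing", color=False)
registry.register("ShipIt", "http://example.org/ship", category="shipping")

color_printers = registry.discover(category="printing", color=True)
```

The client never names a provider; it describes what it needs and the registry returns the candidates. The hard part, which this sketch skips entirely, is agreeing on the vocabulary of attributes.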
