mBsLOG
Welcome to my weblog. It is an unconventional blog in that I am not
planning to post daily or weekly, but only as topics of interest emerge.
I enjoyed playing a little with my initials and the word blog and am
amused by the fact that it is as much something I am slogging through as
something I am blogging about. This listing only shows the five most recent
posts.
I will try to
discipline myself to keep a more or less regular set of reflections
coming, but I can't promise. I have disabled commenting and discussion
as it ended up being more maintenance and cleanup than I cared to deal
with.
That doesn't mean your comments and thoughts aren't welcome.
Should you wish to comment on what I have said, I will be happy to
add your comments verbatim so long as they are not spam. Simply
send an email to me at Pitt -- see my
home page. I will insert
it in the appropriate post with attribution if you wish.
Please reference the title
and date of the post on which you are commenting. Also, if you want to
suggest a topic that might be covered or discussed, let me know and I
will try to include it.
Here is access to my mBsLOG as an
RSS feed.
Sat, 15 Dec 2007
Bookmarks and Meaning (December 15, 2007)
Social bookmarking systems provide a new source of information
about resources. In this post, I try to set out some conceptual
views of social bookmarking as a mechanism for asking what might be
derived from an analysis of social bookmarks. The delicious system
works as follows:
- A user posts a URL
- To save the URL, the user must describe it -- this could be
defaulted to a title, but it may be more bookmarker centered than page
author centered
- The user may add user notes and tags
- The user may decide not to share the bookmark, making it private
With this in mind, at the very least, a social bookmarking system
would include a record that consists of a URLID=normalized URL,
a USERID, a DESCription, OPTTag(s), OPTNotes, and SHARE(default TRUE).
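As a rough sketch only, the fields listed above might be held in a small Python record like the one below. The class name, the field names, and especially the normalization rules in `normalize_url` are assumptions made for illustration; nothing here reflects how delicious actually stores or normalizes bookmarks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase the scheme and host,
    drop the fragment, and strip a trailing slash from the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


@dataclass(frozen=True)
class Bookmark:
    url_id: str                   # URLID: the normalized URL
    user_id: str                  # USERID
    description: str              # DESCription, required to save
    tags: Tuple[str, ...] = ()    # OPTTag(s), may be empty
    notes: Optional[str] = None   # OPTNotes, may be absent
    share: bool = True            # SHARE, default TRUE


example = Bookmark(
    url_id=normalize_url("HTTP://Example.com/Page/#top"),
    user_id="u1",
    description="An example page",
)
```

One row per (user, URL) pair seems the natural grain here, since the description, tags, notes, and sharing flag all belong to the bookmarker rather than to the page.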
A conceptual table such as this has the potential to provide the following
information:
- The number of URLs that have been recorded
- The number of users of the system
- The number of user-URLs that are marked private
- The number of user-URLs that are shared
- The number of URLs that are tagged
- The number of user-URLs that have user notes
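The counts in this list could be computed along the following lines. This is a minimal sketch over a made-up, three-row sample; the dictionary keys (`user`, `url`, `tags`, `notes`, `share`) and the function name are assumptions for illustration.

```python
def collection_counts(bookmarks):
    """Basic counts over a list of bookmark rows (one row per user-URL)."""
    return {
        "urls": len({b["url"] for b in bookmarks}),
        "users": len({b["user"] for b in bookmarks}),
        "private_user_urls": sum(1 for b in bookmarks if not b["share"]),
        "shared_user_urls": sum(1 for b in bookmarks if b["share"]),
        # A URL counts as tagged if at least one user tagged it.
        "tagged_urls": len({b["url"] for b in bookmarks if b["tags"]}),
        "user_urls_with_notes": sum(1 for b in bookmarks if b["notes"]),
    }


# Invented sample data, not drawn from any real system.
sample = [
    {"user": "ann", "url": "http://a.example/", "tags": ["python", "web"],
     "notes": "good intro", "share": True},
    {"user": "bob", "url": "http://a.example/", "tags": [],
     "notes": None, "share": False},
    {"user": "ann", "url": "http://b.example/", "tags": ["web"],
     "notes": None, "share": True},
]

counts = collection_counts(sample)
```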
For users, we can determine the following information:
- The minimum, maximum, average, median number of total, shared, and
private URLs/user
- Various measures of the variance in the total, shared, and
private URLs across users
- The minimum, maximum, average, median number of tags/user
- Various measures of the variance in the number of tags across users
- The minimum, maximum, average, median number of descriptions/user
- Various measures of the variance in the number of descriptions
across users
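A sketch of the per-user measures, using Python's standard `statistics` module. Two assumptions are worth flagging: "tags/user" is read here as the number of distinct tags a user has applied, and population variance stands in for the "various measures of the variance". The row layout and function names are invented for the example.

```python
from statistics import mean, median, pvariance


def summarize(values):
    """Min, max, mean, median and population variance of a list of counts."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "variance": pvariance(values),
    }


def per_user_stats(bookmarks):
    """bookmarks: iterable of (user, url, tags, share) rows."""
    users = {user for user, _url, _tags, _share in bookmarks}
    total = dict.fromkeys(users, 0)
    shared = dict.fromkeys(users, 0)    # users with no shared URLs stay at 0
    private = dict.fromkeys(users, 0)
    distinct_tags = {user: set() for user in users}
    for user, _url, tags, share in bookmarks:
        total[user] += 1
        (shared if share else private)[user] += 1
        distinct_tags[user].update(tags)
    return {
        "total_urls": summarize(list(total.values())),
        "shared_urls": summarize(list(shared.values())),
        "private_urls": summarize(list(private.values())),
        "distinct_tags": summarize([len(t) for t in distinct_tags.values()]),
    }


# Invented sample data.
sample = [
    ("ann", "u1", ["python", "web"], True),
    ("ann", "u2", ["web"], True),
    ("ann", "u3", [], False),
    ("bob", "u1", ["python"], False),
]

stats = per_user_stats(sample)
```

Since every saved bookmark must carry a description, the number of descriptions per user equals the total URLs per user under this layout.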
For URLs, we can determine:
- The minimum, maximum, average, median number of total, shared, and
private users/URL
- Various measures of the variance in the total, shared, and
private users across URLs
- The minimum, maximum, average, median number of tags/URL
- Various measures of the variance in the number of tags across
URLs
- The minimum, maximum, average, median number of unique tags/URL
- Various measures of the variance in the number of unique tags
across URLs
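The difference between tags/URL and unique tags/URL is easiest to see in code. In the sketch below, "tags" counts every tag application on a URL, while "unique tags" collapses repeats across users. The row layout and the function name are assumptions for illustration.

```python
from collections import defaultdict


def per_url_counts(bookmarks):
    """bookmarks: iterable of (user, url, tags) rows."""
    users = defaultdict(set)
    applications = defaultdict(list)
    for user, url, tags in bookmarks:
        users[url].add(user)
        applications[url].extend(tags)
    return {
        url: {
            "users": len(users[url]),
            "tags": len(applications[url]),              # every application
            "unique_tags": len(set(applications[url])),  # repeats collapsed
        }
        for url in users
    }


# Invented sample data.
sample = [
    ("ann", "u1", ["python", "web"]),
    ("bob", "u1", ["python"]),
    ("ann", "u2", []),
]

by_url = per_url_counts(sample)
```

The resulting per-URL counts can then be fed to `min`, `max`, and the functions in the standard `statistics` module to get the summary measures listed above.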
Beyond these measures we can examine a number of issues:
- Looking at tags, ordered by frequency of occurrence:
- are there obvious groupings of types of tags (semantic,
affective, personal)?
- do the most frequently occurring tags tell us anything
about the collection?
- are there patterns in the co-occurrence of tags -- that is, for
some threshold of frequency of co-occurrence across URLs, is there a clear
relationship between the co-occurring terms that allows us to simplify or
clarify the tagging? Does the same hold for low co-occurrence terms -- i.e.
can we say some things about the terms?
- Is it possible to develop a tag map that would work as follows:
take the n most frequently occurring terms and set them around the
circumference of a circle. Take any term that co-occurs with one of those
terms more than x% (e.g. 90%) of the time and bundle it with the more
frequently occurring term. (If this was one of the original n, add a new term
to the circle to replace it.) Take terms that co-occur 50-90% of the time and place them on
strings proportionally distant from the terms they co-occur with. If they
co-occur with two or three terms on the circle, web them such that they are
proportionally distant from all the terms. If they only occur with one
term, fan them outside the circle proportionally distant from the term. What
kind of term map does that provide -- how might it be improved?
- When we look at tags by users,
- can we identify communities of interest? (common
frequently occurring tags)
- can we identify expertise (high number of URLs with
levels of commonly used tags)
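The co-occurrence and tag map questions above can be prototyped in a few lines. The sketch below rests on several assumptions: co-occurrence is measured per bookmark rather than per URL, "co-occurs x% of the time" is read as the share of a term's bookmarks that also carry the circle term, and distance along a string is taken as 1 minus that share. It also skips the refinement of replacing a circle term that gets bundled into another. All names and the sample data are invented.

```python
from collections import Counter
from itertools import combinations


def tag_map(tag_sets, n=2, bundle_at=0.9, link_at=0.5):
    """Place each tag relative to the n most frequent tags (the circle)."""
    freq = Counter(tag for tags in tag_sets for tag in set(tags))
    pair = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            pair[(a, b)] += 1
    # Break frequency ties alphabetically so the result is deterministic.
    circle = sorted(freq, key=lambda t: (-freq[t], t))[:n]
    placement = {}
    for term in freq:
        if term in circle:
            continue
        ratios = {
            c: pair[tuple(sorted((term, c)))] / freq[term] for c in circle
        }
        best = max(ratios, key=ratios.get)
        if ratios[best] > bundle_at:
            placement[term] = ("bundle", best)
            continue
        links = {c: round(1 - r, 2) for c, r in ratios.items() if r >= link_at}
        if len(links) >= 2:
            placement[term] = ("web", links)   # strung between circle terms
        elif links:
            placement[term] = ("fan", links)   # fanned outside the circle
        else:
            placement[term] = ("unplaced", {})
    return circle, placement


# Invented sample: one set of tags per bookmark.
sample = [
    {"python", "programming", "django"},
    {"python", "programming"},
    {"python", "web", "tutorial"},
    {"web", "css"},
    {"web", "css", "design"},
    {"python", "tutorial"},
    {"web", "tutorial"},
    {"python", "howto"},
    {"howto"},
]

circle, placement = tag_map(sample)
```

With this sample, "programming" is bundled with "python", "tutorial" is webbed between both circle terms, and "howto" is fanned out from "python" alone. Whether such a map says anything useful about a real collection is exactly the open question.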
There are surely many more questions that we might try to answer and
there are surely more formal ways of formulating what might be inferred.
I will be returning to this entry in the coming months and trying to add
more thoughts about this.