Stuart Weibel is a Consulting Research Scientist at the Online Computer Library Center (OCLC). OCLC is a worldwide library co-operative and the provider of the WorldCat (world catalog) service. In addition to working at OCLC for 20 years, Stuart spent 10 years leading the Dublin Core Metadata Initiative. Stuart is on sabbatical for calendar year 2006 at the iSchool at the University of Washington. Stuart is especially interested in social software models, the Web 2.0 movement and its implications for so-called Library 2.0 thinking.
Stuart and I began our conversation by talking about the concept of metadata - which Stuart defines as data about data. He emphasized the need for standards, making an analogy with railroads that can only be effectively traversed if the rails are the same gauge everywhere the train will travel. We talked about different metadata standards emerging, including within the context of the Open Access movement, but I wanted to ask him about general web use for nonprofits in particular.
Marshall:
Are there metadata best practices that are particularly forward looking in the field of Web site design?
Stuart:
The whole issue of Web design and metadata is still pretty wide open. One school argues that the web site is, in a sense, self documenting - that free text indexing does what we need. There is good evidence of that; we find what we want on the web a lot of the time.
The counter argument is that free text is fine, but structuring that indexed text is quite important. We see that happening with a lot of new web 2.0 tools... inferring structure from the data, and then using that structure to categorize or organize information.
The other thing we see more and more of is the major search engines looking for high-quality, well organized and managed assets (data from libraries, museums, government repositories). This suggests that having metadata gives you a leg up with such material.
OCLC, the non-profit for which I work, does this by providing a version of our library's collected data for harvesting, which then is represented on the open web as Find in a Library: Book Title [Try searching in any major search engine using this protocol and you will see that your first result will always be your book title in the OCLC WorldCat database.]
The marketing slogan we use, "weaving libraries into the web," is particularly apt in this case. We're taking our data to the user, where they want it - on their screen. We think this is an important step in keeping library services relevant to our constituents.
Marshall:
Should the expectations of searchers be changing accordingly?
Stuart:
Searching expectations are changing very rapidly. Not always in ways we can predict or keep up with. This is one of the big challenges for everyone (witness the intense competition among Google, Yahoo, MS, and others).
The library community needs to be integrated in this, but not a competitor. We have a neutral business model, making things look free to constituents. Bookstores and libraries don't compete head to head, they are complementary. The same needs to be true on the web. What we hope to do is to be on the same page, so to say... click to buy the book... click to borrow it. The searcher has the easiest job... kick back and watch the players scramble for their "business."
One resource that Stuart pointed out was a decade-old service from OCLC called PURL.org. People interested in maintaining the findability of a resource over time, even if that resource changes URLs, can use PURL to create a Persistent URL. A PURL can be updated by its owner to redirect to whatever URL the resource currently lives at.
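To make the idea concrete, here is a minimal sketch of a persistent-URL registry. It is not PURL.org's implementation or API; the class name, method names, paths, and example URLs are all hypothetical, and the only behavior taken from the description above is that an owner can retarget a persistent address while the address itself stays the same.

```python
class PurlResolver:
    """Toy in-memory registry mapping persistent paths to current URLs.

    Hypothetical illustration only; not how PURL.org is actually built.
    """

    def __init__(self):
        # persistent path -> (owner, current target URL)
        self._targets = {}

    def register(self, path, owner, target_url):
        """Claim a persistent path and point it at its first target."""
        if path in self._targets:
            raise ValueError(f"{path} is already registered")
        self._targets[path] = (owner, target_url)

    def update(self, path, owner, new_target_url):
        """Retarget an existing path; only its owner may do so."""
        registered_owner, _ = self._targets[path]
        if owner != registered_owner:
            raise PermissionError("only the owner may retarget a persistent URL")
        self._targets[path] = (owner, new_target_url)

    def resolve(self, path):
        """Return a (status, location) pair, the way an HTTP redirect would."""
        _, target = self._targets[path]
        return 302, target


resolver = PurlResolver()
resolver.register("/example/report", "alice", "http://old.example.org/report.html")
first = resolver.resolve("/example/report")

# The resource moves; the owner updates the target, the persistent path is unchanged.
resolver.update("/example/report", "alice", "http://new.example.org/docs/report")
second = resolver.resolve("/example/report")
```

Anyone who bookmarked the persistent path keeps a working link, because only the lookup table changed.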
Beyond URLs, Stuart said that there is a growing trend for resources to gain value by being surrounded by a growing cloud of relationship-describing metadata. How that metadata should be structured, how the relationships can be defined for the long term, was what we discussed next.
I asked if Extensible Markup Language (XML) is the best way for the world’s resources and relationships to be described online. XML is a very open structure, within which any type of relationship can be described using any arbitrary name. So long as the same arbitrary name is used later, two parties to any (human or computer) communication will know what is being discussed. Stuart said he thinks that more structure is needed for the giant classification task at hand and points to RDF instead.
Perhaps more important, we discussed how the line between structure and chaos will be negotiated if the classification of information in the Web 2.0 world will happen in a hybrid between top-down and bottom-up approaches.
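The contrast between free-form names and a more constrained, shared model can be sketched in a few lines. This is an assumption-laden illustration, not RDF syntax and not a real RDF library: statements are modeled as plain (subject, predicate, object) tuples, the two source records and their field names are invented, and the book URI is hypothetical. The Dublin Core element namespace is the one real identifier used.

```python
# Dublin Core element namespace (real); everything else below is illustrative.
DC = "http://purl.org/dc/elements/1.1/"

# Two sources describing the same book with arbitrary, unrelated field names,
# as free-form XML element names would allow.
record_a = {"heading": "Moby-Dick", "writer": "Herman Melville"}
record_b = {"name": "Moby-Dick", "author": "Herman Melville"}

# Hypothetical mappings from each source's local names to shared predicates.
MAPPING_A = {"heading": DC + "title", "writer": DC + "creator"}
MAPPING_B = {"name": DC + "title", "author": DC + "creator"}


def to_triples(subject, record, mapping):
    """Turn a flat record into a set of (subject, predicate, object) statements."""
    return {(subject, mapping[field], value) for field, value in record.items()}


book = "http://example.org/book/1"  # hypothetical identifier for the book

triples_a = to_triples(book, record_a, MAPPING_A)
triples_b = to_triples(book, record_b, MAPPING_B)

# With only local names, the two records share no fields at all.
shared_fields = set(record_a) & set(record_b)

# With shared predicates, the statements are identical and merge cleanly.
merged = triples_a | triples_b
```

The records only line up once both sides agree on the predicate names, which is the "same gauge" point from the railroad analogy.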
Marshall:
Can we XML-ify the world?
Stuart:
Well, lots of folks are betting we can... XML is a great tool for structuring strongly typed data. But by itself, it probably is not tightly enough constrained to achieve the interoperability we want. There are too many flavors. RDF provides a set of constraints layered on top of XML that are intended to make it easier to use and reuse metadata. Other semantic web tools such as OWL are intended to make it easier to build ontologies that can further advance the expression of semantics on the web. But the fact that uptake of these tools has been tortuous suggests that either we're getting ahead of ourselves, trying to answer unasked questions, or perhaps it is simply a recognition that interoperable specification of semantics is very hard, and we shouldn't be surprised that it is tough going.
Marshall:
Or that people don't want that much structure and instead aim to achieve the same ends through decentralized self organization?
Stuart:
Yes, the self-organization model is very appealing...but even in that model, if you expect interoperation to take place, there have to be conventions for syntactic expression of semantics. That part is not emergent, I don't think. We have to have shared grammars, even if our vocabularies are fluid and emergent.
Marshall:
Why hasn't that much happened already?
Stuart:
Hmmmm.
Marshall:
Is it really such a paradigm shift underway that the very grammar is changing?
Stuart:
No... the grammar should be... can be ... stable. Instead of answering the question, I'll propose a thought experiment.
20 years ago, the AI [Artificial Intelligence] "revolution" was underway and that lapsed into so-called AI winter - disappointed expectations.
Why would we imagine that anything is different today? A lot of the Semantic Web stuff is right out of the AI stuff of two decades ago. Are we any smarter? No, but there are two huge differences. One is that our computational capabilities are many orders of magnitude beyond what we had, but even more importantly - we have something to compute on. Everything new is born digital. So there is this huge corpus to work on and the entrepreneurs are doing it. Wondrous things like Google Maps/Earth... del.icio.us... Flickr... all the great stuff that is coming out to promote collaboration and exchange and sharing.
So we can reasonably expect real advances, but we'll still need conventions. The railroad tracks have to be the same gauge.
Marshall:
Are those examples of things that are supporting the semantic web or a folksonomic web?
Stuart:
I don’t see the difference; a folksonomic web IS a semantic web. With a small "s" perhaps. They are technologies in support of shared meaning.
Marshall:
Makes sense.
Stuart:
One is emergent... the other more top-down. My guess is that it will be a hybrid. There are values in formal taxonomies. There is probably value in emergent folksonomies. What if they are married?
Marshall:
I just wonder where that line will be negotiated.
Stuart:
In the hearts and minds of users ;-)
Marshall:
Is the library space one where the line is negotiable?
Stuart:
I think so, if we're smart enough. OCLC owns the Dewey Decimal system - still one of the most successful knowledge organization tools in existence and one of the strongest library brands.
Marshall:
Combine that with user generated content, annotation, reviews, metadata and let users search a database that includes both...
Stuart:
One of the concepts in the Dewey of old is that of warrant. A term doesn't get incorporated until it shows up in the real world often enough, and authoritatively enough. What if that evolves towards a kind of digital warrant?
Marshall:
That seems fundamentally unlike the long tail model.
Stuart:
So that new terms emerge, are mapped onto a stable skeleton, based on electronic warrant that might not even have to be mediated by people?
Marshall:
A hybrid.
Stuart:
Yes, taking the value of a stable legacy system and marrying it with an emergent dynamic of self organizing and evolving language usage?
I should say, I don't know how to do this, I offer it simply as a thought experiment that might capitalize on inherent values of two kinds [of approaches].
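Purely as an illustration of the thought experiment, and not something either of us proposed in detail, a "digital warrant" rule could look roughly like the sketch below. Every name, threshold, and category label here is an assumption: the tagging events are invented, the skeleton labels are placeholders rather than real Dewey classes, and the mapping from a term to a class is simply supplied by hand.

```python
from collections import defaultdict


def terms_with_warrant(tag_events, min_uses=3, min_users=2):
    """Return terms used often enough, by enough distinct people.

    tag_events is an iterable of (user, term) pairs. The thresholds are
    arbitrary stand-ins for "often enough" and "authoritatively enough".
    """
    uses = defaultdict(int)
    users = defaultdict(set)
    for user, term in tag_events:
        key = term.strip().lower()  # treat "Folksonomy" and "folksonomy" alike
        uses[key] += 1
        users[key].add(user)
    return sorted(
        t for t in uses if uses[t] >= min_uses and len(users[t]) >= min_users
    )


def attach_to_skeleton(terms, skeleton, suggestions):
    """Hang warranted terms on a fixed set of classes.

    suggestions maps a term to a proposed class; terms whose proposed class
    is not part of the stable skeleton are left unplaced.
    """
    placed = {}
    for term in terms:
        cls = suggestions.get(term)
        if cls in skeleton:
            placed.setdefault(cls, []).append(term)
    return placed


events = [
    ("ann", "folksonomy"), ("bob", "Folksonomy"), ("cat", "folksonomy"),
    ("ann", "mashup"), ("ann", "mashup"), ("ann", "mashup"),  # one user only
    ("bob", "podcast"),                                       # used once
]
skeleton = {"knowledge-organization", "computing"}   # placeholder classes
suggestions = {"folksonomy": "knowledge-organization", "mashup": "computing"}

warranted = terms_with_warrant(events)
placement = attach_to_skeleton(warranted, skeleton, suggestions)
```

The emergent part is the stream of tags; the stable part is the skeleton, which never changes shape, only gains terms. Where to set the thresholds is exactly the open question about where the line gets drawn.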
Marshall:
Could this look like agreeing to designate metadata as written in an agreed upon microformat system or in XML, but being free to diverge after that? I just worry about where the line gets drawn.
I don't mean to be too argumentative, but I know that there are so many categories in the world that are contested in and of themselves - like race and gender.
Stuart:
Well, I don't really know where that line gets drawn... but I think the idea of emergence means that I, specifically, don't have to!
Marshall:
Well, three cheers for that.
Stuart’s blog, Weibel Lines, was recently named one of the Top Ten librarian blogs by Top Ten Sources. You can subscribe to a bundle of feeds in OPML format below, including Stuart's blog, all items tagged Library2.0 in del.icio.us and future interviews here at Net Squared.
To subscribe to all of the above, you can import the URL behind this link into your feed reader: WeibelNetSquared