Grid and Web 2.0 - thoughts for Discussion
Thinking out loud about the Grid under the headings of the eight Web 2.0 Design Patterns from What Is Web 2.0.
Small sites make up the bulk of the internet's content; narrow niches make up the bulk of internet's possible applications. Therefore: Leverage customer-self service and algorithmic data management to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head.
Some e-Science involves highly specialist scientists doing very clever things with very big resources, but there are also a lot of other scientists out there getting on with the routine of everyday science. Through collaboration and sharing we can help them do better science. Compare this with people taking better photos because of Flickr, or making better videos because of YouTube. Through sharing we have the potential to "enable smart scientists to be smarter and propagate their smartness, in turn enabling other scientists to become better and conduct better science" (Carole Goble).
The "long tail" also applies to computational power and reaching out is what happens with a Condor flock, SETI@home or ClimatePrediction.net. A similar observation could be made about data storage and the pervasive sharing enabled by peer-to-peer techniques.
Applications are increasingly data-driven. Therefore: For competitive advantage, seek to own a unique, hard-to-recreate source of data.
When a scientist (or indeed anyone) turns to a web browser to look something up, they don't say "I'm going to use a piece of software called a web browser", nor do they say "I'm going to use HTTP and HTML". Instead they say "I'm going to look that up on the Web". Participation is about data, and the reason people use Web 2.0 APIs is the content behind them.
Unfortunately Grid tends to divorce services from content - the compute provider is usually there to provide generic compute and storage, not specific content. Note however that Amazon successfully provides compute and storage (the latter through HTTP or BitTorrent).
The key to competitive advantage in internet applications is the extent to which users add their own data to that which you provide. Therefore: Don't restrict your "architecture of participation" to software development. Involve your users both implicitly and explicitly in adding value to your application.
It has to be really easy for users to provide content/storage/compute. The best example in Grids might be the ease of adding a machine to a Condor pool - but it isn't that easy! Peer-to-peer software provides a model for easy and beneficial sharing of storage but is not prevalent in Grid.
Currently Grid portals can be 'clunky' and there's no doubt that they look and feel very different to a "social website". One example of an effort to tackle this directly is myExperiment.
The software development point is interesting - it implies Web 2.0 is all open source and community development. While this is true of mashups by their nature, it isn't true of Web 2.0 sites like MySpace - they're not open source (Connotea is an exception) and they don't always make their content and services available for reuse. Grid might actually be more Web 2.0 than Web 2.0 in this regard!
Only a small percentage of users will go to the trouble of adding value to your application. Therefore: Set inclusive defaults for aggregating user data as a side-effect of their use of the application.
There is potential through virtual organisations to collect data (social and machine) and use it as a basis for recommendations (to illustrate this at risk of trivialising: "people who used this service/data/software/grid also used..."), but this isn't realised. It's probably all there in the logs!
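To make the "also used" idea a little more concrete, here is a minimal sketch in Python. It assumes the logs can be reduced to (user, resource) pairs. That log shape, the function names `also_used` and `recommend`, and the sample entries are all made up for illustration and not taken from any real Grid middleware. It simply counts how often two resources were used by the same person.

```python
from collections import Counter, defaultdict
from itertools import combinations


def also_used(log_entries):
    """Count co-usage of resources from (user, resource) pairs.

    The (user, resource) log shape is an assumption for this sketch;
    real Grid logs would need parsing into this form first.
    """
    resources_by_user = defaultdict(set)
    for user, resource in log_entries:
        resources_by_user[user].add(resource)

    # co_counts[a][b] = number of users who used both a and b
    co_counts = defaultdict(Counter)
    for resources in resources_by_user.values():
        for a, b in combinations(sorted(resources), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def recommend(co_counts, resource, n=3):
    """Return up to n resources most often used alongside `resource`."""
    counts = co_counts.get(resource, Counter())
    return [name for name, _ in counts.most_common(n)]


# Hypothetical log entries, purely illustrative.
sample_log = [
    ("alice", "blast"),
    ("alice", "clustal"),
    ("bob", "blast"),
    ("bob", "clustal"),
    ("bob", "condor-pool"),
    ("carol", "blast"),
]

co_counts = also_used(sample_log)
print(recommend(co_counts, "blast"))
```

Raw co-occurrence counts like these will favour whatever is popular anyway, so a real recommender would want some normalisation, and a virtual organisation would have to decide whose usage may be aggregated in the first place.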
Intellectual property protection limits re-use and prevents experimentation. Therefore: When benefits come from collective adoption, not private restriction, make sure that barriers to adoption are low. Follow existing standards, and use licenses with as few restrictions as possible. Design for "hackability" and "remixability."
This is important - privacy can deny us the value of sharing. An interesting observation is that some of the new technologies give us a handle on this - especially provenance, so that we can share and know how we have shared. This is a sense in which semantic web techniques help solve a problem - they help you 'free the data', and also figure out what you've done with it once it's freed! See Creative Commons and Science Commons.
When devices and programs are connected to the internet, applications are no longer software artifacts, they are ongoing services. Therefore: Don't package up new features into monolithic releases, but instead add them on a regular basis as part of the normal user experience. Engage your users as real-time testers, and instrument the service so that you know how people use the new features.
This is interesting, because Grid tends to do the monolithic releases thing. But isn't this really the distinction between the slow spin of robust service provision versus the rapid spin of thin application development? Do we really want perpetual beta services? I don't think Amazon or eBay would say yes. I think the perpetual beta fits our story in terms of ease of use, user engagement, participation, etc. And at some point maybe you need to push a mashup into the service provision layer - what happens then?
Web 2.0 applications are built of a network of cooperating data services. Therefore: Offer web services interfaces and content syndication, and re-use the data services of others. Support lightweight programming models that allow for loosely-coupled systems.
Grid is very 'command and control' in character. Mashups are all about doing new creative stuff with third-party sources (this is the original sense from music). Again one might favour command and control service provision coupled with mashup ease of use. The phrase "loosely coupled systems" is interesting here because we always claim this is what the Grid is and yet we forever talk about stacks - we don't yet enjoy the benefits of the loosely coupled system, but Web 2.0 does.
The PC is no longer the only access device for internet applications, and applications that are limited to a single device are less valuable than those that are connected. Therefore: Design your application from the get-go to integrate services across handheld devices, PCs, and internet servers.
People don't do that many mashups on their phones yet! They will.