An API for the Web (from the late 1990s)

[NOTE: This proposal was posted here sometime in the late 1990s (original URL: web.html). Over a dozen years later, the only things that come close are little-used cluttered messes.]

Let's say you want to get the copyright information from several web sites. Most companies put this information in the fine print on their main page, but some have it on another page, such as an about.html or info.html or whatever page.

The Old Way

Getting accurate copyright information programmatically would be challenging. Even if you could be assured that a company's copyright information will always be in the fine print on their main page, here's what you'd need to do:

create a URLConnection with "http://www.ibm.com"
read that HTML file
parse the HTML file
  look for "copyright" followed by a year
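
For concreteness, the four steps above might look something like this in present-day Java. This is only a sketch: the class name, the method names, and the regular expression are illustrative assumptions, and the pattern will miss any notice that is worded differently or lives on another page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CopyrightScraper {
    // "copyright", then up to 40 non-digit characters, then a four-digit year.
    // The 40-character window is an arbitrary choice for this sketch.
    private static final Pattern NOTICE =
        Pattern.compile("copyright\\D{0,40}?((?:19|20)\\d{2})", Pattern.CASE_INSENSITIVE);

    // Steps 1 and 2: open a URLConnection and read the page as text.
    // Assumes the page is UTF-8, which a real client could not take for granted.
    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Steps 3 and 4: "parse" the HTML by pattern matching and pull out the year.
    static OptionalInt findCopyrightYear(String html) {
        Matcher m = NOTICE.matcher(html);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) throws IOException {
        // With no argument, run on a canned snippet instead of touching the network.
        String html = args.length > 0
            ? fetch(args[0])
            : "<small>Copyright &copy; 1998 Example Corp.</small>";
        System.out.println(findCopyrightYear(html));
    }
}
```

Even in this small form, the result depends entirely on how the page author happened to word and place the notice.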

The above less-than-robust method illustrates one of the fundamental problems with the WWW: even if IBM's home page is generated by 1MB of servlets talking to 10MB of EJBs talking to a million-dollar database, the client-side model of the WWW is more or less a collection of static text files.

The New Way

Now, consider this alternative:

Each web site is an object. In other words, you can retrieve (from some central repository) the object representing IBM's web site, and call methods on that object.

For example:


class WebSiteFactory {
  // static, because main() below calls it on the class itself
  static IWebSite createWebSite( IWebSiteSpecifier wss );
}

interface IWebSite {
  ICopyrightInfo getCopyrightInfo();
}

main() {
  IWebSiteSpecifier  ibmAddress, yahooAddress;
  IWebSite           ibmWebSite, yahooWebSite;

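  // WebSiteSpecifier stands for some concrete implementation of
  // IWebSiteSpecifier; an interface cannot be instantiated directly.
  ibmAddress = new WebSiteSpecifier( "IBM" );
  yahooAddress = new WebSiteSpecifier( "Yahoo" );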

  ibmWebSite = WebSiteFactory.createWebSite( ibmAddress );
  yahooWebSite = WebSiteFactory.createWebSite( yahooAddress );

  System.out.println( "IBM's copyright date is " +
               ibmWebSite.getCopyrightInfo().getDate() );

  System.out.println( "Yahoo's copyright date is " +
               yahooWebSite.getCopyrightInfo().getDate() );
}
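
To make the idea easier to try out, here is one way the outline above could be filled in as a runnable sketch. Everything beyond the interface and method names already shown is an assumption: the "central repository" is reduced to an in-memory map, register and getName are hypothetical helpers, getDate is assumed to return a java.time.Year, and the site name and year are placeholders rather than real data.

```java
import java.time.Year;
import java.util.HashMap;
import java.util.Map;

interface ICopyrightInfo { Year getDate(); }          // return type is an assumption

interface IWebSite { ICopyrightInfo getCopyrightInfo(); }

interface IWebSiteSpecifier { String getName(); }     // getName is hypothetical

class WebSiteFactory {
    // Stand-in for the "central repository": a plain in-memory map.
    private static final Map<String, IWebSite> REPOSITORY = new HashMap<>();

    // Hypothetical helper so a site can publish its object.
    static void register(String name, IWebSite site) { REPOSITORY.put(name, site); }

    static IWebSite createWebSite(IWebSiteSpecifier wss) {
        IWebSite site = REPOSITORY.get(wss.getName());
        if (site == null) throw new IllegalArgumentException("Unknown site: " + wss.getName());
        return site;
    }
}

class WebSiteDemo {
    public static void main(String[] args) {
        // Placeholder site and year, not real data.
        ICopyrightInfo info = () -> Year.of(1999);
        WebSiteFactory.register("ExampleSite", () -> info);

        IWebSiteSpecifier address = () -> "ExampleSite";
        IWebSite site = WebSiteFactory.createWebSite(address);
        System.out.println("ExampleSite's copyright date is "
            + site.getCopyrightInfo().getDate());
    }
}
```

The caller never sees markup; it asks a typed object a question and gets a typed answer back.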

Viewing a Site

In addition to getting meta-information like the copyright date, the IWebSite interface would have the getViewer method:

interface IWebSite {
  JComponent getViewer();
  ICopyrightInfo getCopyrightInfo();
}

The web browser would become just a simple container. To show IBM's web site to the user, you'd call getViewer, and add the returned component to your web browser's frame.

No HTML parsing, no HTTP, no JavaScript, no cookies, no nothing.

IBM's component would display itself, and your web browser would simply host their component. IBM's component would load data and classes as needed. With a good set of GUI and other client classes, and a good resource format, IBM's web site could load almost as fast as an HTML page, but with much, much more functionality.

When the user wants to go to Yahoo's web site, the IBM component would be removed from the browser container, and Yahoo's web site's component would be added.
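
The container behaviour described above can be sketched with a few Swing calls. BrowserContainer and its methods are hypothetical names, IWebSite is cut down to getViewer only for brevity, and the two "sites" are placeholder labels. A real browser would put the panel inside a frame.

```java
import java.awt.BorderLayout;
import javax.swing.JComponent;
import javax.swing.JLabel;
import javax.swing.JPanel;

// getCopyrightInfo is omitted here to keep the sketch short.
interface IWebSite { JComponent getViewer(); }

class BrowserContainer {
    // The panel a real browser would place inside its frame.
    private final JPanel content = new JPanel(new BorderLayout());

    // Navigating means swapping one site's component for another's.
    void show(IWebSite site) {
        content.removeAll();
        content.add(site.getViewer(), BorderLayout.CENTER);
        content.revalidate();
        content.repaint();
    }

    JPanel getContent() { return content; }
}

class BrowserDemo {
    public static void main(String[] args) {
        // Placeholder sites whose "viewer" is just a label.
        JComponent first = new JLabel("First site");
        JComponent second = new JLabel("Second site");

        BrowserContainer browser = new BrowserContainer();
        browser.show(() -> first);
        browser.show(() -> second);   // the first site's component is removed here
        System.out.println(browser.getContent().getComponentCount());
    }
}
```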

Data Models

Sites might choose to expose their underlying data models, perhaps for free, or for an additional fee.

For instance, an app you write could programmatically explore the Yahoo hierarchy.

Or, an app you write could programmatically get a list of all the downloads available at IBM's Alphaworks, synchronize those downloads with software you've previously downloaded, and ask to be informed when new software is available.

When Alphaworks adds a new download, it informs all listeners of the new download, including your app. Your app would then arrange the download of the new software, all without user intervention, and all without parsing HTML or other hacks.
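
The synchronize-then-listen flow described above is essentially the listener pattern. A minimal sketch follows, with every name in it (IDownloadListener, DownloadCatalog, the download names) invented for illustration and downloads represented as plain strings. Adding a name to a local list stands in for actually arranging the download.

```java
import java.util.ArrayList;
import java.util.List;

interface IDownloadListener { void downloadAdded(String name); }

// Hypothetical data model a site might expose.
class DownloadCatalog {
    private final List<String> downloads = new ArrayList<>();
    private final List<IDownloadListener> listeners = new ArrayList<>();

    void addListener(IDownloadListener l) { listeners.add(l); }

    // Snapshot of everything currently available.
    List<String> getDownloads() { return List.copyOf(downloads); }

    // Called by the site; every registered listener is told about the new item.
    void addDownload(String name) {
        downloads.add(name);
        for (IDownloadListener l : listeners) l.downloadAdded(name);
    }
}

class SyncDemo {
    public static void main(String[] args) {
        DownloadCatalog catalog = new DownloadCatalog();
        catalog.addDownload("tool-a");

        // Initial synchronization with what the site already offers.
        List<String> local = new ArrayList<>(catalog.getDownloads());
        // From now on, new items arrive without polling or parsing.
        catalog.addListener(local::add);

        catalog.addDownload("tool-b");
        System.out.println(local);
    }
}
```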

FAQ

  • Aren't there things out there like this already, or in the planning stages?
    Most likely. But, I think they're either a) carrying HTTP/HTML/etc. baggage, or b) specialized systems not designed for wide use on the Internet as a whole.

  • Would this be based on RMI?
    However this is implemented, it would have to be accessible from all programming languages. That might rule out RMI in favor of CORBA/IIOP.

  • Aren't text-based, human-readable formats best? One problem with binary formats is that a certain company (you know who!) can change the format to fit their own agenda.
    A text-based format is just a specialized form of a binary format using a restricted set of bytes. If you look at a text format in a binary editor, it's just another binary format, right? Both formats can be modified to suit corporate agendas. Both formats can also be standardized.

    A text-based format has many disadvantages and only one advantage: it can be read by a human. Perhaps the best solution is to use a binary format for machine communication, and provide a compiler of some kind to convert the text format into its binary representation. This is how most programming languages work already, right? Those who wish to use a GUI tool to edit the binary format will not have to deal with the text format.

  • Isn't HTML/HTTP so deeply entrenched that this could never succeed?
    If you provide a profit motivation, this has a chance. That is, if a company can see that they'll make more money this way than just using HTML, they might give it a chance. After a certain period, this might achieve critical mass.

  • Would this completely replace HTML/HTTP/etc.?
    Unfortunately, there are browsers like Lynx which would require HTML. But, HTML/HTTP/etc. would definitely be "legacy".

  • Wouldn't this require everyone who wants to put up a website to be a hardcore nerd?
    Not everyone who has a website entered raw HTML into Notepad.

    There are many GUI HTML editors available, such as VisualCafe and the site builders available online from Geocities, etc.

    Just because most Java GUI builders are not that good doesn't mean they can't get better if there's a lot of demand.

  • What is your role in this project?
    When I have the time, I might upload a more complete specification for public comment. I might release it as open source or public domain. I don't see a way to make money off the underlying technology itself, but perhaps off builders, servers, etc.

(NOTE: IBM and Yahoo are just used as examples, and have no connection with this page.)