Genomics Blog

March 16, 2010 9:15 AM
Interview with Paul Gordon on Semantic Web Technologies
Filed Under: Bioinformatics


Guest post from Susanne Cardwell
Administrative Coordinator 
Bioinformatics Platform Applied Computational Genomics Course

  Paul Gordon, the Bioinformatics specialist for the Sun Center of Excellence for Visual Genomics, gave the following description of Semantic Web Technologies and how they relate to Daggoo and Seahawk, the programs he is developing:

  “In a nutshell,” says Paul Gordon, “Semantic Web technologies are about using URLs instead of words to refer to concepts.” He says the advantage is that URLs (i.e., Web addresses like http://...) are unambiguous, which makes them easier for computers to work with, since computers have historically had trouble interpreting natural language. The reason to use URLs in this way, he states, is so that the computer can surf the Web for you instead of you manually searching it for answers. “In short, it is about having a web of data instead of a web of documents,” says Gordon. One major problem is how to shoehorn the current Web into this Semantic model, and this is his primary focus.
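
  As a rough illustration of the "URLs instead of words" idea, the sketch below stores a few facts as (subject, property, value) triples in plain Python. It is not taken from Seahawk or Daggoo. All the example.org URLs and the helper names are hypothetical, chosen only to show that the abbreviation "ADH" is ambiguous as a word (it is used for both alcohol dehydrogenase and antidiuretic hormone), while each URL names exactly one concept.

```python
# Hypothetical URLs, for illustration only: each one stands for exactly one concept.
ALCOHOL_DEHYDROGENASE = "http://example.org/concept/alcohol-dehydrogenase"
ANTIDIURETIC_HORMONE = "http://example.org/concept/antidiuretic-hormone"
LABEL = "http://example.org/property/label"
EC_NUMBER = "http://example.org/property/ec-number"

# A tiny "web of data": every fact is a (subject, property, value) triple.
TRIPLES = [
    (ALCOHOL_DEHYDROGENASE, LABEL, "ADH"),
    (ANTIDIURETIC_HORMONE, LABEL, "ADH"),
    (ALCOHOL_DEHYDROGENASE, EC_NUMBER, "1.1.1.1"),
]


def subjects_with(triples, prop, value):
    """Return every subject URL that has the given property value."""
    return [s for s, p, v in triples if p == prop and v == value]


def values_of(triples, subject, prop):
    """Return every value recorded for one subject URL and property."""
    return [v for s, p, v in triples if s == subject and p == prop]


if __name__ == "__main__":
    # The word "ADH" points at two different concepts...
    print(subjects_with(TRIPLES, LABEL, "ADH"))
    # ...but a URL identifies one concept, so the lookup is unambiguous.
    print(values_of(TRIPLES, ALCOHOL_DEHYDROGENASE, EC_NUMBER))
```

  Looking up the label returns two URLs, while looking up a property of one URL returns a single answer, which is the kind of lookup a program can safely do on a user's behalf.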

  Paul states that the Semantic Web programs he is working on are called “Seahawk” and “Daggoo.” With these programs, you demonstrate to the computer how to query a website (e.g., by filling in a Web form) and how to extract the data from the result. The next time that query is needed, the programs can run it and extract the data for you automatically. This is called Programming by Demonstration.

  He further describes the programs he is developing and/or working with:

  Daggoo is the component that understands Web forms, and Seahawk translates the demonstration into a Taverna workflow. Taverna is a visual programming environment where, instead of using complex programming syntax, you manipulate images on a screen. This is a more intuitive way to program for users who are used to point-and-click interfaces.

  “Basically, in Seahawk, your demonstration gets translated into a Taverna visual program that you can use on large datasets rather than just on the single example. The computer can iterate the same type of analysis on larger datasets,” says Paul. He adds that a biologist can use Seahawk/Daggoo to retrieve any type of information they can currently access on the Web.

  Gordon states that extending this to other websites depends on having rules that recognize biological datatypes in text. Text of a recognized type can then be converted into an unambiguous URL. For instance, “1.1.1.1” is the E.C. number for the enzyme alcohol dehydrogenase. Recognizing it works much like recognizing a telephone number, given some model of what a telephone number looks like.
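
  A minimal sketch of such a rule, assuming an E.C. number is written as four dot-separated integers, might look like the following. This is not Daggoo's actual rule set. The base URL is a hypothetical placeholder and the function name is invented for the example.

```python
import re

# Hypothetical base URL, for illustration only; a real system would use
# whatever address scheme it has agreed on for E.C. numbers.
EC_BASE_URL = "http://example.org/ec/"

# Assumed model of an E.C. number: exactly four dot-separated integers that are
# not part of a longer dotted number. A trailing sentence period is allowed.
EC_PATTERN = re.compile(r"(?<![\d.])\d+\.\d+\.\d+\.\d+(?!\.?\d)")


def ec_numbers_to_urls(text):
    """Return (matched_text, url) pairs for each E.C.-number-like token in text."""
    return [(m.group(0), EC_BASE_URL + m.group(0)) for m in EC_PATTERN.finditer(text)]


if __name__ == "__main__":
    sentence = "Alcohol dehydrogenase (1.1.1.1) oxidises ethanol."
    for token, url in ec_numbers_to_urls(sentence):
        print(token, "->", url)
```

  A pattern this simple also matches other four-part dotted numbers, such as IP addresses, so a practical rule would probably need extra context (for example, a nearby “E.C.” label) before treating a match as an enzyme identifier.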

  “The computer trying to recognize words, however, is called natural language processing, and is a domain of study on its own,” says Gordon. He offered two current examples of this kind of pattern recognition in other software:

  • New versions of Internet Explorer display telephone numbers with “Skype” formatting, indicating that you can call that person through Skype with the click of a button.
  • If you highlight an address in Internet Explorer, an icon appears that lets you perform other tasks, such as mapping the address you highlighted.

  “So, there is more that you can do with text than simply read it,” Gordon says.
   
  This increased functionality of data is at the heart of Semantic Web Technologies.
  For more information, please see www.daggoo.net
