Genomics Blog
guest post from Susanne Cardwell
Administrative Coordinator
Bioinformatics Platform Applied Computational Genomics Course
Paul Gordon, the Bioinformatics specialist for the Sun Center of Excellence for Visual Genomics, gave the following description of Semantic Web technologies and how they relate to Daggoo and Seahawk, the programs he is developing:
“In a nutshell,” says Paul Gordon, “Semantic Web technologies are about using URLs instead of words to refer to concepts.” He says that the advantage is that URLs (i.e., Web addresses like http://...) are unambiguous – computers handle URLs easily, whereas they have historically had trouble interpreting natural language. The reason to use URLs in this capacity, he states, is so that the computer can surf the Web for you instead of you manually searching it for answers. “In short, it is about having a web of data instead of a web of documents,” says Gordon. One major problem is how to shoehorn the current Web into this Semantic model, and this is his primary focus.
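As a rough illustration of why URLs are less ambiguous than words, here is a minimal Python sketch. It is not taken from Seahawk or Daggoo. Every URL in it is a hypothetical example.org placeholder, and the tiny in-memory set of (subject, property, value) statements is an assumption made only for the example. The abbreviation “ADH” can mean either alcohol dehydrogenase or antidiuretic hormone, but each concept gets its own URL:

```python
# All URLs below are hypothetical placeholders, not real identifiers.
ADH_ENZYME = "http://example.org/concept/alcohol-dehydrogenase"
ADH_HORMONE = "http://example.org/concept/antidiuretic-hormone"
ENZYME = "http://example.org/concept/enzyme"
HORMONE = "http://example.org/concept/hormone"
HAS_LABEL = "http://example.org/property/label"
IS_A = "http://example.org/property/is-a"

# A tiny "web of data": each statement is (subject, property, value).
triples = {
    (ADH_ENZYME, HAS_LABEL, "ADH"),
    (ADH_HORMONE, HAS_LABEL, "ADH"),
    (ADH_ENZYME, IS_A, ENZYME),
    (ADH_HORMONE, IS_A, HORMONE),
}


def subjects_with(prop, value):
    """Return, in sorted order, every subject URL that has the given property value."""
    return sorted(s for s, p, v in triples if p == prop and v == value)


# The word "ADH" is ambiguous: it labels two different concepts.
print(subjects_with(HAS_LABEL, "ADH"))

# Asking by URL is not ambiguous: only one concept is an enzyme.
print(subjects_with(IS_A, ENZYME))
```

Looking up the label returns two URLs, while asking which concept is an enzyme returns exactly one, which is the kind of distinction a program cannot make from the word alone.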
Paul states that the Semantic Web programs he’s working on are called “Seahawk” and “Daggoo.” With these programs, you can demonstrate to the computer how to query a website and extract data from it (e.g., by filling in a Web form). The next time they are called upon, the programs can process the query and extract the data automatically. This is called Programming by Demonstration.
He further describes the programs he is developing and/or working with:
Daggoo is the component that understands Web forms, and Seahawk translates the demonstration into a Taverna workflow. Taverna is a visual programming environment, where, instead of using complex programming syntax, you manipulate images on a screen. This is a more intuitive way to program for users who are used to point-and-click interfaces.
“Basically, in Seahawk, your demonstration gets translated into a Taverna visual program that you can use on large datasets rather than just on the single example. The computer can iterate the same type of analysis on larger datasets,” says Paul. He adds that a biologist can retrieve through Seahawk/Daggoo any type of information they can currently access on the Web.
Providing an example, Gordon states that the extrapolation to other websites is based on having rules that recognize biological datatypes in text. Text of a recognized type can then be converted into unambiguous URLs. For instance, “1.1.1.1” would denote the E.C. number for the enzyme alcohol dehydrogenase. This works much like recognizing a telephone number, given some model of what a telephone number looks like.
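To make the idea of a recognition rule concrete, here is a minimal Python sketch. It is not Daggoo’s actual rule set. The regular expression is a simplified assumption about what an E.C. number looks like (four dot-separated numeric fields), and the base URL is a hypothetical placeholder. A pattern this simple would also match other four-part dotted numbers, so a real rule would need more context.

```python
import re

# Hypothetical base URL, used only for illustration.
EC_BASE_URL = "http://example.org/ec/"

# Simplified model of an E.C. number: exactly four dot-separated numeric fields.
# The lookbehind and lookahead reject longer dotted sequences such as 1.2.3.4.5.
EC_PATTERN = re.compile(r"(?<![\d.])\d+\.\d+\.\d+\.\d+(?!\.?\d)")


def find_ec_numbers(text):
    """Return every substring of text that fits the E.C. number model."""
    return [match.group(0) for match in EC_PATTERN.finditer(text)]


def ec_to_url(ec_number):
    """Turn a recognized E.C. number into an unambiguous (hypothetical) URL."""
    return EC_BASE_URL + ec_number


def annotate(text):
    """Map each E.C. number recognized in text to its URL."""
    return {ec: ec_to_url(ec) for ec in find_ec_numbers(text)}


sentence = "Alcohol dehydrogenase (EC 1.1.1.1) oxidises ethanol."
print(annotate(sentence))
```

The same structure, a model of what the text looks like plus a rule for turning a match into a URL, would apply to other datatypes such as accession numbers, each with its own pattern.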
“The computer trying to recognize words, however, is called natural language processing, and is a domain of study on its own,” says Gordon. He offered two current examples of this kind of recognition in other software:
- New versions of Internet Explorer show telephone numbers with “Skype” formatting, indicating that you can phone that person via Skype with the click of a button.
- If you highlight an address in Internet Explorer, there is an icon that allows you to do other tasks like map the address that you highlighted.
“So, there is more that you can do with text than simply read it,” Gordon says.
This increased functionality of data is at the heart of Semantic Web Technologies.
For more information, please see www.daggoo.net