In June 2000, US President Bill Clinton and British Prime Minister Tony Blair unveiled a "rough draft" of the human genome sequence, an essential milestone toward deciphering the genetic code that makes up human life.
Work on mapping the human genome, whose completion was announced in April 2003, depended heavily on advanced computing for the data-intensive task of determining the sequence of roughly 3 billion DNA base pairs in humans.
Ironically, getting that genetic data into the hands of biomedical researchers has created another major computing challenge: the need for even more advanced systems that can keep up with the growing number of disease subcategories being discovered through genetic research.
The National Cancer Institute in the US took on the task of addressing that issue in 2003 by launching what it calls the largest IT project in the history of biomedical research. The NCI created what is, in essence, a World Wide Web of cancer research.
The new Cancer Biomedical Informatics Grid, or caBIG, promises to help researchers, physicians and patients across the country share more detailed information about diseases and thus speed the development of new drugs and treatments.
The government-funded effort costs about US$20 million a year, according to the NCI.
To date, 42 of the institute's 63 national cancer centers are either linked to the caBIG grid or are installing the necessary infrastructure to participate. Many are already building applications that can be shared by members of the grid.
The need for wider data sharing became obvious as genetic research identified more subcategories of cancer that require specific treatments.
Traditionally, cancer researchers focused on studying a relatively small number of disease categories, such as lung cancer, breast cancer or colon cancer. But as the genome work expanded, many disease subtypes were discovered within those categories, and each may require a different treatment.
Cancer researchers quickly saw the need to assemble as much information as possible to help in the development of new disease-specific treatment options. So, to broaden the number of data sources, the NCI has begun expanding the grid to include the community hospitals and physicians that treat 80 per cent of US cancer patients.
Project backers said that researchers decided early on to focus on improving interoperability rather than forcing research organizations to standardize on expensive new IT systems and software.
To accomplish that, the developers used the Globus Toolkit, a set of open-source tools for building grid systems and applications. Those applications run on top of Web services that are open to anyone with a node on the system. The Globus tools are distributed by the Globus Alliance.
Developers also created a collection of tools that serve up semantic descriptions of vocabulary and data so that both humans and machines can interpret data from dissimilar systems. And a common security model was built to allow research centers to run caBIG as a distributed infrastructure that lets each participant create individual policies to determine who can author or access data.
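To make the idea of shared vocabularies and locally set access policies more concrete, here is a minimal Python sketch. It is not caBIG's actual code or API: the site names, field names, the `Site` class and the `query` helper are all hypothetical, and the in-memory dictionaries stand in for what caBIG actually built as grid services on the Globus Toolkit. The sketch assumes each site maps its own local field names onto a common set of data elements, and each site alone decides which users may read or author its data.

```python
from dataclasses import dataclass, field

# Hypothetical shared vocabulary: common data element name -> meaning.
COMMON_ELEMENTS = {
    "tumor_site": "Anatomic site of the primary tumor",
    "diagnosis_date": "Date of initial diagnosis (ISO 8601)",
}


@dataclass
class Site:
    name: str
    # Maps this site's local field names to shared element names.
    local_to_common: dict[str, str]
    records: list[dict] = field(default_factory=list)
    # Per-site policy set by the site itself: user -> permitted actions.
    policy: dict[str, set[str]] = field(default_factory=dict)

    def to_common(self, record: dict) -> dict:
        """Translate a local record into the shared vocabulary, dropping unmapped fields."""
        out = {}
        for local_key, value in record.items():
            common_key = self.local_to_common.get(local_key)
            if common_key in COMMON_ELEMENTS:
                out[common_key] = value
        return out

    def allows(self, user: str, action: str) -> bool:
        """Each site decides locally who may 'read' or 'author' its data."""
        return action in self.policy.get(user, set())


def query(sites: list[Site], user: str, element: str) -> list[tuple[str, object]]:
    """Collect one shared element from every site whose own policy lets `user` read."""
    values = []
    for site in sites:
        if not site.allows(user, "read"):
            continue  # this site's policy denies access, so skip it
        for record in site.records:
            common = site.to_common(record)
            if element in common:
                values.append((site.name, common[element]))
    return values


# Two sites store the same concept under different local field names.
site_a = Site("Site A", {"primary_site": "tumor_site"},
              records=[{"primary_site": "lung"}],
              policy={"alice": {"read"}, "carol": {"read"}})
site_b = Site("Site B", {"TumorLocation": "tumor_site"},
              records=[{"TumorLocation": "breast"}],
              policy={"bob": {"read", "author"}, "carol": {"read"}})

print(query([site_a, site_b], "alice", "tumor_site"))  # [('Site A', 'lung')]
print(query([site_a, site_b], "carol", "tumor_site"))  # [('Site A', 'lung'), ('Site B', 'breast')]
```

The design choice mirrors the interoperability-first approach described above: neither site has to rename its fields or adopt new software, because the translation to common data elements happens at the edge, and access control stays with each participant rather than with a central authority.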
In addition, Ken Buetow, director of NCI's Center for Bioinformatics, said the NCI has set up "workspaces": groups of people who meet regularly to discuss specific domains of work, such as tissue banks and pathology tools. The workspace groups provided input on building the common vocabularies and data elements, he noted.
Robert Annechiarico, director of cancer center information systems at Duke University, which has already helped build applications for the grid, said that creating the common data elements is especially important in the academic world. "Academic medical centers," he noted, "are a community of fiefdoms bound together by a common parking problem."