Biomedical informatics researchers at IBM and the Mayo Clinic have launched a new open-source consortium focused on natural language processing (NLP), in an effort to help doctors share diagnosis and treatment information.
The Open Health Natural Language Processing Consortium, announced Thursday, will focus on technology for large-scale data aggregation, letting doctors mine medical records in their specialties for similar cases to study before making difficult diagnoses or determining treatment.
Doctors will be able to review any physician notes on similar cases, but no personally identifiable patient information will be available in the database, IBM and Mayo said.
With the launch of the consortium, the two organizations have released two projects under open-source licenses, one focused on clinical notes and one on pathology reports. The consortium is using the Apache license, version 2.0.
The organizations are inviting others to help develop NLP tools.
"By making it an open-source initiative, we hope to enable wide use of these NLP tools so medical advancements can happen faster and more efficiently," Dr. Christopher Chute, a Mayo Clinic bioinformatics expert and senior consultant on the project, said in a statement.
Two other health care organizations, Seattle Group Health and the U.S. Department of Veterans Affairs Boston Healthcare System, plan to participate in the consortium, and other participants are welcome, IBM and Mayo said.
As more health care providers adopt electronic health records, it will become increasingly important to be able to search those records, the organizations said.
Mayo and IBM have developed a system, based on IBM's open-source Unstructured Information Management Architecture, or UIMA, for extracting information from more than 25 million text-based clinical notes, they said.
The two organizations have also developed a system to extract cancer disease characteristics from pathology reports, allowing for the computation of cancer stage.
"Large-scale information extraction from the clinical narrative is a vital component in advancing translational research and patient care," Guergana Savova, a medical informatics specialist and Mayo's lead on the project, said in a statement.
"It 'unlocks' the clinical textual data that resides in huge repositories. Such technology would allow for large-scale data aggregation, analyses and usage -- just imagine the power of data from millions of patients."
The organizations have not yet determined what NLP projects to work on next, an IBM spokeswoman said. "The goal is to first get feedback from participating institutions on the initial project, and then expand," she said.