Interdisciplinary team of researchers will work together in new Big Data to Knowledge Center of Excellence to create powerful computational tools
Today’s researchers, working with the advantages of new, sophisticated laboratory technology, have unleashed a river of valuable biomedical data — much more, in fact, than many of them have the tools to properly analyze, or the capacity to store. In 2012, the National Institutes of Health created the Big Data to Knowledge (BD2K) initiative to enable efforts to harness the potential of this flood of information. As part of the first wave of BD2K funding, the University of Illinois at Urbana-Champaign and Mayo Clinic have now received a $9.34M, four-year award to create one of several new Centers of Excellence for Big Data Computing.
The NIH initiative encompasses a broad range of “big data” types, including collections of high-resolution research images and real-time recordings of complex biological phenomena. The Illinois-Mayo Center, to be located on the Urbana-Champaign campus, will focus on the analytical challenges posed by the rapidly growing body of genomic and transcriptomic data produced by genome-wide, high-throughput experimental technologies.
The Center’s research goal is to create a revolutionary analytical tool that allows any biomedical researcher to place a gene-based data set in the context of “community knowledge” — the entire body of previously published gene-related data. This broad context for individual data sets will offer new functional insights for the genes being studied. The proposed Knowledge Engine for Genomics, or KnowEnG, will be unique in its integration of many disparate sources of gene-related data to increase its analytical power, as well as in its planned scalability. The tool will be designed to accommodate the continued growth of genomic community knowledge and the increasing computational infrastructure required to work with genomic data.
To create KnowEnG, the Center will combine the expertise of many units across the Illinois campus, including the Institute for Genomic Biology (IGB), the Department of Computer Science, the Coordinated Science Laboratory, the College of Engineering, and the National Center for Supercomputing Applications (NCSA). As a leader in biomedical research and structured data collection, Mayo Clinic will play a vital role in design, testing and refinement.
Jiawei Han, professor of Computer Science, will lead the Center and will serve as program director. Other Principal Investigators on the project are Saurabh Sinha, associate professor of Computer Science and graduate program faculty member in Bioengineering; Jun Song, Founder Professor in Bioengineering and in Physics; and Richard Weinshilboum, M.D., interim director of the Mayo Clinic Center for Individualized Medicine and director of its Pharmacogenomics Translational Program. C. Victor Jongeneel, graduate program faculty member in Bioengineering, IGB and NCSA Director of Bioinformatics, and Director of the High-Performance Biological Computing Group, will function as Executive Director.
The Center’s ability to work across disciplinary boundaries will be key to its success. Insights drawn from many areas of computer science will strengthen KnowEnG’s design.
“By integrating multiple analytical methods derived from the most advanced data mining and machine learning research, KnowEnG will transform the way biomedical researchers analyze their genome-wide data,” said Han. “The Center will leverage the latest computational techniques used to mine corporate or Internet data to enable the intuitive analysis and exploration of biomedical Big Data.”
The Center also will rely on communication between interface design experts at Illinois and biomedical researchers at Mayo Clinic, who represent KnowEnG’s intended users. Feedback among these Center members will ensure that the developed tool is valuable, intuitive and customizable for use in a broad array of experimental contexts.
Describing his excitement for the project, Co-PI Sinha explained, “This is [a project] that's bigger than all of us … What I'm most excited about is the actual possibility that this could be a tool which everybody uses in the world.”
In addition to developing KnowEnG, the Center will create a training framework that empowers researchers to use the new tool and engage in bioinformatics research, regardless of their prior computational knowledge. The Center also will participate in a planned nationwide consortium, composed of all the BD2K Centers of Excellence established by the NIH initiative, to exchange insights, contribute to standards for tool development, and help set broad goals for the future of work on Big Data.