Steganography is an ancient technique for hiding data within other data. Unlike encryption, which leaves a message visibly scrambled, steganography conceals the fact that a message is there at all. Today it is used to exploit unused bits in image, audio or video files to transmit secrets.
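As a rough illustration of what "unused bits" can mean, the sketch below hides a short message in the least significant bit of each byte of a carrier, a common textbook form of steganography. The function names `embed_bits` and `extract_bits` and the stand-in "pixel" values are assumptions made for this example; it is not tied to any particular tool or file format.

```python
def embed_bits(carrier: bytes, message: bytes) -> bytearray:
    """Hide `message` in the least significant bit of successive carrier bytes."""
    # Unpack the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the carrier byte, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_bits(stego: bytes, length: int) -> bytes:
    """Recover `length` hidden bytes from the low bits of `stego`."""
    out = bytearray()
    for i in range(length):
        value = 0
        for carrier_byte in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (carrier_byte & 1)
        out.append(value)
    return bytes(out)


# Hypothetical carrier: 80 stand-in 8-bit sample values (e.g. grayscale pixels).
pixels = bytes(range(100, 180))
secret = b"hi"

stego = embed_bits(pixels, secret)
recovered = extract_bits(stego, len(secret))

# Each carrier value changes by at most 1, which is why the change is hard to see.
max_change = max(abs(a - b) for a, b in zip(pixels, stego))
```

Because only the lowest bit of each value is touched, the altered carrier is nearly indistinguishable from the original, yet the message can be read back exactly by anyone who knows where to look.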

The same basic idea, uncovering data hidden in files, can also be applied to understanding computer networks and biology, says Weixiong Zhang, Ph.D., Washington University associate professor of computer science. He and his co-authors, writing in Physical Review E, say they have created an algorithm that automatically discovers communities and their subtle structures in various networks, including biological ones. They used it to identify the community structure of a network of co-expressed genes involved in bacterial sepsis.

Many complex systems can be represented as networks, Zhang said, including the genetic networks he studies, social networks and the Internet. The community structure of a network is a natural division of the network into subnetworks whose vertices are densely connected to one another but more weakly connected to the rest of the network.
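One way to make this notion of community concrete is to score a proposed division of a network. The sketch below uses the standard Newman-Girvan modularity, which compares the fraction of edges falling inside each community with the fraction expected if edges were placed at random. This is offered only as an illustration: the article does not say which quality measure Zhang's algorithm uses, and the function name `modularity` and the toy network are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping each vertex to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's vertices
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed internal edge fraction minus the fraction expected at random.
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy network: two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

natural_split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
mixed_split = {0: "A", 1: "A", 3: "A", 2: "B", 4: "B", 5: "B"}

q_natural = modularity(edges, natural_split)
q_mixed = modularity(edges, mixed_split)
```

The split that keeps each triangle together scores higher than the one that mixes them, matching the intuition that a community's vertices are tied more closely to each other than to the rest of the network.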

Communities are relatively independent of one another structurally, but it is thought that each community may correspond to a fundamental functional unit. A community in a genetic network usually contains genes with similar functions, just as a community on the World Wide Web often corresponds to web pages on similar topics.

All Zhang and Ruan need are data. Their algorithm is more scalable than existing algorithms and can detect communities at a finer scale and with higher accuracy than similar algorithms. Such a computational biology tool is expected to have an impact in genomics, where researchers may be better able to identify and understand communities of genes and their networks, as well as how they cooperate in causing diseases such as sepsis, virus infections, cancer and Alzheimer’s disease.

Biological systems contain many communities in which numerous proteins come together to form complexes.

“We can use this tool to identify structures embedded in the data,” Zhang said. “We’ve identified the substructures of three different RNA polymerase complexes from noisy data, for instance, which are crucial for gene transcription.”

Zhang began his computer science career as a specialist in artificial intelligence, but in recent years he has brought his skills to computational biology. His main interest and ambition is to use computational means to solve basic problems in biology and problems related to human diseases. For example, his group studied the transcription mechanism of microRNAs, small noncoding regulatory RNAs that regulate the development and stress responses of nearly all eukaryotic species studied so far. Using machine learning techniques, Zhang and his collaborators showed that almost all intergenic microRNA genes in four model species (human, mouse, rice and Arabidopsis) are transcribed by RNA polymerase II, which also transcribes protein-coding genes.

Multidisciplinary research that combines computational approaches with biological data is a hallmark of Zhang's group. As another example, Zhang and his Ph.D. student, Guandong Wang, developed an algorithm called WordSpy for identifying cis-regulatory elements – short DNA sequences that are critical for the regulation of gene expression – from large collections of genome sequences.

Zhang has not only studied networks; he has also built a broad network of collaborations with scientists across the Washington University campus and beyond. The problems that interest him are diverse, ranging from stress responses and virus infection in plants, such as rice, to human diseases, including Alzheimer's disease, herpesvirus infection, sepsis, cardiac hypertrophy, lung cancer and lung transplantation. The computational tools his group has developed are helping him and his collaborators understand how perturbations to gene expression can lead to complex traits and human diseases, as well as how microRNAs regulate gene expression.

Zhang was recently awarded a grant from the Alzheimer's Association to develop computational systems biology methods for analyzing gene expression perturbation in diseased brains. He has been collaborating with scientists in the Washington University School of Medicine and Scripps Institute in La Jolla, California, to study roughly 30 postmortem brain samples of people who died from Alzheimer’s disease.

“I’m interested in modeling gene expression perturbation in diseased brains, and am looking for the genetic signature,” Zhang said. “Due to the complexity of Alzheimer's disease, we are developing other tools and will have to use all the tools we have and can get. It’s a polygenic disease, with a lot of genes at work. I’m sure we’ll find that a network is involved.”

Article: Zhenping Li, Shihua Zhang, Rui-Sheng Wang, Xiang-Sun Zhang and Luonan Chen, 'Quantitative function for community detection,' Phys. Rev. E 77, 036109 (2008), doi:10.1103/PhysRevE.77.036109