A gene family is a group of genes that share important characteristics. In many cases, genes in a family share a similar sequence of DNA building blocks (nucleotides). These genes provide instructions for making products (such as proteins) that have a similar structure or function. In other cases, dissimilar genes are grouped together in a family because proteins produced from these genes work together as a unit or participate in the same process.
Classifying individual genes into families helps researchers describe how genes are related to each other. Researchers can use gene families to predict the function of newly identified genes based on their similarity to known genes. Similarities among genes in a family can also be used to predict where and when a specific gene is active (expressed). Additionally, gene families may provide clues for identifying genes that are involved in particular diseases.
This section collects gene families based on alignment to the Arabidopsis genome, following the Arabidopsis gene family classification criteria. Most genes in the same gene family have similar protein structures and share the same functional domains. For these genes, we used MUSCLE (Edgar 2004) to generate a multiple sequence alignment for each Arabidopsis gene family. Each multiple sequence alignment was then input into SAM 3.5 to build a hidden Markov model (HMM) for that family. Every gene sequence was aligned against each of these HMMs, and each alignment yielded an e-value. The lower the e-value, the better the fit between the gene sequence and the family's HMM. Thus, each gene was assigned to the family whose HMM produced the lowest e-value.
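
The final assignment step can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: it assumes the e-values from the HMM alignments have already been parsed into a nested dictionary, and the function names, gene names, family names, and e-values shown are all hypothetical.

```python
from typing import Dict, Optional


def assign_family(evalues: Dict[str, float]) -> Optional[str]:
    """Return the family whose HMM gave the lowest e-value.

    `evalues` maps a family name to the e-value obtained when one gene
    sequence was aligned against that family's HMM. Returns None when
    no scores are available for the gene.
    """
    if not evalues:
        return None
    # A lower e-value means a better fit, so take the minimum.
    return min(evalues, key=evalues.get)


def assign_all(scores: Dict[str, Dict[str, float]]) -> Dict[str, Optional[str]]:
    """Assign every gene to its best-fitting family."""
    return {gene: assign_family(per_family) for gene, per_family in scores.items()}


# Hypothetical e-values, for illustration only (not real results).
scores = {
    "gene_A": {"MYB": 1e-40, "WRKY": 3e-5, "bHLH": 0.2},
    "gene_B": {"MYB": 0.8, "WRKY": 2e-60, "bHLH": 1e-3},
}

assignments = assign_all(scores)
print(assignments)
```

In this sketch every gene with at least one score is assigned to a family, as in the description above. No minimum significance cutoff is applied, because the text does not state one.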