AppleGFDB: The Apple Gene Function & Gene Family Database v1.0
Apple Gene Function and Gene Family Database (AppleGFDB) is supported by the National Research Center for Apple Engineering and Technology and the State Key Laboratory of Crop Biology. AppleGFDB aims to collect all information useful for apple genome annotation.
AppleGFDB provides the genome sequence of apple ('Golden Delicious', Malus × domestica Borkh., family Rosaceae, tribe Pyreae) and the annotation of its 17 chromosomes. The genome and peptide sequences were downloaded from the GDR database and the FEM-IASMA Computational Biology Web Resources. These data are available through search pages and our Genome Browser, which provides an integrated display of the annotation data.
AppleGFDB includes 63,541 gene models and 301,186 exons. Apple genes that have already been studied were given priority in annotation. For the many apple genes that have not yet been studied, the annotations of the most similar Arabidopsis gene in TAIR and the most similar Populus gene in PlantGDB are used as references. Currently, the genome annotation consists of the following parts:
1. GO analysis
Given the lack of functional studies on apple genes, their functions have to be inferred from known genes on the basis of sequence similarity, which requires a resource that provides detailed annotation of genes in model plants. Because Gene Ontology (GO) provides structured functional annotation and classification for several plants, we used it to predict the functions of uncharacterized apple genes. For each apple gene we identified its best homologous gene in Arabidopsis and adopted the GO terms of that Arabidopsis gene as the annotation of the apple gene. In addition, GO analysis of apple genes was also performed with GO tools and InterProScan.
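The best-homolog transfer described above can be sketched as follows. This is a minimal illustration, not the AppleGFDB pipeline: it assumes BLASTP results in the default tabular format (`-outfmt 6`, where column 11 is the E-value and column 12 the bit score), assumes "best" means lowest E-value with ties broken by higher bit score, and uses hypothetical gene IDs and GO assignments.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (apple query ID, Arabidopsis subject ID, E-value, bit score)
Hit = Tuple[str, str, float, float]


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Read BLAST tabular output with the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        hits.append((cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the best Arabidopsis subject per apple query (lowest E-value, then highest bit score)."""
    best: Dict[str, Tuple[float, float, str]] = {}
    for query, subject, evalue, bitscore in hits:
        rank = (evalue, -bitscore, subject)
        if query not in best or rank < best[query]:
            best[query] = rank
    return {query: rank[2] for query, rank in best.items()}


def transfer_go(hits: Iterable[Hit], arabidopsis_go: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Give each apple gene the GO terms of its best Arabidopsis homolog (empty set if none known)."""
    return {
        apple: set(arabidopsis_go.get(ath, set()))
        for apple, ath in best_hits(hits).items()
    }


if __name__ == "__main__":
    # Hypothetical IDs and annotations, for illustration only.
    demo_hits = [
        ("apple_1", "ath_A", 1e-50, 200.0),
        ("apple_1", "ath_B", 1e-80, 310.0),
        ("apple_2", "ath_C", 1e-10, 60.0),
    ]
    demo_go = {"ath_B": {"GO:0003677"}, "ath_C": {"GO:0005634"}}
    print(transfer_go(demo_hits, demo_go))
    # {'apple_1': {'GO:0003677'}, 'apple_2': {'GO:0005634'}}
```

Transferring terms from a single best hit is simple and traceable, but it inherits any annotation error of that one Arabidopsis gene; this is one reason the text also cross-checks with GO tools and InterProScan.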
2. Conserved Domain analysis
A protein's domain structure, with its functional motifs, helps users grasp the protein's function quickly. Because CDD integrates many important protein databases, including Pfam, SMART, COG, TIGRFAM and the NCBI Protein Clusters, its domain information is comprehensive. We therefore used NCBI CDD (Marchler-Bauer et al. 2011) and its Batch CD-Search tool to analyze the domains of each protein encoded by an apple gene, and then checked the validity of the output against Pfam and SMART. Finally, the results were organized into an apple protein domain database. By entering a domain name in the protein domain interface, users can retrieve all apple genes that contain that domain. For convenience, each protein's domain structure is shown as a concise diagram on the information page of the corresponding apple gene. The apple CDD data can also be used to examine our gene family classification: we checked whether the genes classified into each family carry the typical domain of that family. The apple genes were found to carry the domains typical of the families into which they were classified, which indicates that our gene family classification is reasonable. AppleGFDB 2.0 collects all conserved domains predicted with the NCBI Batch CD-Search tool and the Pfam search tool.
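The consistency check between predicted domains and family assignments can be sketched as below. This is an illustrative sketch only: it assumes the CD-Search results have already been reduced to a set of domain names per gene, that each family has a hypothetical set of "typical" signature domains, and that a gene passes if it carries at least one of them (that rule is an assumption, not stated in the text).

```python
from typing import Dict, List, Set


def check_family_domains(
    gene_domains: Dict[str, Set[str]],
    gene_family: Dict[str, str],
    family_signature: Dict[str, Set[str]],
) -> List[str]:
    """Return genes that lack every typical domain of the family they were assigned to.

    Families with no known signature domains are skipped rather than flagged.
    """
    flagged: List[str] = []
    for gene, family in gene_family.items():
        typical = family_signature.get(family, set())
        found = gene_domains.get(gene, set())
        if typical and not (typical & found):
            flagged.append(gene)
    return flagged


if __name__ == "__main__":
    # Hypothetical genes, families and signature domains, for illustration only.
    domains = {"g1": {"HLH"}, "g2": {"Myb_DNA-binding"}, "g3": set()}
    families = {"g1": "bHLH", "g2": "bHLH", "g3": "MYB"}
    signatures = {"bHLH": {"HLH"}, "MYB": {"Myb_DNA-binding"}}
    print(check_family_domains(domains, families, signatures))  # ['g2', 'g3']
```

An empty result means every assigned gene carries a signature domain of its family; any flagged gene is a candidate for manual review rather than automatic reassignment.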
3. Gene Family classification
This part classifies apple genes into gene families by alignment to the Arabidopsis genome, following the Arabidopsis gene family classification criteria. Most genes in the same gene family share similar protein structures and the same functional domains. For these families, we used MUSCLE (Edgar 2004) to generate a multiple sequence alignment for each Arabidopsis gene family. Each alignment was then input into SAM 3.5 to build a hidden Markov model (HMM) for the family. Every apple gene sequence was aligned against each of these HMMs, yielding an E-value. The lower the E-value, the better the sequence fits the family's HMM; each gene was therefore assigned to the family whose HMM produced the lowest E-value.
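The assignment rule at the end of this step (pick the family whose HMM gives the lowest E-value) can be sketched as follows. The sketch assumes the per-family E-values have already been computed by the HMM scoring step; it does not build or score HMMs itself. The `max_evalue` cutoff is a hypothetical addition not described in the text and is off by default.

```python
from typing import Dict, Optional


def assign_family(evalues: Dict[str, float], max_evalue: Optional[float] = None) -> Optional[str]:
    """Return the family whose HMM gives this gene the lowest E-value.

    evalues maps family name -> E-value of the gene against that family's HMM.
    max_evalue is a hypothetical cutoff: when set, a gene whose best E-value
    exceeds it is left unassigned (None).
    """
    if not evalues:
        return None
    family = min(evalues, key=evalues.__getitem__)  # on exact ties, the first family listed wins
    if max_evalue is not None and evalues[family] > max_evalue:
        return None
    return family


def assign_all(
    table: Dict[str, Dict[str, float]], max_evalue: Optional[float] = None
) -> Dict[str, Optional[str]]:
    """Apply assign_family to every gene; table maps gene -> {family: E-value}."""
    return {gene: assign_family(scores, max_evalue) for gene, scores in table.items()}


if __name__ == "__main__":
    # Hypothetical E-values for two genes against two family HMMs.
    scores = {"g1": {"WRKY": 1e-40, "NAC": 1e-5}, "g2": {"NAC": 1e-30, "WRKY": 2.0}}
    print(assign_all(scores))  # {'g1': 'WRKY', 'g2': 'NAC'}
```

Without a cutoff, every gene with at least one score is forced into some family; adding a threshold trades coverage for fewer spurious assignments.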
4. InterPro analysis
Information on well-studied plants such as Arabidopsis, available through many databases, gives us the opportunity to investigate apple proteins. Among these databases, InterPro is a useful resource of protein families, domains and functional sites, in which identifiable features found in known proteins can be applied to novel protein sequences to characterize them functionally. Because the functions of most apple proteins are unknown, InterPro is useful for predicting their roles. The InterPro classification system also provides another criterion for analyzing gene function and annotation. The InterPro data were obtained with the InterProScan tool at EBI.
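To turn InterProScan results into per-gene annotations, the tab-separated output can be summarized as sketched below. This assumes the InterProScan 5 TSV layout (protein accession in column 1, InterPro accession in column 12, GO annotations in column 14 when GO lookup is enabled, "-" for empty fields); column order should be checked against the InterProScan version actually used. The file name in the usage note is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def parse_interproscan_tsv(lines: Iterable[str]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Collect InterPro accessions and GO terms per protein from InterProScan TSV output.

    Assumed 0-based columns: 0 = protein accession, 11 = InterPro accession,
    13 = GO annotations separated by "|" (present only when GO lookup was requested).
    """
    interpro: Dict[str, Set[str]] = defaultdict(set)
    go_terms: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        protein = cols[0]
        if len(cols) > 11 and cols[11] not in ("", "-"):
            interpro[protein].add(cols[11])
        if len(cols) > 13 and cols[13] not in ("", "-"):
            for term in cols[13].split("|"):
                # Some releases append the source, e.g. "GO:0003677(InterPro)"; keep only the GO ID.
                go_terms[protein].add(term.split("(")[0].strip())
    return dict(interpro), dict(go_terms)
```

Usage, with a hypothetical output file: `with open("apple_proteins.tsv") as fh: ipr, go = parse_interproscan_tsv(fh)`. Proteins with member-database matches but no InterPro entry simply do not appear in `ipr`.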
5. Gene evolution analysis
The evolution analysis was performed by aligning each apple gene against the genomes of other plants; the gene and protein structures of the best orthologous gene in each plant are displayed as figures.
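Selecting one best match per apple gene and per plant species from the alignment results can be sketched as follows. The sketch assumes alignment hits as (apple gene, subject gene, E-value, bit score) tuples and a hypothetical mapping from subject gene to species. Taking the top-scoring hit is a simplification of orthology; a stricter approach, not described in the text, would require reciprocal best hits.

```python
from typing import Dict, Iterable, Tuple

# (apple gene, subject gene in another plant, E-value, bit score)
Hit = Tuple[str, str, float, float]


def best_per_species(hits: Iterable[Hit], species_of: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """For each (apple gene, species) pair keep the subject with the lowest E-value.

    Ties on E-value are broken by the higher bit score. Subjects missing from
    species_of are ignored.
    """
    best: Dict[Tuple[str, str], Tuple[float, float, str]] = {}
    for apple, subject, evalue, bitscore in hits:
        species = species_of.get(subject)
        if species is None:
            continue
        key = (apple, species)
        rank = (evalue, -bitscore, subject)
        if key not in best or rank < best[key]:
            best[key] = rank
    return {key: rank[2] for key, rank in best.items()}
```

The result gives, for each apple gene, one candidate per species whose gene and protein structures could then be drawn side by side.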
6. MicroRNA prediction
The predicted microRNAs were collected mainly from available publications and microRNA databases. In addition, conserved miRNAs were predicted from the apple genome with the miRPI software by Dr. Guodong Yang and Dr. Zhenlin Wei.
7. Blast Sequence Search
This component allows users to search a query nucleotide or protein sequence against all sequences stored in AppleGFDB. Users can submit a query sequence and adjust the BLAST parameters. After the BLAST search, the significant hits are ranked by their E-values. Users can also submit query text in FASTA format to search the database; the matching genes are then listed, ranked by their sequence alignment results.
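The client-side handling of such a query can be sketched as below: parsing pasted FASTA text, guessing whether to run `blastn` or `blastp`, and ranking hits by E-value. This is a hypothetical helper, not AppleGFDB's implementation. The nucleotide-fraction heuristic and its 0.9 threshold are assumptions (very short protein queries rich in A, C, G and T could be misclassified), and the optional E-value cutoff is likewise hypothetical.

```python
from typing import List, Optional, Tuple

# Characters counted as nucleotide symbols by the heuristic below (an assumption).
NUCLEOTIDE_CHARS = set("ACGTUN")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split pasted query text into (header, sequence) records.

    Sequence lines that appear before any '>' header are collected under the
    placeholder header "query", so a bare sequence is accepted too.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                header = "query"
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_program(sequence: str, threshold: float = 0.9) -> str:
    """Choose blastn if most characters look like nucleotides, otherwise blastp."""
    if not sequence:
        raise ValueError("empty query sequence")
    nucleotide = sum(1 for c in sequence if c in NUCLEOTIDE_CHARS)
    return "blastn" if nucleotide / len(sequence) >= threshold else "blastp"


def rank_hits(hits: List[Tuple[str, float]], max_evalue: Optional[float] = None) -> List[Tuple[str, float]]:
    """Sort (gene, E-value) hits from most to least significant, optionally dropping weak hits."""
    kept = [h for h in hits if max_evalue is None or h[1] <= max_evalue]
    return sorted(kept, key=lambda h: h[1])
```

For example, `parse_fasta(">q1\nACGT\nacgn")` yields one record whose sequence `ACGTACGN` would be sent to `blastn`.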
Zhang, S., Chen, G. H., Liu, Y., Chen, H., Yang, G., Yuan, X., ... & Shu, H. (2013). Apple gene function and gene family database: an integrated bioinformatics database for apple research. Plant Growth Regulation, 1-8.
For any problems or suggestions, please contact: