
ChimerDB 4.0 – A Database of Fusion Genes in Cancer

By Prof. Sanghyuk Lee (sanghyuk@ewha.ac.kr)
Department of Life Sciences

Fusion genes represent an important class of biomarkers and therapeutic targets in cancer. Since the groundbreaking discovery of the BCR–ABL1 fusion gene in leukemia (the target of Gleevec), numerous driver fusions have been identified as druggable targets, involving genes such as TMPRSS2, ALK, RET, FGFR3, ROS1 and ESR1, and have led to the development of novel targeted therapies. Reliable resources of fusion gene candidates are therefore of great importance for identifying biomarkers and therapeutic targets in cancer.

ChimerDB is a comprehensive database of fusion genes that combines analysis of deep sequencing data (ChimerSeq), text mining of publications (ChimerPub), and extensive manual annotation (ChimerKB). High-throughput RNA-Seq data from cancer patients serve as a fundamental resource for fusion gene identification, but analyzing such a huge volume of deep sequencing data is a major challenge even with modern computing power. ChimerSeq covers all 10,565 patients in The Cancer Genome Atlas (TCGA) project, yielding 65,945 fusion candidates, 21,106 of which were classified as reliable. This is the largest collection of high-fidelity fusion gene candidates. Mining PubMed abstracts is extremely difficult because of the many ambiguities in how gene fusions and gene names are specified. We applied a state-of-the-art deep learning method to text mining of ~30 million PubMed abstracts and manually filtered out false positives, which yielded 1,257 reliable fusion genes deposited into ChimerPub. ChimerKB is our own effort to provide a gold standard of fusion genes with publication support, experimental evidence, and breakpoint information. ChimerKB includes 1,597 cases, the largest non-redundant set publicly available. The website is available at http://www.kobic.re.kr/chimerdb.
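
To put the ChimerSeq figures quoted above in proportion, the short Python sketch below works out the share of candidates classified as reliable and the ratio of candidates to TCGA patients. The variable names are illustrative only. The per-patient values are plain ratios of the published totals, and it is an assumption that they are meaningful as averages, since a single fusion candidate may be counted once even if it occurs in several patients.

```python
# Counts quoted in the text; the variable names are illustrative, not ChimerDB field names.
tcga_patients = 10_565
chimerseq_candidates = 65_945
chimerseq_reliable = 21_106

# Share of all ChimerSeq candidates that were classified as reliable.
reliable_fraction = chimerseq_reliable / chimerseq_candidates

# Plain ratios of totals to patients (an assumption: not true per-patient averages).
candidates_per_patient = chimerseq_candidates / tcga_patients
reliable_per_patient = chimerseq_reliable / tcga_patients

print(f"Reliable share of candidates: {reliable_fraction:.1%}")     # 32.0%
print(f"Candidates per patient:       {candidates_per_patient:.2f}")  # 6.24
print(f"Reliable per patient:         {reliable_per_patient:.2f}")    # 2.00
```

Roughly one in three ChimerSeq candidates passes the reliability classification, which corresponds to about two reliable candidates for every TCGA patient analyzed.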

Of note, ChimerDB 1.0 was published in the Database issue of Nucleic Acids Research (NAR) in 2006, and updates have since been published in NAR three times (2010, 2017, 2020). ChimerDB has thus provided detailed, up-to-date information continuously for the last 15 years, and is regarded by the cancer research community as one of the representative information resources on fusion genes. The database papers have been cited 191 times so far.

* Related Article
Y.E. Jang, I. Jang, S. Kim, S. Cho, D. Kim, K. Kim, J. Kim, J. Hwang, S. Kim, J. Kim, J. Kang, B. Lee*, S. Lee*, ChimerDB 4.0: an updated and expanded database of fusion genes, Nucleic Acids Research, 48, D817-D824 (2020).