

[AI Toxicity Prediction] Tox21 Data Challenge Review


https://tripod.nih.gov/tox21/challenge/

NCATS will provide assay activity data and chemical structures on the Tox21 collection of ~10,000 compounds (Tox21 10K). A collection of compounds independent of the Tox21 10K collection will be used as the test set.


colab.research.google.com/drive/1bYK6DPjS69QOIOLfoEMDQK_pRljv0Vji?usp=sharing

There are 801 "dense features" representing chemical descriptors such as molecular weight, solubility, or surface area, and 272,776 "sparse features" representing chemical substructures (ECFP10, DFS6, DFS8; Matrix Market format).
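The sparse features come in Matrix Market format, a plain-text layout for sparse matrices. To see what such a file contains, here is a minimal parser for the "coordinate" variant: a header line, optional `%` comments, a `rows cols nnz` size line, then one 1-based `row col value` triplet per non-zero entry. The toy matrix, and reading rows as compounds and columns as substructure features, are assumptions for illustration. For the real files, `scipy.io.mmread` reads this format directly into a SciPy sparse matrix.

```python
import io


def read_matrix_market(lines):
    """Parse a Matrix Market 'coordinate' file into (shape, {row: {col: value}}).

    Minimal sketch: only the 'coordinate real/integer general' variant is handled.
    File indices are 1-based; the returned indices are 0-based.
    """
    it = iter(lines)
    header = next(it).strip().lower()
    if not header.startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate header")

    # Skip '%' comment lines until the "rows cols nnz" size line.
    for line in it:
        if line.strip() and not line.startswith("%"):
            n_rows, n_cols, nnz = (int(x) for x in line.split())
            break
    else:
        raise ValueError("missing size line")

    rows, count = {}, 0
    for line in it:
        if not line.strip() or line.startswith("%"):
            continue
        i, j, value = line.split()
        rows.setdefault(int(i) - 1, {})[int(j) - 1] = float(value)
        count += 1
    if count != nnz:
        raise ValueError(f"expected {nnz} entries, found {count}")
    return (n_rows, n_cols), rows


# Toy example: 3 compounds x 5 substructure features, 4 non-zeros.
example = """%%MatrixMarket matrix coordinate real general
% toy data for illustration only
3 5 4
1 2 1
1 5 1
2 1 1
3 4 1
"""
shape, rows = read_matrix_market(io.StringIO(example))
print(shape)    # (3, 5)
print(rows[0])  # {1: 1.0, 4: 1.0}
```

Only non-zero entries are stored, which is why a matrix with 272,776 feature columns stays manageable on disk.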

 

How were these features obtained?

Let's check these two papers.

[Mayr2016] Mayr, A., Klambauer, G., Unterthiner, T., & Hochreiter, S. (2016). DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science, 3:80.
[Huang2016] Huang, R., Xia, M., Nguyen, D. T., Zhao, T., Sakamuru, S., Zhao, J., Shahane, S., Rossoshek, A., & Simeonov, A. (2016). Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science, 3:85.


The papers don't seem to explain how the features were actually obtained.

Is this it?

static features --> computed with off-the-shelf software (Cao et al., 2013), such as weight, Van der Waals volume, and partial charge information
dynamic features --> The DeepTox pipeline uses jCompoundMapper (Hinselmann et al., 2011) to create dynamic features.
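To make "static feature" concrete: it is a single number computed once per molecule. The sketch below computes one of the listed examples, molecular weight, from a flat molecular formula string using standard atomic weights. This is not ChemoPy's API; the formula-string input and the small element table are simplifying assumptions.

```python
import re

# Standard atomic weights (abridged values) for a few common elements.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "S": 32.06, "Cl": 35.45,
}


def molecular_weight(formula):
    """Sum atomic weights over a flat formula such as 'C2H6O'.

    Sketch only: parentheses, hydrates, and charges are not supported.
    """
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise KeyError(f"no atomic weight for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C2H6O"), 3))  # ethanol -> 46.069
print(round(molecular_weight("C6H6"), 3))   # benzene -> 78.114
```

Descriptors such as Van der Waals volume or partial charges need the full molecular graph and 3D or charge models, which is where a package like ChemoPy comes in.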

These methods are from 2013 and 2011... aren't they rather outdated? Is there a more recent approach? --> Let's look at other papers.

Cao, D.-S., Xu, Q.-S., Hu, Q.-N., and Liang, Y.-Z. (2013). ChemoPy: freely available python package for computational biology and chemoinformatics. Bioinformatics 29, 1092–1094. doi: 10.1093/bioinformatics/btt105
Hinselmann, G., Rosenbaum, L., Jahn, A., Fechner, N., and Zell, A. (2011). jCompoundMapper: an open source Java library and command-line tool for chemical fingerprints. J. Cheminform. 3:3. doi: 10.1186/1758-2946-3-3


The Tox21 dataset in particular comprised several thousands of static features and hundreds of millions of dynamic features that were sparsely coded.
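"Sparsely coded" means each compound is stored as the short list of features it actually contains, rather than as a vector with hundreds of millions of mostly-zero entries. A minimal sketch of that idea: map feature identifiers to column indices through a vocabulary and emit (row, column, count) triplets. The path-like feature strings and the `min_count` filter are hypothetical illustrations, not the DeepTox pipeline's actual settings.

```python
from collections import Counter


def sparse_encode(compound_features, min_count=1):
    """Turn per-compound feature lists into a vocabulary and (row, col, count) triplets.

    compound_features: one list of feature identifiers (strings) per compound.
    min_count: hypothetical filter; features present in fewer compounds are dropped.
    """
    # Document frequency: in how many compounds each feature occurs.
    doc_freq = Counter()
    for feats in compound_features:
        doc_freq.update(set(feats))
    kept = sorted(f for f, c in doc_freq.items() if c >= min_count)
    vocab = {feature: col for col, feature in enumerate(kept)}

    # Only non-zero entries are stored.
    triplets = []
    for row, feats in enumerate(compound_features):
        counts = Counter(f for f in feats if f in vocab)
        for feature in sorted(counts, key=vocab.get):
            triplets.append((row, vocab[feature], counts[feature]))
    return vocab, triplets


# Toy path-like feature strings for three compounds.
compounds = [["C-C", "C-O", "C-C"], ["C-C", "C=O"], ["C-N"]]
vocab, triplets = sparse_encode(compounds)
print(vocab)     # {'C-C': 0, 'C-N': 1, 'C-O': 2, 'C=O': 3}
print(triplets)  # [(0, 0, 2), (0, 2, 1), (1, 0, 1), (1, 3, 1), (2, 1, 1)]
```

The triplet list has the same layout as the body of a Matrix Market coordinate file, so storage grows with the number of features present rather than the size of the vocabulary.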

Where is the Supplementary section?

static features --> ChemoPy: freely available python package for computational biology and chemoinformatics 

kimchangheon.tistory.com/20

[AI Toxicity Prediction] ChemoPy: freely available python package for computational biology and chemoinformatics

ChemoPy: an open-source Python package for calculating structural and physicochemical features. There are 19 descriptors within 16 drug feature groups, comprising 1135 descriptor values. --> 1135 descri..


dynamic features --> jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

kimchangheon.tistory.com/21

[AI Toxicity Prediction] jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

Decomposition of a chemical graph: a convenient way of encoding information about a given organic compound. --> Introduces a library to do this accurately. Implements popular fingerprint algorithms: DFS fingerprint, extended connectivi..
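The DFS fingerprint idea can be sketched like this: starting from every atom, walk the molecular graph depth-first, record the element-label string of every simple path up to a maximum number of bonds, and fold the resulting strings into a fixed-length bit set by hashing. This is a simplified illustration, not jCompoundMapper's implementation: bond types and jCompoundMapper's own labeling and hashing choices are ignored, and the `max_depth` / `n_bits` parameters and CRC32 folding are assumptions. The DFS6 and DFS8 names in the Tox21 data presumably refer to such a search depth.

```python
import zlib


def dfs_paths(atoms, neighbours, max_depth):
    """Collect canonical label strings of all simple paths with <= max_depth bonds.

    atoms: element symbol per atom index (hydrogens omitted).
    neighbours: atom index -> list of bonded atom indices (undirected graph).
    A path and its reverse describe the same substructure, so the
    lexicographically smaller of the two label strings is kept.
    """
    found = set()

    def walk(path):
        labels = [atoms[i] for i in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        found.add(min(forward, backward))
        if len(path) - 1 == max_depth:  # bonds in path == search depth
            return
        for nb in neighbours[path[-1]]:
            if nb not in path:  # simple paths only: never revisit an atom
                walk(path + [nb])

    for start in range(len(atoms)):
        walk([start])
    return found


def fold(features, n_bits=1024):
    """Fold feature strings into a set of 'on' bit positions via CRC32 hashing."""
    return {zlib.crc32(f.encode("utf-8")) % n_bits for f in features}


# Ethanol heavy atoms: C0-C1-O2
atoms = ["C", "C", "O"]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
paths = dfs_paths(atoms, neighbours, max_depth=2)
print(sorted(paths))  # ['C', 'C-C', 'C-C-O', 'C-O', 'O']
print(sorted(fold(paths, n_bits=64)))
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make the bit positions irreproducible.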


ChemoPy is used to obtain the static features,

and jCompoundMapper is used to obtain the dynamic features.

First, ChemoPy --> implemented in Python 2.

jCompoundMapper -->

How do other papers handle this?

github.com/filipsPL/tox21_dataset

filipsPL/tox21_dataset

Datasets used in the tox21 challenge.


www.frontiersin.org/articles/10.3389/fenvs.2015.00077/full

Prediction of Compounds Activity in Nuclear Receptor Signaling and Stress Pathway Assays Using Machine Learning Algorithms and L

Toxicity evaluation of newly synthesized or used compounds is one of the main challenges during product development in many areas of industry. For example, toxicity is the second reason—after lack of efficacy—for failure in preclinical and clinical stu


Descriptors Generation

For standardized data sets, two-dimensional molecular descriptors were calculated using KNIME nodes: RDKit (http://rdkit.org/, 117 descriptors), CDK (Beisken et al., 2013; http://sourceforge.net/projects/cdk/, 97 descriptors) and fingerprints [PubChem (881 bits) and MACCS (167 bits)], giving 1262 descriptors for each compound. For the list of used descriptors and literature references see Supplementary Table S5. For each target, Arff weka file was created using KNIME Arff Writer node.
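The descriptor counts add up as stated: 117 + 97 + 881 + 167 = 1262. As a rough sketch of the last step, the snippet below checks that sum and renders a minimal Weka ARFF document (numeric attributes plus a binary class) for one target. The KNIME Arff Writer node does this inside KNIME; this standalone `to_arff` function, its attribute names, and the toy rows are hypothetical stand-ins, not that node's output.

```python
# Descriptor block sizes from the text (per compound).
DESCRIPTOR_BLOCKS = {"RDKit": 117, "CDK": 97, "PubChem": 881, "MACCS": 167}
assert sum(DESCRIPTOR_BLOCKS.values()) == 1262


def to_arff(relation, feature_names, rows, labels):
    """Render numeric features plus a 0/1 activity label as ARFF text.

    Assumes attribute names contain no spaces or ARFF special characters
    (otherwise they would need quoting).
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in feature_names]
    lines.append("@ATTRIBUTE class {0,1}")  # nominal binary class
    lines += ["", "@DATA"]
    for values, label in zip(rows, labels):
        lines.append(",".join(str(v) for v in values) + f",{label}")
    return "\n".join(lines) + "\n"


# Hypothetical toy input: 2 compounds, 3 descriptors, one target.
arff = to_arff("toy_target", ["desc_a", "desc_b", "bit_1"],
               [[1.5, 0.2, 0], [2.0, 0.0, 1]], [0, 1])
print(arff)
```

One file per target matches the description above: the feature columns stay the same (1262 per compound), and only the class column changes between Tox21 assays.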