Multi-task deep learning for concurrent prediction of protein structural properties

Buzhong Zhang, Jinyan Li, Lijun Quan, Qiang Lyu*

Abstract

Protein structural properties, such as secondary structure, solvent accessibility and backbone angles, are diverse and spatially hierarchical. Concurrent prediction of these tightly related structural features is more useful for understanding overall protein structure and function. We propose a multi-task deep learning method for concurrent prediction of protein secondary structure, solvent accessibility and backbone angles (phi, psi). The supplementary online materials for the new method (named CRRNN2) are provided here.

Supplementary online materials

Software:

  1. The CRRNN2 prediction software (standalone version) can be downloaded here. Please note that, to run the predictor, you need to install the following software in addition to ours:

1.1 Linux

1.2 Python 3

1.3 Keras 2.1.4 and TensorFlow 1.13

1.4 BLAST 2.2.28 for preparing the PSSM feature set

1.5 HHsuite 3.0 for preparing the HHM feature set

Please follow the README in our software package to prepare the input features and run the predictor. The script files in the package demonstrate how to run our model.
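Before following the README, it may help to confirm that the external tools listed above are installed. The sketch below is not part of the CRRNN2 package. It assumes that BLAST provides an executable named psiblast and HH-suite one named hhblits; both names, and the helper functions find_missing_tools and is_python3, are illustrative assumptions, so adjust them to whatever the README actually calls.

```python
import shutil
import sys

# Assumed executable names (not taken from the CRRNN2 README):
# psiblast for BLAST, hhblits for HH-suite.
REQUIRED_TOOLS = ["psiblast", "hhblits"]


def find_missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_python3():
    """Return True when the running interpreter is Python 3."""
    return sys.version_info[0] == 3


if __name__ == "__main__":
    if not is_python3():
        print("Python 3 is required.")
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All assumed external tools were found on PATH.")
```

This check only looks for executables on PATH; it does not verify the installed versions of BLAST, HH-suite, Keras or TensorFlow.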

Data files:

  1. Sequences and labels of the training dataset are also provided.
  2. Our experiments used the test datasets TS1199, CB513, CASP10, CASP11 and CASP12; mapping vectors are provided here.
    The mapping vectors are used to prepare your own testing dataset.
    The preprocessed CB513 dataset was converted from Jian Zhou's dataset.
    The CASP data format is: sequences residues features,labels. The 21-dim features consist of the 20 PSSM values and the residue.
    The PSSM column order is: A R N D C Q E G H I L K M F P S T W Y V
    The input data of CRRNN2 are "sequences residues features". The input features are: a 22-dim protein encoding, a 20-dim PSSM and a 30-dim HHM profile.
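
The feature layout described above (22 + 20 + 30 = 72 values per residue) can be sketched as follows. This is a minimal illustration, not the preprocessing code shipped with CRRNN2. Two points are assumptions: that the 22-dim encoding is a one-hot vector over the 20 residues in the PSSM order plus one slot for unknown residues and one reserved slot, and that the three parts are concatenated in the order encoding, PSSM, HHM. The mapping vectors and README in the package define the real encoding. The names encode_residue and build_features are hypothetical.

```python
# PSSM column order as listed on this page.
PSSM_ORDER = "ARNDCQEGHILKMFPSTWYV"

ENCODING_DIM = 22  # per-residue protein encoding
PSSM_DIM = 20      # PSSM profile columns
HHM_DIM = 30       # HHM profile columns
UNKNOWN_SLOT = 20  # assumption: slot 20 marks unknown residues, slot 21 is reserved


def encode_residue(residue):
    """One-hot encode a residue into a 22-dim vector (assumed scheme)."""
    vector = [0.0] * ENCODING_DIM
    index = PSSM_ORDER.find(residue)
    vector[index if index >= 0 else UNKNOWN_SLOT] = 1.0
    return vector


def build_features(sequence, pssm, hhm):
    """Concatenate encoding, PSSM row and HHM row for every residue.

    Returns a list with one 72-dim row per residue.
    """
    if not (len(sequence) == len(pssm) == len(hhm)):
        raise ValueError("sequence, pssm and hhm must have the same length")
    rows = []
    for residue, pssm_row, hhm_row in zip(sequence, pssm, hhm):
        if len(pssm_row) != PSSM_DIM or len(hhm_row) != HHM_DIM:
            raise ValueError("expected 20 PSSM and 30 HHM values per residue")
        rows.append(encode_residue(residue) + list(pssm_row) + list(hhm_row))
    return rows
```

For a protein of length L, build_features returns L rows of 72 values, which matches the "residues features" part of the input shape for a single sequence.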

Thank you!

If you have any suggestions or questions, please email bzzhang@stu.suda.edu.cn