Prediction of 8-state protein secondary structures by a novel deep learning architecture
Buzhong Zhang, Jinyan Li and Qiang Lü*
Protein secondary structure can be regarded as a bridge that links the
primary sequence and tertiary structure. Accurate secondary structure
prediction can make the analysis of structure-based properties more
precise and higher in resolution. A faster and more accurate protein
secondary structure prediction tool, CRRNN (eCRRNN), is provided here.
Supplementary online materials
- The prediction models of eCRRNN (standalone version) can be downloaded here
Please note that
to run the predictor, you need to install the following software other
1.2 Python 2.7
1.3 Keras 2.1.4 and TensorFlow 1.13
1.4 BLAST 2.2.28 for preparing the PSSM feature set
Please follow the README in our software package to prepare
input features and run our predictor. Script files are provided to demonstrate how to run our model.
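As a rough illustration of the PSSM preparation step that BLAST is listed for, the sketch below builds a PSI-BLAST command line that writes an ASCII PSSM. It is not the script from the software package: the file names, the database name `uniref90`, and the iteration and E-value settings are assumptions chosen for illustration, so please follow the README for the settings actually used.

```python
def build_psiblast_command(fasta_path, db_path, pssm_out,
                           iterations=3, evalue=0.001):
    """Return the argument list for a PSI-BLAST run that writes an ASCII PSSM.

    iterations and evalue are illustrative defaults (assumptions),
    not values taken from the eCRRNN package.
    """
    return [
        "psiblast",
        "-query", fasta_path,               # single-sequence FASTA file
        "-db", db_path,                     # formatted protein database
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-out_ascii_pssm", pssm_out,        # text PSSM used as the feature source
    ]


# Hypothetical file and database names.
cmd = build_psiblast_command("example.fasta", "uniref90", "example.pssm")
print(" ".join(cmd))

# To actually run it (requires BLAST+ and a formatted database):
# import subprocess
# subprocess.check_call(cmd)
```

Keeping the command as a list avoids shell quoting problems when it is later passed to `subprocess`.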
- Sequences and labels of TR6614 are also provided.
Our training sets are generated from cullpdb_pc25_res3.0_R1.0_d160826_chains12665.fasta
and labels of TR5534 are also provided. If you use this dataset,
please thank Jian Zhou and Olga G. Troyanskaya. If possible, please cite:
Zhou, Jian, and O. G. Troyanskaya.
"Deep Supervised and Convolutional Generative Stochastic Network for
Protein Secondary Structure Prediction." Proceedings of the 31st
International Conference on Machine Learning (ICML), (2014):745-753.
- Our experiments used the test datasets CASP10, CASP11 and CASP12, and mapping vectors are provided here.
Mapping vectors will be used to prepare your testing dataset.
is provided in fasta and label format.
The preprocessed CB513 dataset, which was converted from Jian Zhou's dataset, can be downloaded.
Example code is also provided here.
The CASP data layout is: sequences residues features, labels. The 21-dim
features are the 20 PSSM values and the residue.
The PSSM column order is: A R N D C Q E G H I L K M F P S T W Y V
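The column order above matches the header of an ASCII PSSM file written by PSI-BLAST, so the 20 score columns can be read directly in that order. The following is a minimal sketch of such a reader, not the parser shipped with the package. It assumes whitespace-separated columns, and the two data rows in the example are made-up values used only to show the format.

```python
PSSM_COLUMNS = "A R N D C Q E G H I L K M F P S T W Y V".split()


def parse_ascii_pssm(text):
    """Return (sequence, rows), where rows[i] holds the 20 scores of residue i."""
    residues, rows = [], []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows look like: position, residue letter, 20 scores, ...
        if len(tokens) >= 22 and tokens[0].isdigit() and len(tokens[1]) == 1:
            residues.append(tokens[1])
            rows.append([int(v) for v in tokens[2:22]])
    return "".join(residues), rows


# Made-up two-residue example in the ASCII PSSM layout (illustrative values).
example = """
Last position-specific scoring matrix computed
            A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
    1 M    -1  -2  -3  -4  -2  -1  -3  -3  -2   1   2  -2   7   0  -3  -2  -1  -2  -1   1
    2 K    -1   2   0  -1  -3   1   1  -2  -1  -3  -3   5  -2  -4  -1   0  -1  -3  -2  -3
"""

sequence, pssm = parse_ascii_pssm(example)
print("%s %d %d" % (sequence, len(pssm), len(pssm[0])))
```

Header and footer lines are skipped because their first token is not a position number.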
The input data of eCRRNN are "sequences residues features". The input features
are: 20-dim PSSM, 7-dim physical properties, 1-dim conservation score, 22dim-
Anyone who uses this data and code is expected to cite the following paper:
Buzhong Zhang, Jinyan Li and Qiang Lü. Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinformatics, (2018) 19:293.
If you have any suggestions or questions, please email: