The HDOCK server predicts the binding complexes between two molecules, such as proteins and nucleic acids, using a hybrid docking strategy. Users therefore need to provide input for the two molecules to be docked. The HDOCK server accepts four types of input for each molecule:
- Upload your pdb file in PDB format.
- Provide your pdb file as PDB ID:ChainID (e.g. 1CGI:E).
- Copy and paste your protein sequence in FASTA format.
- Upload your protein sequence file in FASTA format.
Only ONE type of input is needed for each molecule.
If more than one type of input is provided, the first one will be used. For the "PDB ID:ChainID" input, the user can provide a single chain ID or multiple chain IDs. For example, "1CGI:E" stands for chain E of the pdb file 1CGI; "1AHW:AB" stands for chains A and B of the pdb file 1AHW. If only a sequence is provided, the server will automatically construct a model structure from a homologous template in the Protein Data Bank using an in-house modeling pipeline of HH Suite, ClustalW2, and MODELLER. Users are also recommended to submit their own pdb file if the protein contains multiple chains, as our pipeline is currently designed to model single-chain proteins.
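As an illustration of the "PDB ID:ChainID" convention, here is a minimal Python sketch that splits such a string into the PDB ID and its chain IDs. The function name `parse_pdb_chain` is hypothetical, and the checks assume a 4-character PDB ID and one-character chain IDs; the server's own validation may differ.

```python
def parse_pdb_chain(spec):
    """Split 'PDBID:ChainIDs' into (pdb_id, [chain_id, ...]).

    Assumptions (not taken from the server): the PDB ID has 4 characters
    and every chain ID is a single character.
    """
    pdb_id, sep, chains = spec.strip().partition(":")
    if not sep or len(pdb_id) != 4 or not chains:
        raise ValueError(f"expected 'PDBID:ChainIDs', got {spec!r}")
    return pdb_id.upper(), list(chains)


print(parse_pdb_chain("1CGI:E"))   # chain E of 1CGI
print(parse_pdb_chain("1AHW:AB"))  # chains A and B of 1AHW
```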
NOTE: For docking efficiency, if one molecule is much larger than the other, it is recommended to input the larger one as the receptor.
"Select a type" is not needed for structure input, as the HDOCK server is able to determine the molecular type from the input structure. However, for sequence input, users are strongly recommended to select a molecular type; otherwise, the server will guess one from "Protein", "ssRNA", or "dsDNA" based on the input sequence.
Here are the definitions of the different molecular types:
- Protein: Standard protein molecule
- ssRNA: General single-chain RNA molecule
- ssDNA: General single-chain DNA molecule
- dsDNA: Double-stranded B-DNA duplex molecule
- dsRNA: Double-stranded A-RNA duplex molecule
where the maximum input sequence length is 500 for double-stranded (ds) RNA/DNA molecules.
The HDOCK server now accepts sequence inputs for single-stranded (ss) or double-stranded (ds) RNA/DNA. Only the sequence of a single strand is needed. The input can contain the sequence only, like this

    >example
    GGAGCGGUAGUUCAGUCGGUUAGAAUACCUGCCUGUCACGCAGGGGGUCGCGGGUUCGAGUCCCGUCCGUUCCGCCA

or both the sequence and its secondary structure for single-stranded (ss) RNA/DNA, like this

    >example
    GGAGCGGUAGUUCAGUCGGUUAGAAUACCUGCCUGUCACGCAGGGGGUCGCGGGUUCGAGUCCCGUCCGUUCCGCCA
    (((((((..((((.........))))((((.(((((...))))))))).(((((.......))))))))))))....

HDOCK will then build its 3D structure based on the single sequence, or model a double-stranded 3D duplex structure by constructing a complementary Watson-Crick paired second strand.
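Before submitting a nucleic acid sequence, it can help to check the record locally. The Python sketch below reads a record with an optional dot-bracket line and derives a Watson-Crick complementary strand. The function names are hypothetical, the parser assumes the sequence sits on a single line as in the examples above, and the complement routine only illustrates base pairing; it is not HDOCK's modeling code.

```python
def parse_record(text):
    """Return (name, sequence, structure_or_None) from a FASTA-like record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header line followed by a sequence")
    name, seq = lines[0][1:].strip(), lines[1].upper()
    structure = lines[2] if len(lines) > 2 else None
    if structure is not None:
        if len(structure) != len(seq):
            raise ValueError("structure and sequence lengths differ")
        depth = 0
        for ch in structure:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced ')' in structure")
            elif ch != ".":
                raise ValueError(f"unexpected character {ch!r} in structure")
        if depth != 0:
            raise ValueError("unbalanced '(' in structure")
    return name, seq, structure


def complementary_strand(seq, rna=True):
    """Watson-Crick complement, written 5'->3' (i.e. reversed)."""
    pairs = {"A": "U" if rna else "T", "U": "A", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


# Toy record, not real data
name, seq, structure = parse_record(">toy\nGGGAAACCC\n(((...)))")
print(name, seq, structure)
print(complementary_strand("GGAUC"))  # GAUCC
```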
HDOCK performs global docking to predict the binding complexes between two molecules, so no information about the binding site is required for a docking job. However, users can optionally specify the binding site residues when such information is available, so that the predicted models are more accurate. Two types of binding site information can be provided.
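The first type is the binding site residues on the receptor or the ligand, for example

    195:A, 203-206:A, 108:B

which stand for residues 195 and 203-206 of chain A, and residue 108 of chain B. Note that the residues in a line must be separated by commas.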
The binding site residues may also be submitted as a file that looks like this

    195:A
    203-206:A
    108:B
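To make the residue notation concrete, the following Python sketch expands a selection such as the one above into individual (chain, residue number) pairs. The function name `parse_residues` is hypothetical, and the sketch assumes non-negative residue numbers without insertion codes; it accepts both the comma-separated line and the one-entry-per-line file layout.

```python
def parse_residues(text):
    """Expand 'num:chain' and 'num1-num2:chain' entries into (chain, num) pairs."""
    residues = []
    # Treat line breaks like commas so a file body can be passed directly.
    for token in text.replace("\n", ",").split(","):
        token = token.strip()
        if not token:
            continue
        numbers, sep, chain = token.partition(":")
        if not sep or not chain:
            raise ValueError(f"missing chain ID in {token!r}")
        first, dash, last = numbers.partition("-")
        start = int(first)
        end = int(last) if dash else start
        if end < start:
            raise ValueError(f"descending range in {token!r}")
        residues.extend((chain, num) for num in range(start, end + 1))
    return residues


print(parse_residues("195:A, 203-206:A, 108:B"))
```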
The second type is distance restraints between residues on the receptor and residues on the ligand, for example

    195:A 236:B 8, 215-218:A 306:B 6

where the distance between residue 195 of chain A on the receptor and residue 236 of chain B on the ligand will be within 8 Å, and the distance between residues 215-218 of chain A on the receptor and residue 306 of chain B on the ligand will be within 6 Å. Likewise, the above distance restraints can also be provided as a file that looks like this

    195:A 236:B 8
    215-218:A 306:B 6
NOTE: For each restraint, the first field is for the receptor, the second field is for the ligand, and the third field is the restrained distance. The residue representation must be in num:chainID or num1-num2:chainID format, where the residue number and chain ID refer to the input structure if the input is a structure, or to the modeled structure if the input is a sequence.
CAUTION: For a 3D structure modeled by the server, the chain ID is set to “A” for a single-chain molecule. The numbering of residues is consistent with that in the input sequence.
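The three-field restraint format described above (receptor selection, ligand selection, distance) can be read with a short Python sketch like the one below. The `Restraint` type and the function name are hypothetical, and the sketch only checks the overall shape of each entry, not whether the residues exist in the structures.

```python
from typing import NamedTuple


class Restraint(NamedTuple):
    receptor: str        # e.g. "195:A" or "215-218:A"
    ligand: str          # e.g. "236:B"
    max_distance: float  # upper distance bound, in angstroms


def parse_restraints(text):
    """Parse restraints separated by commas or given one per line."""
    restraints = []
    for entry in text.replace("\n", ",").split(","):
        fields = entry.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {entry.strip()!r}")
        receptor, ligand, distance = fields
        for selection in (receptor, ligand):
            if ":" not in selection:
                raise ValueError(f"missing chain ID in {selection!r}")
        restraints.append(Restraint(receptor, ligand, float(distance)))
    return restraints


for restraint in parse_restraints("195:A 236:B 8, 215-218:A 306:B 6"):
    print(restraint)
```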
Small-angle X-ray scattering (SAXS) experimental data can be provided as a post-docking filter for ranking the binding modes predicted by HDOCK docking. The SAXS data file contains three columns, q, I(q), and error, like this

    0.0000E+00 1.4612E+07 3.0685E+03
    1.0000E-03 1.4743E+07 4.8653E+03
    2.0000E-03 1.4827E+07 7.3394E+03
    3.0000E-03 1.4685E+07 1.0573E+04
    4.0000E-03 1.4674E+07 1.3206E+04
    5.0000E-03 1.4659E+07 1.5831E+04
    6.0000E-03 1.4729E+07 1.5466E+04
    7.0000E-03 1.4707E+07 1.7649E+04
    8.0000E-03 1.4594E+07 2.3642E+04
    9.0000E-03 1.4787E+07 2.8835E+04

With the SAXS experimental curve, the binding models will be ranked according to a weighted score that combines the docking energy score calculated by our scoring function with the CHI value, which measures how well the predicted binding modes fit the SAXS experimental data.
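A quick local check of the three-column layout can catch malformed SAXS files before upload. The Python sketch below is one way to do this; the function name `read_saxs` is hypothetical, and skipping lines that start with "#" is an assumption for convenience, not a documented feature of the server.

```python
def read_saxs(text):
    """Return a list of (q, intensity, error) tuples from a 3-column profile."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        # Unpacking raises ValueError if a row does not have exactly 3 numbers.
        q, intensity, error = (float(value) for value in line.split())
        rows.append((q, intensity, error))
    return rows


SAMPLE = """\
0.0000E+00 1.4612E+07 3.0685E+03
1.0000E-03 1.4743E+07 4.8653E+03
"""
profile = read_saxs(SAMPLE)
print(len(profile), "data points; first:", profile[0])
```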
This step is for advanced users who want to obtain more than 100 predicted complex models or filter the docked complex models with their own experimental information. The downloaded package contains an HDOCK output file, named like hdock_5c984053e4b83.out, that includes all 4392 docking solutions, like this

    Grid spacing: 1.200
    Angle step: 15.000
    Initial rotation: 0.00000 0.00000 0.00000
    1CGI_r_b.pdb 23.562 26.523 22.675
    1CGI_l_b.pdb 47.776 34.961 33.826
    1.27246 0.01055 5.02167 -0.328 -0.164 0.264 -445.20 0.45 1.00
    2.80075 0.00162 3.49381 -0.286 -0.209 0.111 -444.37 0.38 1.00
    0.02137 0.00051 -0.00948 -0.267 -0.212 0.104 -444.28 0.36 1.00
    2.98094 0.00164 3.31735 -0.237 -0.259 0.116 -444.15 0.37 1.00
    3.04247 0.00300 3.25767 -0.340 -0.315 0.134 -442.80 0.49 1.00
    ...

where the first 5 lines have the following definitions:
- The 1st line is the grid spacing for the three (x, y, z) translational degrees of freedom.
- The 2nd line is the Euler angle step for the three rotational degrees of freedom.
- The 3rd line is the initial rotation of the ligand before docking (optional).
- The 4th line is the receptor file and its center of geometry.
- The 5th line is the ligand file and its center of geometry.
Starting from the 6th line are the predicted binding modes, each of which is represented by three translations, three rotations, its binding score, the RMSD from the initial ligand orientation, and the translational ID for the rotation.
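For users who want to inspect the output file themselves, the layout described above can be read with a few lines of Python. This is a minimal sketch based only on the example shown here: the function name `parse_hdock_out` is hypothetical, header lines are recognized by the presence of a colon, and any line that does not have nine columns is skipped.

```python
def parse_hdock_out(text):
    """Return (header, receptor, ligand, modes) from HDOCK .out content."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = {}
    idx = 0
    # Leading "Name: values" lines; the initial rotation line is optional.
    while idx < len(lines) and ":" in lines[idx]:
        key, value = lines[idx].split(":", 1)
        header[key.strip()] = [float(x) for x in value.split()]
        idx += 1

    def molecule(line):
        # File name followed by the x, y, z center of geometry.
        name, x, y, z = line.split()
        return name, (float(x), float(y), float(z))

    receptor, ligand = molecule(lines[idx]), molecule(lines[idx + 1])
    modes = []
    for line in lines[idx + 2:]:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip anything that is not a 9-column binding mode
        modes.append(tuple(float(f) for f in fields))
    return header, receptor, ligand, modes


EXAMPLE = """\
Grid spacing: 1.200
Angle step: 15.000
Initial rotation: 0.00000 0.00000 0.00000
1CGI_r_b.pdb 23.562 26.523 22.675
1CGI_l_b.pdb 47.776 34.961 33.826
1.27246 0.01055 5.02167 -0.328 -0.164 0.264 -445.20 0.45 1.00
2.80075 0.00162 3.49381 -0.286 -0.209 0.111 -444.37 0.38 1.00
...
"""
header, receptor, ligand, modes = parse_hdock_out(EXAMPLE)
# The 7th column (index 6) of each mode is the binding score.
print(receptor[0], ligand[0], [mode[6] for mode in modes])
```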
Users can download our "createpl_linux" program and run it locally to generate complex models like this

    createpl_linux hdock_5c984053e4b83.out top100.pdb -nmax 100 -complex -models

where binding site residues or restraints can be applied to filter the complex models. Users can type

    createpl_linux

for the detailed usage of the program.
After generating the complex models, users may also use a third-party program like FoXS to calculate the SAXS CHI values of the models based on their small-angle X-ray scattering (SAXS) profile file.
The confidence score is calculated from the docking score as

    Confidence_score = 1.0/[1.0+e^(0.02*(Docking_Score+150))]

Roughly, when the confidence score is above 0.7, the two molecules are very likely to bind; when it is between 0.5 and 0.7, the two molecules may bind; when it is below 0.5, the two molecules are unlikely to bind. Nevertheless, the confidence score should be used with care due to its empirical nature.
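As a worked example, the Python sketch below evaluates the confidence score formula above, reading the exponent as 0.02*(Docking_Score+150), and applies the rough thresholds from the text. The function names and category labels are illustrative only.

```python
import math


def confidence_score(docking_score):
    """Confidence score = 1 / (1 + e^(0.02 * (docking_score + 150)))."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))


def binding_likelihood(confidence):
    """Map a confidence score to the rough categories given in the text."""
    if confidence > 0.7:
        return "very likely"
    if confidence >= 0.5:
        return "possible"
    return "unlikely"


for score in (-250.0, -200.0, -150.0, -100.0):
    conf = confidence_score(score)
    print(f"{score:8.1f}  {conf:.3f}  {binding_likelihood(conf)}")
```

A docking score of -150 gives a confidence score of exactly 0.5, and more negative docking scores give higher confidence; for instance, -200 gives about 0.73.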