HDOCK help

Help for using HDOCK server

1. How to provide input for docked molecules

The HDOCK server predicts the binding complexes between two molecules, such as proteins and nucleic acids, using a hybrid docking strategy. Users therefore need to provide input for the two molecules to be docked. The HDOCK server accepts four types of input for each molecule:

Upload your pdb file in PDB format.
Provide your pdb file in PDB ID:ChainID (e.g. 1CGI:E).
Copy and paste your protein sequence in FASTA format.
Upload your protein sequence file in FASTA format.

Only ONE type of input is needed for each molecule.
If more than one type of input is provided, the first one will be used. For the "PDB ID:ChainID" input, the user can provide a single chain ID or multiple chain IDs. For example, "1CGI:E" stands for chain E of the PDB entry 1CGI; "1AHW:AB" stands for chains A and B of the PDB entry 1AHW. If only a sequence is provided, the server will automatically construct a model structure from a homologous template in the Protein Data Bank using an in-house modeling pipeline of HH Suite, Clustalw2, and MODELLER. Users are also recommended to submit their own PDB file if the protein contains multiple chains, as our pipeline is currently designed to model single-chain proteins.
NOTE: For docking efficiency, if one molecule is much larger than the other, it is recommended to input the larger molecule as the receptor.
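The "PDB ID:ChainID" notation described above can be checked locally before submission. The following is a minimal sketch, not the server's own parser: the function name `parse_pdb_chain` is hypothetical, and it assumes a four-character PDB ID that starts with a digit and single-character chain IDs.

```python
import re

def parse_pdb_chain(spec: str):
    """Split a 'PDBID:ChainIDs' string into the PDB ID and a list of chain IDs.

    Assumption: 4-character PDB ID starting with a digit, and
    single-character chain IDs written back to back (e.g. 'AB').
    """
    match = re.fullmatch(r"\s*([0-9][A-Za-z0-9]{3}):([A-Za-z0-9]+)\s*", spec)
    if match is None:
        raise ValueError(f"expected PDBID:ChainIDs, got {spec!r}")
    pdb_id, chains = match.groups()
    return pdb_id.upper(), list(chains)

print(parse_pdb_chain("1CGI:E"))   # ('1CGI', ['E'])
print(parse_pdb_chain("1AHW:AB"))  # ('1AHW', ['A', 'B'])
```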
Molecular Type:
"Select a type" is not needed for structure input, as the HDOCK server is able to determine the molecular type from the input structure. However, for sequence input, users are strongly recommended to select a molecular type; otherwise, the server will guess one from "Protein", "ssRNA", or "dsDNA" based on the input sequence.
Here are the definitions of different molecular types:
Type      Description
Protein   Standard protein molecule
ssRNA     General single-chain RNA molecule
ssDNA     General single-chain DNA molecule
dsDNA     Double-stranded B-DNA duplex molecule
dsRNA     Double-stranded A-RNA duplex molecule
where the maximum input sequence length is 500 for double-stranded (ds) RNA/DNA molecules.

2. RNA/DNA 3D structure modeling

The HDOCK server now accepts sequence input for single-stranded (ss) or double-stranded (ds) RNA/DNA. Only the sequence of a single strand is needed. The input can contain the sequence only, like this
>example
GGAGCGGUAGUUCAGUCGGUUAGAAUACCUGCCUGUCACGCAGGGGGUCGCGGGUUCGAGUCCCGUCCGUUCCGCCA
or both the sequence and its secondary structure for single-stranded (ss) RNA/DNA like this
>example
GGAGCGGUAGUUCAGUCGGUUAGAAUACCUGCCUGUCACGCAGGGGGUCGCGGGUUCGAGUCCCGUCCGUUCCGCCA
(((((((..((((.........))))((((.(((((...))))))))).(((((.......))))))))))))....
HDOCK will then build the 3D structure from the single sequence, or model a double-stranded 3D duplex structure by constructing a complementary Watson-Crick paired second strand.
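The two sequence-level ideas above (a Watson-Crick complementary strand, and a secondary structure written in dot-bracket notation) can be illustrated as follows. This is only a sketch of the notation, not the server's 3D modeling procedure; the function names are hypothetical.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(seq: str, rna: bool = True) -> str:
    """Return the Watson-Crick partner strand, written 5'->3'."""
    table = RNA_PAIR if rna else DNA_PAIR
    # The partner strand runs antiparallel, so read the input in reverse.
    return "".join(table[base] for base in reversed(seq.upper()))

def base_pairs(dot_bracket: str):
    """Return the 0-based (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

print(complementary_strand("GGAGC"))  # GCUCC
print(base_pairs("((..))"))           # [(0, 5), (1, 4)]
```

A quick sanity check before submitting a sequence with a secondary structure is that both lines have the same length and that `base_pairs` raises no error.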

3. How to specify the binding site [optional]

HDOCK performs global docking to predict the binding complexes between two molecules. Therefore, no information about the binding site is necessary for the docking job. However, the server also gives users the option to specify the binding site residues if such information is available, so that the predicted models will have a higher accuracy. Two types of binding site information can be provided.

Binding site residues on the receptor or ligand.

	195:A, 203-206:A, 108:B

The binding site residues may also be submitted as a file that will look like this

	195:A
	203-206:A
	108:B

The residues are put on different lines in the file.
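The residue notation above (comma-separated on one line, or one entry per line in a file) can be expanded into individual residues with a short script. This is a minimal sketch with hypothetical function names; it assumes non-negative residue numbers and single-character chain IDs.

```python
import re

# Matches "num:chain" or "num1-num2:chain", e.g. "195:A" or "203-206:A".
_SPEC = re.compile(r"^(\d+)(?:-(\d+))?:(\w)$")

def parse_residue_spec(spec: str):
    """Expand one entry into a list of (chain_id, residue_number) tuples."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"bad residue specification: {spec!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        raise ValueError(f"descending range in {spec!r}")
    return [(match.group(3), number) for number in range(start, end + 1)]

def parse_binding_site(text: str):
    """Parse entries separated by commas and/or newlines."""
    residues = []
    for token in re.split(r"[,\n]+", text):
        if token.strip():
            residues.extend(parse_residue_spec(token))
    return residues

print(parse_binding_site("195:A, 203-206:A, 108:B"))
```

Both the one-line form and the file form shown above yield the same list of residues.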

Distance restraints between interacting residues

	195:A 236:B 8, 215-218:A 306:B 6

	195:A 236:B 8 
	215-218:A 306:B 6

NOTE: For each restraint, the first field is for the receptor, the second field is for the ligand, and the third field is the constrained distance. The residue representation must be in num:chainID or num1-num2:chainID format, where the residue number and chain ID refer to the input structure if the input is a structure, or to the modeled structure if the input is a sequence.

CAUTION: For a 3D structure modeled by the server, the chain ID is set to “A” for a single-chain molecule. The numbering of residues is consistent with that in the input sequence.
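The three-field restraint format described in the NOTE above can be read as follows. This is a minimal sketch with hypothetical names (`Restraint`, `parse_restraints`); it only splits the fields and does not validate the residue notation.

```python
from typing import NamedTuple

class Restraint(NamedTuple):
    receptor: str    # residue(s) on the receptor, e.g. "195:A"
    ligand: str      # residue(s) on the ligand, e.g. "236:B"
    distance: float  # constrained distance (third field)

def parse_restraints(text: str):
    """Parse restraints separated by commas (one line) or newlines (file)."""
    restraints = []
    for entry in text.replace("\n", ",").split(","):
        fields = entry.split()
        if not fields:
            continue  # skip empty entries such as a trailing newline
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {entry!r}")
        receptor, ligand, distance = fields
        restraints.append(Restraint(receptor, ligand, float(distance)))
    return restraints

for restraint in parse_restraints("195:A 236:B 8, 215-218:A 306:B 6"):
    print(restraint)
```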

4. SAXS experimental data curve

Small-angle X-ray scattering (SAXS) experimental data can be provided as a post-docking filter for ranking the binding modes predicted by HDOCK. The SAXS data file contains three columns, q, I(q), and error, like this
        0.0000E+00  1.4612E+07  3.0685E+03
        1.0000E-03  1.4743E+07  4.8653E+03
        2.0000E-03  1.4827E+07  7.3394E+03
        3.0000E-03  1.4685E+07  1.0573E+04
        4.0000E-03  1.4674E+07  1.3206E+04
        5.0000E-03  1.4659E+07  1.5831E+04
        6.0000E-03  1.4729E+07  1.5466E+04
        7.0000E-03  1.4707E+07  1.7649E+04
        8.0000E-03  1.4594E+07  2.3642E+04
        9.0000E-03  1.4787E+07  2.8835E+04
With the SAXS experimental curve, the binding models will be ranked according to a weighted score that combines the docking energy score calculated by our scoring function and the CHI value, which measures how well each predicted binding mode fits the SAXS experimental data.
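Before uploading, users may want to confirm that their SAXS file really has the three numeric columns shown above. The following is a minimal sketch with a hypothetical function name; skipping blank lines and lines starting with "#" is an assumption of this sketch, not a documented behavior of the server.

```python
def read_saxs_curve(lines):
    """Parse an iterable of text lines into (q, I(q), error) tuples."""
    curve = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected 3 columns, got {len(fields)}")
        q, intensity, error = (float(value) for value in fields[:3])
        curve.append((q, intensity, error))
    return curve

sample = [
    "        0.0000E+00  1.4612E+07  3.0685E+03",
    "        1.0000E-03  1.4743E+07  4.8653E+03",
]
for point in read_saxs_curve(sample):
    print(point)
```

For a file on disk, the same function can be called as `read_saxs_curve(open(path))`, where `path` is the user's own file name.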

5. Post-docking process (optional)

This step is for advanced users who want to obtain more than 100 predicted complex models or filter the docked complex models with their own experimental information. The downloaded package contains an HDOCK output file, named like hdock_5c984053e4b83.out, that includes all 4392 docking solutions like this
Grid spacing:     1.200
Angle step:    15.000
Initial rotation:     0.00000   0.00000   0.00000
1CGI_r_b.pdb      23.562    26.523    22.675
1CGI_l_b.pdb      47.776    34.961    33.826
   1.27246   0.01055   5.02167    -0.328    -0.164     0.264   -445.20      0.45      1.00
   2.80075   0.00162   3.49381    -0.286    -0.209     0.111   -444.37      0.38      1.00
   0.02137   0.00051  -0.00948    -0.267    -0.212     0.104   -444.28      0.36      1.00
   2.98094   0.00164   3.31735    -0.237    -0.259     0.116   -444.15      0.37      1.00
   3.04247   0.00300   3.25767    -0.340    -0.315     0.134   -442.80      0.49      1.00
   ...
where the first 5 lines have the following definitions
   The 1st line is the grid spacing for the three (x, y, z) translational degrees of freedom.
   The 2nd line is the Euler angle step for the three rotational degrees of freedom.
   The 3rd line is the initial rotation of the ligand before docking (optional).
   The 4th line is the receptor file and its center of geometry.
   The 5th line is the ligand file and its center of geometry.
Starting from the 6th line are the predicted binding modes, each of which is represented by three translations, three rotations, its binding score, its RMSD from the initial ligand orientation, and the translational ID for the rotation.
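The file layout described above can be read with a short script, for example to inspect the scores before generating models. This is a minimal sketch based only on the sample shown here; the function name and dictionary keys are hypothetical, and the first six values of each binding mode are kept together as `pose` (three translations and three rotations, per the description above).

```python
def parse_hdock_out(lines):
    """Parse HDOCK output lines into (header, molecules, modes)."""
    header, molecules, modes = {}, [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if ":" in line:
            # Header lines such as "Grid spacing:     1.200"
            key, value = line.split(":", 1)
            header[key.strip()] = [float(x) for x in value.split()]
        elif len(fields) == 4 and len(molecules) < 2:
            # Receptor, then ligand: file name and center of geometry
            molecules.append((fields[0], tuple(float(x) for x in fields[1:])))
        elif len(fields) == 9:
            values = [float(x) for x in fields]
            modes.append({"pose": values[:6], "score": values[6],
                          "rmsd": values[7], "id": values[8]})
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return header, molecules, modes

sample = """\
Grid spacing:     1.200
Angle step:    15.000
Initial rotation:     0.00000   0.00000   0.00000
1CGI_r_b.pdb      23.562    26.523    22.675
1CGI_l_b.pdb      47.776    34.961    33.826
   1.27246   0.01055   5.02167    -0.328    -0.164     0.264   -445.20      0.45      1.00
   2.80075   0.00162   3.49381    -0.286    -0.209     0.111   -444.37      0.38      1.00
"""
header, molecules, modes = parse_hdock_out(sample.splitlines())
best = min(modes, key=lambda mode: mode["score"])  # most negative score
print(molecules[0][0], molecules[1][0], best["score"])
```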
Users can download our "createpl_linux" program and run it locally to generate complex models like this
	createpl_linux hdock_5c984053e4b83.out top100.pdb -nmax 100 -complex -models
where binding site residues or restraints can be applied to filter the complex models. Users can type
	createpl_linux
for detailed usage of the program.
After generating the complex models, users may also use a third-party program like FoXS to calculate the SAXS CHI values of the models based on their small-angle X-ray scattering (SAXS) profile file.

6. Explanations of evaluation metrics

Docking Score: The docking scores are calculated by our knowledge-based iterative scoring function ITScorePP or ITScorePR. A more negative docking score means a more likely binding model, but the score should not be treated as the true binding affinity of the two molecules because it has not been calibrated to experimental data.

Confidence Score: Given that the protein-protein/RNA/DNA complexes in the PDB normally have a docking score of around -200 or better, we have empirically defined a docking score-dependent confidence score to indicate the binding likelihood of two molecules as follows,
```
		Confidence_score = 1.0/[1.0+e^{0.02*(Docking_Score+150)}]
```
Roughly, when the confidence score is above 0.7, the two molecules are very likely to bind; when the confidence score is between 0.5 and 0.7, the two molecules may bind; when the confidence score is below 0.5, the two molecules are unlikely to bind. Nevertheless, the confidence score should be used carefully due to its empirical nature.
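The formula and the rough thresholds above can be reproduced locally, for example when scoring models generated in the post-docking step. The function names below are hypothetical; placing scores of exactly 0.5 or 0.7 in the middle category is an assumption of this sketch.

```python
import math

def confidence_score(docking_score: float) -> float:
    """Confidence_score = 1.0 / (1.0 + e^(0.02 * (Docking_Score + 150)))."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))

def interpret(confidence: float) -> str:
    """Map a confidence score to the rough categories given in the text."""
    if confidence > 0.7:
        return "very likely to bind"
    if confidence >= 0.5:
        return "may bind"
    return "unlikely to bind"

for score in (-445.20, -200.0, -150.0, -100.0):
    value = confidence_score(score)
    print(f"{score:8.2f}  {value:.4f}  {interpret(value)}")
```

A docking score of -150 gives a confidence of exactly 0.5, and a score of -200 gives about 0.73, which is consistent with the statement that known complexes typically score around -200 or better.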

Ligand RMSD: The ligand RMSDs are calculated by comparing the ligands in the docking models with the input or modeled structures. Therefore, the ligand RMSD is not necessarily a measure of the accuracy of the corresponding model.

Interface residues: The interface information for each model includes all the residue pairs within 5.0 Å between the receptor and the ligand for the corresponding model. Users can click to check/download the files for different models.
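A comparable interface list can be computed for locally generated models. The following is a minimal sketch, not the server's implementation: it assumes that a residue pair counts as "within 5.0 Å" when any atom of one residue is at most 5.0 Å from any atom of the other, and it takes atoms as simple `(residue_label, x, y, z)` tuples that the user has already extracted from the PDB files.

```python
import math

def interface_pairs(receptor_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted (receptor_residue, ligand_residue) pairs within cutoff.

    Each atom is a tuple (residue_label, x, y, z). Assumption: a pair is
    reported if any two atoms of the residues are <= cutoff apart.
    """
    pairs = set()
    for res_r, *xyz_r in receptor_atoms:
        for res_l, *xyz_l in ligand_atoms:
            if math.dist(xyz_r, xyz_l) <= cutoff:
                pairs.add((res_r, res_l))
    return sorted(pairs)

# Made-up coordinates for illustration only.
receptor = [("195:A", 0.0, 0.0, 0.0), ("203:A", 20.0, 0.0, 0.0)]
ligand = [("236:B", 3.0, 0.0, 0.0), ("306:B", 0.0, 0.0, 9.0)]
print(interface_pairs(receptor, ligand))  # [('195:A', '236:B')]
```

The double loop is quadratic in the number of atoms, which is acceptable for a single model; for many models a spatial grid or k-d tree would be a more efficient choice.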

SAXS CHI Square: The CHI values of the predicted models compared to the SAXS data curve, calculated using the FoXS program. A smaller CHI square means better consistency between the model and the SAXS data.