Representative Sets of DNA 3D Structures
17120 RNA-containing 3D structures
| Release id | Changes | Date | Number of IFEs |
|---|---|---|---|
| 0.2 (current) | 3 changes | 2011-02-12 | 4362 |
| 0.1 | 599 changes | 2011-02-05 | 4361 |
The Representative Sets of DNA 3D Structures organize all DNA-containing 3D structures from the PDB into sequence/structure equivalence classes and select a high-quality representative structure from each class. The resulting Representative Sets of DNA 3D Structures are appropriate for tasks that require searching or training over the breadth of the entire DNA 3D structure database but that should avoid the redundancy in the PDB caused by multiple 3D structures of the same molecule from the same organism. Equivalence classes list all structures of the same molecule, and the associated heat maps show all-against-all geometric comparisons of the structures within each class.
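The idea of "one representative per equivalence class" can be illustrated with a small Python sketch. It assumes the equivalence classes have already been assigned (the hypothetical `class_key` field) and, as a deliberate simplification, uses resolution alone as the quality criterion. The actual selection procedure used for these sets is not described here and is likely more involved; `Structure` and `pick_representatives` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Structure(NamedTuple):
    ife_id: str       # chain or IFE id, e.g. "XXXX|M|C"
    class_key: str    # hypothetical: precomputed equivalence class label
    resolution: float # in Angstroms; a smaller value is better


def pick_representatives(structures: list[Structure]) -> dict[str, Structure]:
    """Group structures by equivalence class and keep one per class.

    Simplification for illustration: the representative is the member with
    the smallest resolution value, with ties broken by ife_id so the
    result is deterministic.
    """
    classes: dict[str, list[Structure]] = defaultdict(list)
    for s in structures:
        classes[s.class_key].append(s)
    return {
        key: min(members, key=lambda s: (s.resolution, s.ife_id))
        for key, members in classes.items()
    }
```

Every class contributes exactly one entry to the result, which is what removes the redundancy from repeated structures of the same molecule.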
Representative sets of DNA 3D structures are in development. We are starting from the same date as the representative sets of RNA 3D structures and will fill in the DNA releases as they would have been produced from that date onward. The methodology will need some modification relative to the one used for RNA.
Releases are generated weekly, and previous releases are available starting from 2011. The default listing shows structures with a resolution of 4 Angstrom or better, but other resolution thresholds are available for each release. The set of representative structures can be viewed online along with information about the resolution, experimental method, molecule name, species, and number of equivalent structures. Releases can also be downloaded and parsed by computer programs. In weeks when many new structures are released, the representative set listing can be delayed by the time needed to compute all-against-all geometric comparisons within large equivalence classes, such as the Thermus thermophilus small ribosomal subunit.
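Applying a resolution threshold, such as the default 4 Angstrom cutoff, amounts to a simple filter. The Python sketch below assumes a release has already been parsed into hypothetical `(ife_id, resolution)` pairs; it does not reflect the actual download format. Treating `None` as "no cutoff", and keeping structures with no reported resolution only in that case, are assumptions made for the sketch.

```python
from typing import Optional


def filter_by_resolution(
    records: list[tuple[str, Optional[float]]],
    max_resolution: Optional[float] = 4.0,
) -> list[str]:
    """Return ids whose resolution is at or below max_resolution (Angstroms).

    Hypothetical helper. max_resolution=None means no cutoff at all.
    Records whose resolution is None (no reported resolution) are kept
    only when there is no cutoff; this is an assumption of the sketch.
    """
    if max_resolution is None:
        return [ife for ife, _ in records]
    return [
        ife
        for ife, res in records
        if res is not None and res <= max_resolution
    ]
```

"Or better" means a numerically smaller or equal resolution value, hence the `<=` comparison.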
Individual chains are named in the format XXXX|M|C, where XXXX is the PDB entry ID, M is the model number (usually 1), and C is the chain identifier (one to four characters). IFEs (integrated functional elements) consist of one or more chains; when an IFE contains several chains, their names are joined with + signs.
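A program reading a release can split these names back into their parts. The Python sketch below follows the naming rules just described (four-character PDB entry, integer model number, one- to four-character chain identifier, chains joined with `+`). `ChainId`, `parse_chain`, and `parse_ife` are hypothetical names, and the example id is for illustration only.

```python
from typing import NamedTuple


class ChainId(NamedTuple):
    """One chain in XXXX|M|C form."""
    pdb: str    # PDB entry ID, four characters
    model: int  # model number, usually 1
    chain: str  # chain identifier, one to four characters


def parse_chain(text: str) -> ChainId:
    """Split an 'XXXX|M|C' string into its three fields, checking lengths."""
    parts = text.split("|")
    if len(parts) != 3:
        raise ValueError(f"expected XXXX|M|C, got {text!r}")
    pdb, model, chain = parts
    if len(pdb) != 4:
        raise ValueError(f"PDB entry must be 4 characters: {pdb!r}")
    if not 1 <= len(chain) <= 4:
        raise ValueError(f"chain identifier must be 1-4 characters: {chain!r}")
    return ChainId(pdb, int(model), chain)


def parse_ife(text: str) -> list[ChainId]:
    """An IFE name is one or more chain names joined with '+'."""
    return [parse_chain(part) for part in text.split("+")]
```

For example, `parse_ife("1S72|1|0+1S72|1|9")` yields two `ChainId` values from the same entry and model, while a single-chain IFE yields a one-element list.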
Unique and stable IDs are assigned to all equivalence classes of structure files. Representative sets are updated automatically every week, and each release is versioned so that earlier snapshots of the data remain independently accessible.
Note that PDB files containing no complete nucleotides are not included in the representative sets; see, for example, PDB 1DV4.
Please use the following citation when using this resource:
Leontis, N. B., & Zirbel, C. L. (2012). Nonredundant 3D Structure Datasets for RNA Knowledge Extraction and Benchmarking. In N. Leontis & E. Westhof (Eds.), RNA 3D Structure Analysis and Prediction (Vol. 27, pp. 281–298). Springer Berlin Heidelberg. doi:10.1007/978-3-642-25740-7_13