AlphaFold reference dataset
- Creators
- sjtu
Description
To make public datasets easier to use on 'Counting on me', several commonly used international datasets that serve as AlphaFold reference data are mirrored and backed up here:
1. UniRef30
Introduction: UniRef30 is a 30% sequence identity clustered database based on UniRef100.
Website: https://www.uniprot.org/help/uniref
Command:
ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/UniRef30202103.tar.gz ~/target_position/
Citation:
Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022). https://doi.org/10.1038/s41592-022-01488-1
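The archive copied by the UniRef30 command above still has to be unpacked before it can be used. A minimal shell sketch follows; the helper name `extract_archive` and the destination directory are illustrative assumptions and are not provided by the mirror, while the archive path simply reuses the one from the command.

```shell
# Hypothetical helper (not provided by the mirror): unpack a .tar.gz archive
# into a destination directory, creating the directory if it does not exist.
extract_archive() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example call, assuming the archive was copied as in the command above
# (the destination directory name is an assumption):
# extract_archive ~/target_position/UniRef30202103.tar.gz ~/target_position/uniref30
```

Listing the first few entries with `tar -tzf <archive> | head` before unpacking is a cheap way to check what a large archive contains.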
2. BFD (Big Fantastic Database)
Introduction: BFD is one of the largest publicly available collections of protein families. It consists of 65,983,866 families represented as MSAs and hidden Markov models (HMMs) covering 2,204,359,010 protein sequences from reference databases, metagenomes and metatranscriptomes.
Article: https://www.nature.com/articles/s41586-021-03819-2
Command:
ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz ~/
Citation:
Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2
3. PDB
Introduction: The Protein Data Bank (PDB) is a database of three-dimensional structural data for large biological molecules, such as proteins and nucleic acids.
Website: https://www.rcsb.org/
Command:
ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/pdb70_from_mmcif_200401.tar.gz ~/
Citation:
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. The Protein Data Bank. Nucleic Acids Research 28, 235–242 (2000). https://doi.org/10.1093/nar/28.1.235
4. MGnify
Introduction: MGnify provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments.
Website: https://www.ebi.ac.uk/metagenomics
Command:
ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/mgyclusters.fa ~/target_position
Citation:
Richardson, L., Allen, B., Baldi, G. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research 51(D1), D753–D759 (2023).
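Because the mirrored files are large, it can be worth confirming that a copy finished intact before relying on it. The sketch below is one way to do that with the standard `cmp` utility; the helper name `verify_copy` is a hypothetical illustration, and the example paths are taken from the MGnify command above.

```shell
# Hypothetical helper (not provided by the mirror): return success only if
# the copied file is byte-identical to the source file.
verify_copy() {
    src="$1"
    dst="$2"
    cmp -s "$src" "$dst"
}

# Example, run on the data node after the MGnify copy above:
# verify_copy /lustre/share/scidata/mgyclusters.fa ~/target_position/mgyclusters.fa && echo "copy OK"
```

`cmp` reads both files in full, so the check can take a while on the larger databases.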