AlphaFold reference dataset

Description

To make public datasets easier to use on 'Counting on me', several commonly used external datasets that serve as AlphaFold reference data are mirrored and backed up here:

1. UniRef30

Introduction: UniRef30 is a database of UniRef100 sequences clustered at 30% sequence identity.

Website: https://www.uniprot.org/help/uniref

Command:

ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/UniRef30_2021_03.tar.gz ~/target_position/
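The copied archive is a gzip-compressed tarball and has to be unpacked before use. A minimal sketch, assuming `tar` with gzip support is available on the node; `extract_dataset` is a hypothetical helper name and the paths in the example call are placeholders:

```shell
#!/usr/bin/env bash
# extract_dataset ARCHIVE DEST_DIR
# Hypothetical helper (not part of any tool named on this page):
# unpack a gzip-compressed tarball into DEST_DIR, creating it if needed.
extract_dataset() {
    local archive="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # -x extract, -z decompress with gzip, -f archive file,
    # -C switch to DEST_DIR before writing files
    tar -xzf "$archive" -C "$dest"
}

# Example call (placeholder paths):
# extract_dataset ~/target_position/UniRef30_2021_03.tar.gz ~/target_position/uniref30
```

Extracting into a dedicated directory keeps the unpacked database files separate from the archive, so the archive can be deleted afterwards to reclaim space.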

Citation:

Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022). https://doi.org/10.1038/s41592-022-01488-1

2. BFD (Big Fantastic Database)

Introduction: BFD is one of the largest publicly available collections of protein families. It consists of 65,983,866 families represented as MSAs and hidden Markov models (HMMs) covering 2,204,359,010 protein sequences from reference databases, metagenomes and metatranscriptomes.

Article: https://www.nature.com/articles/s41586-021-03819-2

Command:

ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz ~/
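Archives of this kind can be very large, so it may be worth checking free space in the destination before copying. A minimal sketch, assuming a POSIX `df`; `has_free_space` is a hypothetical helper, and the threshold to pass is whatever size the archive reports on the data node:

```shell
#!/usr/bin/env bash
# has_free_space DIR REQUIRED_KB
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# REQUIRED_KB kilobytes available.
has_free_space() {
    local dir="$1" required_kb="$2" avail_kb
    # df -P forces the portable one-line-per-filesystem format,
    # -k reports sizes in 1024-byte blocks.
    # Field 4 of the second line is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge "$required_kb" ]
}

# Example (run on the data node): measure the archive, then check the target.
# archive=/lustre/share/scidata/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
# size_kb="$(du -k "$archive" | cut -f1)"
# has_free_space ~ "$size_kb" && cp "$archive" ~/
```

Note that this only covers the archive itself; the unpacked database needs additional space.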

Citation:

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

3. PDB

Introduction: The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids.

Website: https://www.rcsb.org/

Command:

ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/pdb70_from_mmcif_200401.tar.gz ~/

Citation:

H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235.

4. MGnify

Introduction: MGnify provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations present in particular environments.

Website: https://www.ebi.ac.uk/metagenomics

Command:

ssh username@data.hpc.sjtu.edu.cn
cp /lustre/share/scidata/mgy_clusters.fa ~/target_position/
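Unlike the other datasets, this one is a plain FASTA file rather than an archive, so it can be inspected directly. A quick sanity check after copying is to count its records; the sketch below assumes standard FASTA layout (each record begins with a `>` header line), and `count_fasta_records` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_fasta_records FILE
# Hypothetical helper: print the number of records in a FASTA file.
count_fasta_records() {
    # grep -c prints the number of matching lines;
    # every FASTA record starts with exactly one '>' header line.
    grep -c '^>' "$1"
}

# Example call (placeholder path); expect this to take a while on a large file:
# count_fasta_records ~/target_position/mgy_clusters.fa
```

Running the same count on the source file and on the copy and comparing the two numbers gives a cheap indication that the copy was not truncated.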

Citation:

Richardson, L., Allen, B., Baldi, G. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research 51(D1), D753–D759 (2023).

Files

Files (159.8 GB)

Name	MD5	Size
UniRef30_2021_03.tar.gz	c5c8575beafe88a26b2b5be21a816f8d	56.4 GB
pdb70_from_mmcif_200401.tar.gz	d41d9127910127bb538213676223fb6e	34.7 GB
mgy_clusters.fa	3121a5e8d5896226c02ad0ee4714df36	68.6 GB
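The MD5 digests listed with each file can be used to confirm that a copy is intact. A minimal sketch, assuming the `md5sum` utility from GNU coreutils is available; `verify_md5` is a hypothetical helper, and the path in the example call is a placeholder:

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED_DIGEST
# Hypothetical helper: compare FILE's MD5 digest with the published value.
# Returns 0 on match, 1 on mismatch or if the file cannot be read.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum "$file" 2>/dev/null | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got '$actual', expected '$expected')" >&2
        return 1
    fi
}

# Example call, using the digest published for this file:
# verify_md5 ~/target_position/mgy_clusters.fa 3121a5e8d5896226c02ad0ee4714df36
```

Hashing files of this size reads every byte once, so the check can take several minutes per file.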