Published January 1, 2024 | Version v2
Dataset Open

AlphaFold reference dataset

sjtu

Description

To facilitate the use of public datasets in 'Counting on me', several commonly used international AlphaFold reference datasets are mirrored and backed up here:

1. UniRef30

Introduction: UniRef30 is a database of UniRef100 sequences clustered at 30% sequence identity.

Website: https://www.uniprot.org/help/uniref

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/UniRef30202103.tar.gz ~/target_position/
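
After copying, a .tar.gz archive usually has to be unpacked before it can be used. The following is a minimal sketch, assuming a tar that supports the -z option is available on the node; the function name extract_archive and the destination directory are illustrative and not part of the original instructions.

```shell
# Hypothetical helper: unpack a .tar.gz archive into a destination directory.
# Usage: extract_archive <archive.tar.gz> <destination_dir>
extract_archive() {
    # Fail early if the archive was not copied successfully.
    if [ ! -f "$1" ]; then
        echo "archive not found: $1" >&2
        return 1
    fi

    # Create the destination if needed, then extract into it.
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, extract_archive ~/target_position/UniRef30202103.tar.gz ~/target_position/uniref30 would unpack the archive copied above; the directory name uniref30 is arbitrary. The same helper applies to the other .tar.gz archives listed on this page.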

Citation:

Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022). https://doi.org/10.1038/s41592-022-01488-1

2. BFD (Big Fantastic Database)

Introduction: BFD is one of the largest publicly available collections of protein families. It consists of 65,983,866 families represented as MSAs and hidden Markov models (HMMs) covering 2,204,359,010 protein sequences from reference databases, metagenomes and metatranscriptomes.

Article: https://www.nature.com/articles/s41586-021-03819-2

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz ~/ 

Citation:

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

3. PDB

Introduction: The Protein Data Bank (PDB) is a database of three-dimensional structural data for large biological molecules, such as proteins and nucleic acids.

Website: https://www.rcsb.org/

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/pdb70_from_mmcif_200401.tar.gz ~/

Citation:

Berman, H.M., Westbrook, J., Feng, Z. et al. The Protein Data Bank. Nucleic Acids Research 28, 235–242 (2000). https://doi.org/10.1093/nar/28.1.235

4. MGnify

Introduction: MGnify provides a free-to-use platform for the assembly, analysis, and archiving of microbiome data derived from sequencing microbial populations present in particular environments.

Website: https://www.ebi.ac.uk/metagenomics

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/mgyclusters.fa ~/target_position
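
Assuming mgyclusters.fa is a plain FASTA file, as its extension suggests, a quick sanity check after copying is to count its records: each record begins with a header line starting with ">". The sketch below uses an illustrative function name, count_fasta_records, which is not part of the original instructions.

```shell
# Hypothetical helper: print the number of records in a FASTA file.
# Usage: count_fasta_records <file.fa>
count_fasta_records() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi

    # grep -c prints the number of header lines; it exits with status 1
    # when there are none, so "|| true" keeps an empty file from being an error.
    grep -c '^>' "$1" || true
}
```

For example, count_fasta_records ~/target_position/mgyclusters.fa prints the number of sequences in the copied file. On a large file the scan can take some time.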

Citation:

Richardson, L., Allen, B., Baldi, G. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research 51(D1), D753–D759 (2023).

Files

Files (159.8 GB)
Checksum                               Size
md5:c5c8575beafe88a26b2b5be21a816f8d   56.4 GB
md5:d41d9127910127bb538213676223fb6e   34.7 GB
md5:3121a5e8d5896226c02ad0ee4714df36   68.6 GB
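
The MD5 checksums above can be used to confirm that a download completed without corruption. The following is a minimal sketch, assuming the md5sum utility from GNU coreutils is installed; the function name verify_md5 is illustrative, and because the listing does not show which file each checksum belongs to, the file argument is a placeholder.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Usage: verify_md5 <file> <expected_md5_without_the_md5:_prefix>
verify_md5() {
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$1" | awk '{print $1}')

    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got ${actual:-none}, expected $2)" >&2
        return 1
    fi
}
```

For example, verify_md5 downloaded_file c5c8575beafe88a26b2b5be21a816f8d checks a file against the first checksum in the list, where downloaded_file stands for whichever archive was actually fetched.
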
Created: January 2, 2024
Modified: January 2, 2024