There is a newer version of the record available.

Published November 30, 2023 | Version v1
Dataset

AlphaFold reference dataset

sjtu

Description

To facilitate the use of public datasets on 'Counting on me', several commonly used international datasets that serve as AlphaFold reference data are mirrored and backed up here:

1. UniRef30

Introduction: UniRef30 is a database built by clustering UniRef100 sequences at 30% sequence identity.

Website: https://www.uniprot.org/help/uniref

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/UniRef30202103.tar.gz ~/target_position/
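
The `cp` command above assumes the destination directory already exists. As a minimal sketch, the copy step could be wrapped in a small helper that creates the directory first and fails early if the source file is missing. The function name `copy_dataset` is hypothetical and not part of the platform; it is meant to be run on the data node after logging in with `ssh`.

```shell
# Hypothetical helper (not provided by the platform): copy one dataset file
# into a destination directory, creating the directory first if needed.
copy_dataset() {
    src="$1"
    dest_dir="$2"
    # Fail early with a clear message if the source file does not exist.
    if [ ! -f "$src" ]; then
        echo "source not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir"/
}

# Example call, using the path listed in this record:
# copy_dataset /lustre/share/scidata/UniRef30202103.tar.gz ~/target_position
```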

Citation:

Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022). https://doi.org/10.1038/s41592-022-01488-1

2. BFD (Big Fantastic Database)

Introduction: BFD is one of the largest publicly available collections of protein families. It consists of 65,983,866 families represented as MSAs and hidden Markov models (HMMs) covering 2,204,359,010 protein sequences from reference databases, metagenomes and metatranscriptomes.

Article: https://www.nature.com/articles/s41586-021-03819-2

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz ~/ 
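
Because this archive is very large, a copy may be interrupted without an obvious error. One possible check, sketched below, compares the copy byte for byte against the mirrored source using the standard `cmp` utility. The function name `verify_copy` is an assumption introduced for illustration, and the comparison reads both files in full, so it can take a long time on large archives.

```shell
# Hypothetical helper: report whether a copied file is byte-identical
# to its source. cmp -s is silent and signals the result via exit status.
verify_copy() {
    src="$1"
    copy="$2"
    if cmp -s "$src" "$copy"; then
        echo "OK: $copy matches $src"
    else
        echo "MISMATCH: $copy differs from $src or is missing" >&2
        return 1
    fi
}

# Example call, using the path listed in this record:
# verify_copy /lustre/share/scidata/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz \
#     ~/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
```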

Citation:

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

3. PDB

Introduction: The Protein Data Bank (PDB) is a database of three-dimensional structural data for large biological molecules, such as proteins and nucleic acids.

Website: https://www.rcsb.org/

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/pdb70_from_mmcif_200401.tar.gz ~/
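
The copied file is a gzip-compressed tar archive and needs to be unpacked before use. The sketch below first tests the compressed stream with `gzip -t` and then extracts into a chosen directory. The function name `extract_archive` and the destination directory are assumptions for illustration; this record does not describe the layout of the extracted files.

```shell
# Hypothetical helper: check a .tar.gz archive for corruption, then
# extract it into a destination directory.
extract_archive() {
    archive="$1"
    dest_dir="$2"
    # gzip -t tests the compressed stream without writing any output files.
    gzip -t "$archive" || return 1
    mkdir -p "$dest_dir" || return 1
    # -x extract, -z gunzip, -f archive file, -C change to the target directory.
    tar -xzf "$archive" -C "$dest_dir"
}

# Example call (the destination directory name is an assumption):
# extract_archive ~/pdb70_from_mmcif_200401.tar.gz ~/pdb70
```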

Citation:

Berman, H.M., Westbrook, J., Feng, Z. et al. The Protein Data Bank. Nucleic Acids Res 28, 235–242 (2000). https://doi.org/10.1093/nar/28.1.235

4. MGnify

Introduction: MGnify provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations present in particular environments.

Website: https://www.ebi.ac.uk/metagenomics

Command:

ssh username@data.hpc.sjtu.edu.cn

cp /lustre/share/scidata/mgyclusters.fa ~/target_position
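
Unlike the other entries, this file has a `.fa` extension, so it is presumably a plain FASTA file rather than an archive. A quick sanity check after copying is to count the sequence records, each of which starts with a `>` header line. The function name `count_fasta_records` below is hypothetical, and this record does not state the expected count.

```shell
# Hypothetical helper: count sequence records in a FASTA file by counting
# header lines, which begin with '>'.
count_fasta_records() {
    fasta="$1"
    [ -r "$fasta" ] || { echo "cannot read: $fasta" >&2; return 1; }
    # grep -c prints the count but exits with status 1 when it is 0;
    # treat that case as success and pass through real errors (status 2).
    grep -c '^>' "$fasta" || [ $? -eq 1 ]
}

# Example call, assuming the file was copied as shown above:
# count_fasta_records ~/target_position/mgyclusters.fa
```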

Citation:

Richardson, L., Allen, B., Baldi, G. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res 51(D1), D753–D759 (2023).

Created:
November 30, 2023
Modified:
December 28, 2023