SRA Toolkit Configuration
To use the tools in the NCBI (National Center for Biotechnology Information) SRA (Sequence Read Archive) Toolkit to access public and, optionally, controlled-access data in the cloud, each user must configure the toolkit individually. If you are unsure whether you have already configured SRA Toolkit, there are two easy ways to check.
Check for settings file
If you have previously configured SRA Toolkit, it should have created
a hidden directory that contains a user-settings file, typically
located at ~/.ncbi/user-settings.mkfg. In that case, the following
command will return the contents of your settings file, providing
information such as cache locations or remote access preferences:
$ cat ~/.ncbi/user-settings.mkfg
If you have not yet configured SRA Toolkit, you will see something similar to:
$ cat ~/.ncbi/user-settings.mkfg
cat: /users/x/y/<netid>/.ncbi/user-settings.mkfg: No such file or directory
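The same check can be scripted. The following is a minimal sketch, not part of SRA Toolkit: the function name `sra_settings_status` is hypothetical, and the check only tests whether the file exists at the typical location, not whether its contents are valid.

```shell
# Report whether an SRA Toolkit settings file exists at the given path.
# With no argument, fall back to the typical location named above.
sra_settings_status() {
    settings_file="${1:-$HOME/.ncbi/user-settings.mkfg}"
    if [ -f "$settings_file" ]; then
        echo "configured"
    else
        echo "not configured"
    fi
}

# Check the default location for the current user.
sra_settings_status
```

A result of "not configured" corresponds to the "No such file or directory" message shown above.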
Run a simple test
If the directory exists, you can confirm that your configuration works
by running a basic fastq-dump. You will need to load SRA Toolkit's
compiler dependency module, gcc, followed by the sratoolkit
module, and then run the fastq-dump command as follows:
$ module load gcc/13.3.0-xp3epyt
$ module load sratoolkit/3.0.0-y2rspiu
$ fastq-dump --stdout -X 2 SRR390728
After proper configuration, you should receive the following output:
Read 2 spots for SRR390728
Written 2 spots for SRR390728
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96&&&&(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4&&5&&;;;;;;;;;;;;;;;;;;;;;<9;<;;;;;464262
Users who have not yet completed SRA Toolkit configuration will instead see:
This sra toolkit installation has not been configured.
Before continuing, please run: vdb-config --interactive
For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
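If you prefer not to compare the output by eye, the check can be reduced to counting read headers. This is a rough sketch under stated assumptions: `count_fastq_headers` is a hypothetical helper, and it assumes that each read header begins with `@` followed by the accession and a period, as in the sample output above.

```shell
# Count FASTQ header lines for a given accession on standard input.
# Header lines look like "@SRR390728.1 1 length=72" in the sample above.
count_fastq_headers() {
    accession="$1"
    # -c prints the number of matching lines; the pattern is anchored to
    # the start of the line so that sequence and "+" lines are not counted.
    grep -c "^@${accession}\."
}

# Example (requires the modules above to be loaded):
#   fastq-dump --stdout -X 2 SRR390728 | count_fastq_headers SRR390728
```

With `-X 2`, a working configuration should yield a count of 2, matching the two reads shown above. The error message for an unconfigured installation contains no header lines, so it yields 0.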
Interactive configuration
Some SRA Toolkit commands generate large temporary files. By default,
SRA Toolkit writes these temporary, or cache, files to a subdirectory
of your $HOME space (~/ncbi). We recommend, instead, using a
directory within your scratch space, where users have more storage.
Note that, while your scratch directory may appear to be a
subdirectory of your $HOME directory, it actually resides on a
different filesystem. A symbolic link to scratch has been added to
each user's $HOME directory for convenience. The physical location
of your scratch space is /gpfs2/scratch/<netid>.
Before running the configuration command, create a cache directory in your scratch space. The cache directory must be initially empty.
$ mkdir ~/scratch/sra-cache
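Because the configuration tool needs the absolute, physical path rather than the symbolic link, it can help to resolve that path ahead of time. The sketch below is one possible approach. The function name `prepare_cache_dir` is hypothetical. It creates the directory if needed, refuses to continue if the directory is not empty, and prints the path with symbolic links resolved.

```shell
# Create a cache directory if needed, confirm it is empty, and print its
# physical path (symbolic links resolved).
prepare_cache_dir() {
    cache_dir="$1"
    mkdir -p "$cache_dir" || return 1
    # ls -A lists every entry except . and .., so any output at all
    # means the directory is not empty.
    if [ -n "$(ls -A "$cache_dir")" ]; then
        echo "error: $cache_dir is not empty" >&2
        return 1
    fi
    # pwd -P prints the physical working directory; the subshell keeps
    # the caller's current directory unchanged.
    (cd "$cache_dir" && pwd -P)
}

# Example:
#   prepare_cache_dir ~/scratch/sra-cache
```

On the VACC, the printed path should begin with /gpfs2/scratch/<netid>, since that is where scratch physically resides.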
Configuration is done interactively from a terminal on the VACC via
the command vdb-config -i. Make sure the required modules are
loaded and then launch the configuration tool.
$ module load gcc/13.3.0-xp3epyt
$ module load sratoolkit/3.0.0-y2rspiu
$ vdb-config -i
When you enter this command, you will see a screen like the one shown below. Within this configuration menu, you can operate the buttons either by pressing the letter highlighted in red, or by pressing the tab key until the desired button is reached and then pressing the space bar or the enter key.
The Enable Remote Access option on the MAIN screen should be selected (X) by default. You should not change this setting, as remote access is almost always desired.
Next, navigate to the CACHE tab by pressing the c key or by pressing
the tab key until the red cursor lands on CACHE and press enter. At
the top of this section, "enable local file-caching" should be
selected by default and does not need to be changed. Now, either
press the key for the letter o or keep pressing tab until the red
cursor is on the field to choose the "location of user-repository" and
hit enter. Tab until the red cursor is at [ Goto ] and press enter
again. A new pop-up window will appear. It should show the path to
your $HOME directory in the first field.
Delete the existing file path (e.g., /users/m/j/mjohns89) via the
delete or backspace key and enter the absolute, physical path of your
new sra-cache directory. Then, tab over to [ OK ] and press
enter. See below for an image demonstrating how this would look for
user mjohns89.
The "select directory" pop-up should now indicate the correct cache
directory, including the full path. Tab to the [ OK ] button again
and hit enter. A window asking you to confirm the change will appear.
Select [ yes ] and hit enter.
If you wish to make any other customized configurations, such as
providing cloud credentials, you can do that now. The last step is to
save the changes and exit. Either tab over to [ save ] and hit
enter or use the shortcut by pressing the s key. Select [ ok ]
and then exit, either by pressing the x shortcut key or by tabbing to
[ exit ] and pressing enter.
Settings are saved and persist across sessions, so you only need to configure SRA Toolkit once, before you use it for the first time.
SRA Toolkit is ready for use. You may now wish to perform the simple
test described above to confirm that a basic fastq-dump command
produces the expected output.
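You can also confirm that the new cache location was saved. The sketch below rests on an assumption this page does not confirm: that the settings file stores the cache path verbatim as plain text. The function name `settings_mention_path` is hypothetical.

```shell
# Succeed if the settings file contains the given path as a literal string.
# Assumption: the cache path is written verbatim into the settings file.
settings_mention_path() {
    settings_file="$1"
    cache_path="$2"
    # -q suppresses output, -F treats the path as a fixed string rather
    # than a regular expression, and -- guards against odd leading dashes.
    grep -qF -- "$cache_path" "$settings_file"
}

# Example, substituting your own netid:
#   settings_mention_path ~/.ncbi/user-settings.mkfg /gpfs2/scratch/<netid>/sra-cache \
#       && echo "cache location saved"
```

If the check fails, viewing the file with cat, as described in the first section, shows what was actually saved.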
Cache maintenance
Periodically, or each time you finish downloading and processing
.sra files, you may wish to clean the cache to free up storage
space. You can do this with the cache-mgr command. It's a good idea
to use the report options before emptying the cache. Below, the -r
option tells cache-mgr to report objects in the cache, while the -t
option provides additional detail.
$ cache-mgr -rt
source: /gpfs2/scratch/mjohns89/sra-cache
-----------------------------------
/gpfs2/scratch/mjohns89/sra-cache/sra/SRR35933485.sra complete file of 54,004,679 bytes
/gpfs2/scratch/mjohns89/sra-cache/sra/SRR649944.sra complete file of 142,605 bytes
-----------------------------------
0 cached file(s)
2 complete file(s)
54,147,284 bytes in cached files
54,147,284 bytes used in cached files
0 lock files
If you would like to clear all cached files from the current cache
location, you can use the --clear option (or the shorthand version
-c).
$ cache-mgr --clear
-----------------------------------
2 files removed
0 directories removed
54,147,284 bytes removed
A subsequent report shows:
$ cache-mgr -r
-----------------------------------
0 cached file(s)
0 complete file(s)
0 bytes in cached files
0 bytes used in cached files
0 lock files
Alternatively, you can remove individual files by navigating to the
cache directory and passing the full file name as an argument to the
-c option. You may also use a wildcard (*) to match one or more
files.
$ cd ~/scratch/sra-cache/sra
$ cache-mgr -c SRR35933485.sra
$ cache-mgr -c SRR649*
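To decide when cleaning is worthwhile, you could check how much space the cache occupies first. The following sketch uses the standard du utility rather than any SRA Toolkit feature. The function names and the 10 GB threshold in the example are illustrative assumptions, not site policy.

```shell
# Print the disk usage of a directory in kilobytes.
dir_size_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    du -sk "$1" | awk '{ print $1 }'
}

# Succeed when the directory uses more than the given number of kilobytes.
cache_exceeds_kb() {
    [ "$(dir_size_kb "$1")" -gt "$2" ]
}

# Example: review and clear the cache once it grows past roughly 10 GB.
#   if cache_exceeds_kb ~/scratch/sra-cache 10485760; then
#       cache-mgr -rt       # review what is cached first
#       cache-mgr --clear
#   fi
```

Keeping the report step before the clear step follows the recommendation above to review cached objects before removing them.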