SRA Toolkit Configuration

To use the tools in the NCBI (National Center for Biotechnology Information) SRA (Sequence Read Archive) Toolkit to access public and, optionally, controlled-access data in the cloud, each user must configure the toolkit individually. If you are unsure whether you have already configured SRA Toolkit, there are two easy ways to check.

Check for settings file

If you have previously configured SRA Toolkit, it should have created a hidden directory that contains a user-settings file, typically located at ~/.ncbi/user-settings.mkfg. In that case, the following command will print the contents of your settings file, including information such as cache locations and remote access preferences:

$ cat ~/.ncbi/user-settings.mkfg

If you have not yet configured SRA Toolkit, you will see something similar to:

$ cat ~/.ncbi/user-settings.mkfg
cat: /users/x/y/<netid>/.ncbi/user-settings.mkfg: No such file or directory
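The same check can be scripted, for example at the top of a job script. The following is a minimal sketch; the function name sra_settings_exist is hypothetical and is not part of SRA Toolkit. It only tests whether the settings file is present and does not validate its contents.

```shell
# Hypothetical helper: report whether the SRA Toolkit settings file exists.
# The optional first argument overrides the default location.
sra_settings_exist() {
    settings=${1:-"$HOME/.ncbi/user-settings.mkfg"}
    if [ -f "$settings" ]; then
        echo "configured: $settings"
    else
        echo "not configured: $settings is missing"
        return 1
    fi
}
```

Calling sra_settings_exist with no arguments checks the default path and returns a non-zero status when the file is missing.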

Run a simple test

If the settings file exists, you can confirm that your configuration works by running a basic fastq-dump command. First load SRA Toolkit's compiler dependency module, gcc, then the sratoolkit module, and then run fastq-dump as follows:

$ module load gcc/13.3.0-xp3epyt
$ module load sratoolkit/3.0.0-y2rspiu

$ fastq-dump --stdout -X 2 SRR390728

After proper configuration, you should receive the following output:

Read 2 spots for SRR390728
Written 2 spots for SRR390728
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96&&&&(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4&&5&&;;;;;;;;;;;;;;;;;;;;;<9;<;;;;;464262

Users who have not yet completed SRA Toolkit configuration will instead see:

This sra toolkit installation has not been configured.
Before continuing, please run: vdb-config --interactive
For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
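In a batch script it can help to stop early when this notice appears rather than continue with an unconfigured toolkit. The sketch below assumes the notice contains the phrase "has not been configured", as in the message above. The wrapper name sra_check_output is hypothetical.

```shell
# Hypothetical wrapper: run the given command, capture its output, and
# fail if the output contains the "not been configured" notice.
sra_check_output() {
    output=$("$@" 2>&1)
    if printf '%s\n' "$output" | grep -q 'has not been configured'; then
        echo "SRA Toolkit is not configured; run: vdb-config --interactive" >&2
        return 1
    fi
    printf '%s\n' "$output"
}

# Example (requires the gcc and sratoolkit modules to be loaded):
#   sra_check_output fastq-dump --stdout -X 2 SRR390728
```

Because the wrapper merges standard error into standard output, it suits a quick smoke test better than a real data download.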

Interactive configuration

Some SRA Toolkit commands generate large temporary files. By default, SRA Toolkit writes these temporary, or cache, files to a subdirectory of your $HOME space (~/ncbi). We recommend, instead, using a directory within your scratch space, where users have more storage.

Note that while your scratch directory may appear to be a subdirectory of your $HOME directory, it actually resides on a different filesystem. For convenience, a symbolic link to scratch has been added to each user's $HOME directory. The physical location of your scratch space is, in fact, /gpfs2/scratch/<netid>.

Before running the configuration command, create a cache directory in your scratch space. The cache directory must be initially empty.

$ mkdir ~/scratch/sra-cache
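Because the configuration tool later asks for the absolute, physical path of this directory, it can be convenient to create the directory, confirm that it is empty, and print its resolved path in one step. The following is a sketch only; the function name prepare_sra_cache is hypothetical, and the default location simply mirrors the mkdir command above.

```shell
# Hypothetical helper: create the cache directory, verify it is empty,
# and print its physical path with symbolic links resolved.
prepare_sra_cache() {
    dir=${1:-"$HOME/scratch/sra-cache"}
    mkdir -p "$dir" || return 1
    # ls -A lists hidden entries too, so any leftover file is detected.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "error: $dir is not empty" >&2
        return 1
    fi
    # pwd -P prints the path without symbolic links, e.g. a path under
    # /gpfs2/scratch/ rather than under $HOME.
    (cd "$dir" && pwd -P)
}
```

The printed path is the value to enter in the "location of user-repository" field during interactive configuration.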

Configuration is done interactively from a terminal on the VACC via the command vdb-config -i. Make sure the required modules are loaded and then launch the configuration tool.

$ module load gcc/13.3.0-xp3epyt
$ module load sratoolkit/3.0.0-y2rspiu

$ vdb-config -i

When you enter this command, you will see a screen like the one shown below. Within this configuration menu, you can operate the buttons either by pressing the letter highlighted in red, or by pressing the tab key until the desired button is reached and then pressing the space bar or the enter key.

[Figure: The SRA Toolkit configuration interface.]

The Enable Remote Access option on the MAIN screen should be selected (X) by default. You should not change this setting, as remote access is almost always desired.

Next, navigate to the CACHE tab, either by pressing the c key or by pressing the tab key until the red cursor lands on CACHE and then pressing enter. At the top of this section, "enable local file-caching" should be selected by default and does not need to be changed. Now either press the o key or keep pressing tab until the red cursor is on the "location of user-repository" field, and press enter. Tab until the red cursor is at [ Goto ] and press enter again. A new pop-up window will appear. It should show the path to your $HOME directory in the first field.

Delete the existing file path (e.g., /users/m/j/mjohns89) with the delete or backspace key and enter the absolute, physical path of your new sra-cache directory. Then tab over to [ OK ] and press enter. For user mjohns89, for example, the path to enter would be /gpfs2/scratch/mjohns89/sra-cache, as illustrated below.

[Figure: Setting the SRA Toolkit cache path.]

The "select directory" pop-up should now indicate the correct cache directory, including the full path. Tab to the [ OK ] button again and hit enter. A window asking you to confirm the change will appear. Select [ yes ] and hit enter.

If you wish to make any other custom settings, such as providing cloud credentials, you can do that now. The last step is to save the changes and exit. Either tab over to [ save ] and press enter, or press the s shortcut key. Select [ ok ], and then exit either by pressing the x shortcut key or by tabbing to [ exit ] and pressing enter.

Settings are saved and persist across sessions, so you only need to configure SRA Toolkit once, before you use it for the first time.

SRA Toolkit is ready for use. You may now wish to perform the simple test described above to confirm that a basic fastq-dump command produces the expected output.
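To double-check that the new cache location was saved, you can look it up in the settings file. The sketch below assumes that the location is stored on a line of the form /repository/user/main/public/root = "<path>". That key name is an assumption about the settings file format, so compare it against the output of cat ~/.ncbi/user-settings.mkfg before relying on it. The function name sra_cache_root is hypothetical.

```shell
# Hypothetical helper: print the cache location recorded in the settings
# file. ASSUMPTION: the location is stored under the key
# /repository/user/main/public/root as a double-quoted value.
sra_cache_root() {
    settings=${1:-"$HOME/.ncbi/user-settings.mkfg"}
    sed -n 's|^/repository/user/main/public/root *= *"\(.*\)" *$|\1|p' "$settings"
}
```

If the function prints nothing, the key is either absent or stored under a different name in your version of the toolkit.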

Cache maintenance

Either periodically or each time you finish downloading and processing SRA files, you may wish to clean the cache to free up storage space. You can do this with the cache-mgr command. It's a good idea to use the report options before emptying the cache. Below, the -r option tells cache-mgr to report the objects in the cache, while -t provides additional detail.

$ cache-mgr -rt
source: /gpfs2/scratch/mjohns89/sra-cache

-----------------------------------
/gpfs2/scratch/mjohns89/sra-cache/sra/SRR35933485.sra complete file of 54,004,679 bytes
/gpfs2/scratch/mjohns89/sra-cache/sra/SRR649944.sra complete file of 142,605 bytes
-----------------------------------
0 cached file(s)
2 complete file(s)
54,147,284 bytes in cached files
54,147,284 bytes used in cached files
0 lock files

If you would like to clear all cached files from the current cache location, you can use the --clear option (or the shorthand version -c).

$ cache-mgr --clear
-----------------------------------
2 files removed
0 directories removed
54,147,284 bytes removed

A subsequent report shows:

$ cache-mgr -r
-----------------------------------
0 cached file(s)
0 complete file(s)
0 bytes in cached files
0 bytes used in cached files
0 lock files

Alternatively, you can remove individual files by navigating to the cache directory and passing the full file name as an argument to the -c option. You may also use a wildcard (*) to match one or more files.

$ cd ~/scratch/sra-cache/sra
$ cache-mgr -c SRR35933485.sra
$ cache-mgr -c SRR649*
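The advice earlier in this section, to report before clearing the whole cache, can be wrapped in a small helper. This sketch uses only the cache-mgr options shown in this section (-rt and --clear). The function name clear_sra_cache and the confirmation prompt are hypothetical additions.

```shell
# Hypothetical helper: show a detailed cache report, then clear the
# cache only if the user confirms.
clear_sra_cache() {
    cache-mgr -rt || return 1
    printf 'Clear all cached files? [y/N] '
    # Treat end-of-input the same as an empty answer.
    read -r answer || answer=
    case $answer in
        [Yy]*) cache-mgr --clear ;;
        *) echo "Cache left unchanged." ;;
    esac
}
```

Any answer other than one beginning with y or Y leaves the cache untouched.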