Use Case #2: Download Files in Bulk Using the Command Line Client
In Use Case #1: Find and Download Files Associated With a Selected Study, we looked at how to download files from the web using tools embedded in the portal. However, this isn’t the most efficient option if you are downloading a large number of files or very large files.
An alternative to downloading files through the web is the use of programmatic clients that directly access files in Synapse, where portal data is stored. In addition to being more efficient for large files or large numbers of files, using programmatic clients also promotes reproducibility by recording which data were used for analysis.
Programmatic clients include command line, R, and Python.
An important consideration when determining which Synapse programmatic client to use to download data is download speed. The command line and Python synapseclient support multithreaded download, and will provide faster download speeds than the Synapse R client.
In this use case, we’ll focus on the command line client. The Synapse command line client synapseclient can be used to download all data and file annotations from the portal with a single command. This protocol can be used instead of steps 5-7 in Use Case #1. You still need to gain access to the study data, following step 4 from Use Case #1.
If you want to use the Python or R client instead, follow the steps outlined in the Synapse documentation.
Here’s how to download files in bulk using the command line client:
Install Python 3 (the command line synapseclient is installed with the Synapse Python client, so Python 3 is required to install the synapseclient package).
Install the synapseclient package following these steps.
Log in to Synapse following these steps.
⚠ If working on your personal computer, you may store your credentials locally by including the --rememberMe argument (shown below) to allow automatic authentication with future Synapse interactions. We recommend doing this to prevent accidentally sharing your password while sharing analytical code. In almost all cases, your Synapse API key is more secure than your password, and we recommend you use it to log in. Find your API key in your user profile; read more on that here.
    synapse login -u <Synapse username> -p <API key> --rememberMe
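If you script the login step, one way to keep credentials out of shared analysis code is to read them from environment variables and assemble the login command at run time. The following is a minimal sketch, not part of the Synapse client: the variable names SYNAPSE_USERNAME and SYNAPSE_API_KEY and all three helper functions are hypothetical, and the flags are only the ones shown in the command above, which may differ in other client versions.

```python
import os
import subprocess


def build_login_command(username, api_key, remember=True):
    """Assemble the argument list for the login command shown in this guide."""
    argv = ["synapse", "login", "-u", username, "-p", api_key]
    if remember:
        # Store credentials locally; only do this on a personal computer.
        argv.append("--rememberMe")
    return argv


def credentials_from_env(env=None):
    """Read credentials from hypothetical environment variables."""
    env = os.environ if env is None else env
    username = env.get("SYNAPSE_USERNAME")  # assumed variable name
    api_key = env.get("SYNAPSE_API_KEY")    # assumed variable name
    if not username or not api_key:
        raise RuntimeError("Set SYNAPSE_USERNAME and SYNAPSE_API_KEY first.")
    return username, api_key


def login(env=None):
    """Run the login command; requires the synapse CLI to be installed."""
    username, api_key = credentials_from_env(env)
    # check=True raises CalledProcessError if the client exits with an error.
    subprocess.run(build_login_command(username, api_key), check=True)
```

Calling login() after setting both variables in your shell runs the same command as above, so the script itself never contains your username or API key.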
In the AD Knowledge Portal, click the Explore tab, followed by the Data subtab. Then, click the Download Options icon, followed by Programmatic Options from the dropdown menu.
The command synapse get with the -q argument downloads files from the entirety of the portal data that meet the specified condition. See the synapse get documentation for more detail.
In this use case example, we want to download all processed and metadata files from the MC-CAA study. To do so, run the following command from the directory where you would like to store the files:
    synapse get -q "SELECT * FROM syn11346063 WHERE (("study" = 'MC-CAA') AND ("dataSubtype" = 'processed' OR "dataSubtype" = 'metadata'))"
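If you run similar downloads for several studies, it can help to generate the query text rather than edit it by hand. The sketch below rebuilds the query used in this example from its parts. The helper names (sql_literal, build_query, build_command) are hypothetical, and doubling embedded single quotes is an assumption based on standard SQL escaping rather than something confirmed for Synapse queries.

```python
import shlex
from typing import List, Sequence


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal.

    Assumption: embedded single quotes are escaped by doubling them.
    """
    return "'" + value.replace("'", "''") + "'"


def build_query(table_id: str, study: str, data_subtypes: Sequence[str]) -> str:
    """Build a query selecting one study and any of the given data subtypes."""
    subtype_clause = " OR ".join(
        f'"dataSubtype" = {sql_literal(subtype)}' for subtype in data_subtypes
    )
    return (
        f"SELECT * FROM {table_id} "
        f'WHERE (("study" = {sql_literal(study)}) AND ({subtype_clause}))'
    )


def build_command(query: str) -> List[str]:
    """Return the argument list for the download command used in this guide."""
    return ["synapse", "get", "-q", query]


if __name__ == "__main__":
    query = build_query("syn11346063", "MC-CAA", ["processed", "metadata"])
    # shlex.join quotes the query so the printed line can be pasted into a shell.
    print(shlex.join(build_command(query)))
```

Changing the study name or the list of data subtypes produces the matching command for another download. The table ID syn11346063 is the one used in the example above.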
In your working directory, you will find a SYNAPSE_TABLE_QUERY_###.csv file that lists the annotations associated with each downloaded file. This is the same information that you can obtain through the web interface using the Export Table option from Use Case #1, step 7. This file provides helpful experimental details relevant to how the data were processed, as well as important details about the file itself (including the file version number).
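Because the annotation file is a plain CSV, it is easy to inspect in a script, which also helps record which data were used in an analysis. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes the file has a header row with one column per annotation (for example dataSubtype, the annotation used in the query above); check the header of your own file before relying on a column name.

```python
import csv
from collections import Counter
from pathlib import Path


def find_manifests(directory="."):
    """Return the SYNAPSE_TABLE_QUERY_*.csv files in a directory, sorted by name."""
    return sorted(Path(directory).glob("SYNAPSE_TABLE_QUERY_*.csv"))


def read_manifest(path):
    """Load one annotation file into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count rows per value of a column; missing values are grouped under ''."""
    return Counter((row.get(column) or "") for row in rows)


if __name__ == "__main__":
    for manifest in find_manifests():
        rows = read_manifest(manifest)
        print(manifest.name, len(rows), "files")
        # "dataSubtype" is assumed to be a column in the exported annotations.
        for value, count in count_by(rows, "dataSubtype").most_common():
            print(f"  {value or '(blank)'}: {count}")
```

Run from the download directory, this prints the number of files listed in each annotation file and a breakdown by data subtype.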