Accessing Globus Data

Accessing Data

 

Most datasets hosted by GHub can be accessed through the Globus web application. Globus is a fast and secure way to move, share, and discover data. It lets you reach multiple data sources (an HPC cluster, cloud storage, a local workstation, or a laptop) from a web browser, using your organization's authentication to connect to the endpoints that provide access to the data storage.

GHub partners with UB CCR to provide access to large datasets. The datasets are described in detail on our Browse Data page; this page gives general instructions for downloading Ghub data with Globus.

 

 

Globus Quickstart

 

1. Set up accounts with Ghub and Globus

     

 

 

To transfer data from datasets hosted by Ghub, you need both a Ghub account and a Globus account. You can sign up for a Ghub account by clicking here: if you don’t have an account yet, click the “Create Account” link near the bottom of that page.

 

 

 

 

Globus offers several ways to create an account; the recommended option is to log in with your university or research organization account. Click here to go to the Globus website, which will help you set up your account.


2. Read about the datasets and find one to download

 

 

 

Once you have a Ghub account, log in to the website (click here), hover over the “Data” menu item at the top of the page, and select “Browse Data” from the drop-down menu.

 

 

 

 

The listing for each dataset describes how the data was created and what its properties are. Instructions for downloading each dataset, including links to the Globus endpoints hosting the data, are in a document available from the black “Download” button at the top right of each data description page.

 

 




3. Log in to Globus to browse the data files

 

 

 

Most of the data download documents include a link to the Globus endpoint hosting that data. Clicking the link takes you to the Globus transfer app website and asks you to log in. Log in with the Globus account from step 1, and you should be taken to the data listing in the Globus file manager.

 

 

 

 

Another option for finding data from Ghub is to search directly from the Globus app. In the Globus File Manager screen, typing “Ghub” into the Collection text area will bring up a list of all of the Ghub endpoints containing data hosted by the project.

 

 

If the data resource download document does not contain a link or Globus download information, you may be able to find the data by searching for it this way.



4. Connect to an endpoint to receive the data

 

 

 

In the Globus web app, clicking on the “split screen” icon in the top right corner will open a second endpoint file browser so you can set up the location to which you want to transfer your Ghub data. 

 

 

 

 

Typing the name of your organization into the Collection search box may reveal that your computing facility has already set up a Globus endpoint that you can use to transfer data to your own space. If so, choose the collection you want to transfer your data to, so that one side of the screen shows the Ghub data and the other shows the destination directory.

 

 

 

 

If your organization does not yet have Globus access, or you want to transfer data to your own work laptop, click here for instructions on how to set up a Globus endpoint to accept your data. Additionally, check here for general How-To instructions.



5. Initiate the transfer

 

 

 

In the Globus web app, click the rectangle just above the Ghub data to select all of the files in the data directory. Alternatively, click the box to the left of individual files to choose exactly which ones to transfer. Make sure the destination folder is still selected in the other half of the window. When everything is checked, click the blue Start (>) button above the Ghub data you want to transfer. This submits a data transfer job request, and a green number appears next to the Activity section of the left bar to show that the transfer is starting.

 

 

 

 

You can have as many file transfer jobs going at once as you would like! You do not have to keep the Globus web app open for the transfers to complete. You may browse other data, start other transfers, or work on something else entirely while the transfer progresses. 

 

 

To watch the progress of the transfer, click the “Activity” section of the blue left-hand bar and select the transfer job you want to inspect from the list. Clicking a job opens its detail page, which shows the number of files, the size of the data, and the progress of the transfer.
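For readers who would rather script this step, the same transfer can be sketched with the Globus CLI covered in the "Basics of the Globus CLI" section of this page. The Python snippet below only builds the command lines; it assumes the CLI is installed and that you are already logged in. The endpoint IDs, paths, and label are hypothetical placeholders, not values taken from Ghub.

```python
import shlex


def transfer_argv(src_id, src_path, dst_id, dst_path,
                  label="ghub-download", recursive=True):
    """Build the argument list for `globus transfer`.

    Each side is addressed as ENDPOINT_ID:PATH. Use recursive=True for a
    directory and recursive=False for a single file.
    """
    argv = ["globus", "transfer",
            f"{src_id}:{src_path}", f"{dst_id}:{dst_path}",
            "--label", label]
    if recursive:
        argv.append("--recursive")
    return argv


def task_show_argv(task_id):
    """Build the argument list for checking on a submitted transfer task."""
    return ["globus", "task", "show", task_id]


# Hypothetical placeholders: substitute the real endpoint IDs and paths.
SRC_ID = "SOURCE-ENDPOINT-ID"
DST_ID = "DESTINATION-ENDPOINT-ID"

command = transfer_argv(SRC_ID, "/dataset/", DST_ID, "/ghub-data/")
print(shlex.join(command))  # the command line you could paste into a terminal
```

To actually submit the job, the list can be passed to `subprocess.run(command)`. The CLI reports a task ID on submission, which `task_show_argv` turns into the matching status query, mirroring the Activity page in the web app.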




6. Validate received data

 

 

Once Globus determines that the transfer is complete, it marks the job as complete in the activity detail and sends you an email notification. Navigate to the directory where you expected the data and check that the files are there and that their sizes look correct.
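The check described above can be partly automated. The following is a minimal Python sketch, not a Ghub or Globus tool: it lists every file under the download directory with its size and compares the result against sizes you noted beforehand (for example, from the Globus file manager). The function names and the expected-size mapping are assumptions made for illustration.

```python
from pathlib import Path


def summarize_directory(root):
    """Return {relative_path: size_in_bytes} for every regular file under root."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def empty_files(actual):
    """Return the names of zero-byte files, which often signal a failed copy."""
    return [name for name, size in actual.items() if size == 0]


def find_problems(actual, expected):
    """Compare actual sizes against an expected {path: size} mapping.

    Returns a list of human-readable problems; an empty list means every
    expected file is present with the expected size.
    """
    problems = []
    for name, size in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != size:
            problems.append(
                f"size mismatch: {name} (expected {size}, got {actual[name]})"
            )
    return problems
```

A typical use would be to call `summarize_directory` on the destination folder and pass the result to `find_problems` together with the sizes you expect. Matching sizes do not prove the contents are identical, so treat this as a quick sanity check rather than full verification.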

 

 

If you have issues or questions, please contact us by submitting a VHub/GHub or CCR ticket.

 

 

Set up a Globus endpoint on a local Mac, Windows, or Linux machine

 

 

To transfer files to or from your own computer, you need Globus Connect Personal installed, set up, and running in the background while you use the Globus interface.

Globus Connect Personal documentation:

 

Mac OS

Windows

Linux

If you have issues or questions, please contact us by submitting a VHub/GHub or CCR ticket.

 

Basics of the Globus CLI

The Globus CLI supports a rich set of commands for terminal use and for scripting. Users can search for endpoints and, where authorized, list endpoint contents and download or upload files. A few basic commands are shown here to get you started. Refer to the Globus documentation for comprehensive coverage.

Selected Globus CLI commands

| Task | Command |
| --- | --- |
| Check your current identity | globus whoami |
| Show all GHub endpoints | globus endpoint search "ghub" |
| List contents of an endpoint | globus ls <endpoint id> |
| Log out | globus logout |
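Because the CLI is intended for scripting as well as terminal use, the commands in the table above can also be driven from a script. The Python sketch below is one possible wrapper, assuming the `globus` executable is on your PATH and you are logged in; the function names and task keys are illustrative and are not part of the Globus CLI.

```python
import shutil
import subprocess


def globus_argv(task, endpoint_id=None):
    """Map a task from the table to its CLI argument list."""
    if task == "ls":
        if not endpoint_id:
            raise ValueError("listing an endpoint requires its ID")
        return ["globus", "ls", endpoint_id]
    commands = {
        "whoami": ["globus", "whoami"],
        "search": ["globus", "endpoint", "search", "ghub"],
        "logout": ["globus", "logout"],
    }
    return commands[task]


def run_command(argv):
    """Run one command and return (exit_code, stdout, stderr).

    Returns exit code 127 without running anything if the executable
    cannot be found on PATH.
    """
    if shutil.which(argv[0]) is None:
        return (127, "", f"{argv[0]} not found on PATH")
    result = subprocess.run(argv, capture_output=True, text=True)
    return (result.returncode, result.stdout, result.stderr)
```

For example, `run_command(globus_argv("search"))` runs the endpoint search and returns its output as text. A non-zero exit code is worth checking in scripts, since a search can return endpoints that you are not permitted to list.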
 

 

Example session

Below is a brief example Globus CLI session. It shows a successful authentication, a session information query, the "GHub" endpoints returned from a search, and a successful listing of the contents of one such endpoint.

[Figure: typical Globus CLI session]

Note that the 'globus endpoint search' command returns all endpoints that satisfy the search criteria, even those for which you lack permissions. For example, here I try to list the contents of an endpoint to which I have no access:

[Figure: an expected Transfer API error]

Globus CLI Documentation

CCR's Globus CLI documentation shows selected additional commands (such as download and scripting information):

https://ubccr.freshdesk.com/support/solutions/articles/13000088456-globus-command-line-interface-cli-

The Globus documentation provides comprehensive information about installing, using, and scripting the Globus CLI.
