| 1 | = BIOS data location and access overview = |
| 2 | |
| 3 | Processed files are stored on the BIOS VM, a virtual machine running in SURFsara's HPC cloud. It behaves like a normal Linux server, and you can also connect to it graphically. |
| 4 | |
| 5 | The BIOS Metadatabase can be accessed via a web browser or any client program supporting HTTPS. |
| 6 | |
| 7 | Raw files are stored on [http://doc.grid.surfsara.nl/en/latest/Pages/Advanced/grid_storage.html SURFsara's Grid], an online data storage system. Accessing the Grid requires some technical skill and completing [http://doc.grid.surfsara.nl/en/latest/ a registration procedure]. |
| 8 | |
| 9 | [[Image(BIOSdataInfrastructure.png, 600px)]] |
| 10 | |
| 11 | = BIOS virtual machine provided by SURFsara = |
| 12 | |
| 13 | For safety and privacy reasons, BIOS data (genome, transcriptome, methylome and phenome) is only accessible for downstream analysis on a SURFsara virtual machine (VM). The BIOS VM is managed by Martijn Vermaat and Leon Mei. Since the resources and capacity of this VM are limited, it should only be used for downstream analysis. If you want to process many BAM files or run a similarly expensive analysis for which you would normally use a cluster, it is probably better to get acquainted with working on the grid directly. If you are not sure, contact Martijn or Leon. |
| 14 | |
| 15 | The current test VM runs these specs: |
| 16 | * 16 processors |
| 17 | * 128 GB RAM |
| 18 | * 4.5 TB disk space mounted at /virdir, where you can keep your analysis data. Files in the /virdir/Backup folder are backed up about once per month; files in /virdir/Scratch are not backed up. |
| 19 | * 2 GB soft limit and 3 GB hard limit per user at /home |
| 20 | |
| 21 | == BIOS VM Access == |
| 22 | |
| 23 | To get access, please send a request to Leon Mei (`h.mei[at]lumc.nl`) or Martijn Vermaat (`m.vermaat.hg[at]lumc.nl`) with your '''public''' SSH key ([wiki:FgSshKey instructions]). |
| 24 | |
| 25 | For remote access from a Linux or Mac OS X terminal, type: |
| 26 | {{{ |
| 27 | ssh username@bios-vm.bbmrirp3-lumc.vm.surfsara.nl |
| 28 | }}} |
| 29 | where your private SSH key is in the standard location `~/.ssh/id_rsa` (alternatively, specify it with `-i`). |
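If you connect often, an entry in `~/.ssh/config` saves retyping the host name and key path. The sketch below prints such an entry; the alias `bios-vm` and the function name `bios_ssh_config` are made up for this illustration, and `username` stands for your own VM account.

```shell
# Print an ssh_config entry for the BIOS VM.
# "bios-vm" is a hypothetical alias chosen for this example.
# $1 is your VM username, $2 (optional) the path to your private key.
bios_ssh_config() {
    user=$1
    key=${2:-}
    # Fall back to the standard key location; ssh expands the tilde itself.
    [ -n "$key" ] || key='~/.ssh/id_rsa'
    cat <<EOF
Host bios-vm
    HostName bios-vm.bbmrirp3-lumc.vm.surfsara.nl
    User $user
    IdentityFile $key
EOF
}

# Usage (appends the entry; afterwards "ssh bios-vm" is enough):
#   bios_ssh_config username >> ~/.ssh/config
```

Passing a second argument plays the same role as `-i` on the command line.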
| 30 | |
| 31 | For terminal access from Windows, use the [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] tool and configure the VM IP address, your username, and your private SSH key. |
| 32 | |
| 33 | For graphical access from Windows, use [http://mobaxterm.mobatek.net/ MobaXterm] as advised by SURF in the HPC cloud documentation (https://doc.hpccloud.surfsara.nl/access-your-VM). |
| 34 | |
| 35 | Alternatively, for graphical access from Windows or Mac OS X, use [http://wiki.x2go.org/doku.php X2Go] and configure the VM IP address, your username, your private SSH key and the session type/desktop manager (Gnome). You can also use a remote desktop connection client (for Mac: http://www.microsoft.com/nl-nl/download/details.aspx?id=18140). |
| 36 | |
| 37 | [wiki:FgSshKey Step by step instructions for using a public/private SSH pair for access to the VM] |
| 38 | |
| 39 | [wiki:FgConnectTroubleshooting Connection troubleshooting page (under construction)] |
| 40 | |
| 41 | == Rstudio server == |
| 42 | There is an RStudio server running on the BIOS VM: http://bios-vm.bbmrirp3-lumc.vm.surfsara.nl:8787 |
| 43 | |
| 44 | Log in with the same username and password that you use for your SSH session. |
| 45 | |
| 46 | == UCSC Genome Browser tracks == |
| 47 | |
| 48 | Viewing RP3 data in the UCSC Genome Browser can be done by using the [wiki:FgStorageInTheCloud#VirDir WWW export directory on the virdir] and selecting the exported URLs as custom tracks. |
| 49 | |
| 50 | Please note that no privacy sensitive data should be stored here, as it will be world-readable. |
| 51 | |
| 52 | Example sessions: |
| 53 | |
| 54 | * [http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=martijnvermaat&hgS_otherUserSessionName=rp3 Coverage tracks for 10 random samples, meta exon track, and the PolyA binning track] |
| 55 | |
| 56 | = Grid SRM access from the BIOS VM = |
| 57 | If you need access to raw BIOS data (e.g., RNAseq, methylation), you have to download it from the Grid SRM storage to the BIOS VM. The instructions below explain how. |
| 58 | |
| 59 | '''Note:''' Requesting access to the SRM takes quite some time and using the SRM itself is not the easiest thing to learn. As such, if there already is someone at your institute with access and experience using the SRM, it might be faster and easier to ask that person for help. |
| 60 | |
| 61 | == Grid SRM Access == |
| 62 | |
| 63 | Before proceeding, follow the steps in [wiki:FgObtainingGridAccess Obtaining access to grid infrastructure]. |
| 64 | |
| 65 | === Prepare a proxy === |
| 66 | |
| 67 | To download data from the Grid SRM to the BIOS VM you'll need a proxy and your keys (the grid certificate). To start a proxy you need access to a UI, for example gb-ui-lumc.lumc.nl, ui.lsg.psy.vu.nl or one at another site. |
| 68 | |
| 69 | * On the UI there should be a `.globus` folder in your home directory that contains the grid certificate. Copy the `.globus` directory from your local home folder to your home folder on the UI. Make sure the permissions are set as in the listing below (log in to a UI and run `chmod 644 usercert.pem` and `chmod 400 userkey.pem`). These files don't need to be renewed. |
| 70 | {{{ |
| 71 | .globus/: |
| 72 | total 8 |
| 73 | -rw-r--r-- 1 mgalen mgalen 1769 Aug 14 16:55 usercert.pem |
| 74 | -r-------- 1 mgalen mgalen 1751 Aug 14 16:55 userkey.pem |
| 75 | }}} |
| 76 | |
| 77 | * The proxy can be started by logging in to a UI and running `startGridSession`: |
| 78 | {{{ |
| 79 | startGridSession bbmri.nl:/bbmri.nl/RP3 |
| 80 | }}} |
| 81 | |
| 82 | * This creates your own x509 proxy file in the `/tmp` directory on the UI, which looks something like this: |
| 83 | {{{ |
| 84 | -rw------- 1 mgalen 6.1K Aug 27 09:55 x509up_u40208 |
| 85 | }}} |
| 86 | * You may have to change the permissions of this file using `chmod 644 x509up_u40208`. Copy this file to a location on the BIOS VM for later use (for example, also `/tmp`). Make sure you copy the x509 file associated with your username. The proxy is valid for 7 days, so you need to renew it weekly. |
| 87 | |
| 88 | * Now log in to the BIOS VM and use this command to fetch the file you just created on the UI: |
| 89 | {{{ |
| 90 | scp mgalen@uimd.grid.sara.nl:/tmp/x509up_u40208 /tmp   # replace 'uimd.grid.sara.nl' with the address of your UI |
| 91 | }}} |
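To find your own proxy file on the UI and check whether it is still usable, something like the following may help. It is a sketch under two assumptions: that the proxy is named `/tmp/x509up_u` followed by your numeric user id (as in the listing above), and that `openssl` is installed on the machine where you run it. The function names are invented for this example.

```shell
# Expected proxy location for a numeric user id (defaults to your own uid).
# Assumes the /tmp/x509up_u<uid> naming seen in the example listing.
proxy_file() {
    printf '/tmp/x509up_u%s\n' "${1:-$(id -u)}"
}

# Succeeds if the first certificate in the given file has not yet expired.
# Assumes the openssl command-line tool is available.
proxy_is_valid() {
    openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}

# Usage on the UI:
#   ls -l "$(proxy_file)"
#   proxy_is_valid "$(proxy_file)" && echo "proxy still valid"
```

Running the validity check before a large transfer avoids discovering halfway through that the weekly renewal is due.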
| 92 | |
| 93 | === Downloading files === |
| 94 | |
| 95 | * Once these files are in place, you can copy data from the Grid SRM to the BIOS VM using curl. For example, log in to the VM and issue the following command, where `-E` points to the path where you put the proxy file. Don't forget to redirect the output from curl to a local file. |
| 96 | {{{ |
| 97 | mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3/README >README |
| 98 | }}} |
| 99 | |
| 100 | * You can also upload data to the Grid SRM from the BIOS VM using curl. To upload a local file `test.txt`, use the `--upload-file` (or `-T`) argument: |
| 101 | {{{ |
| 102 | mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3/test.txt --upload-file test.txt |
| 103 | }}} |
| 104 | (In practice, use a more appropriate directory on the Grid SRM instead of the project root.) |
| 105 | Instead of specifying the full target name including the filename, you can also specify just the target directory, ending in a `/`. Curl will then use your local filename on the Grid SRM as well: |
| 106 | {{{ |
| 107 | mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3/ --upload-file test.txt |
| 108 | }}} |
| 109 | |
| 110 | * If you want to delete a file from the Grid SRM, use `-X DELETE` (use this with caution): |
| 111 | {{{ |
| 112 | mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3/test.txt -X DELETE |
| 113 | }}} |
| 114 | |
| 115 | * To check whether a file exists without downloading it, use the `-I` option (a response with `200 OK` means the file exists, `404 Not Found` means it doesn't): |
| 116 | {{{ |
| 117 | mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L -I https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3/README |
| 118 | HTTP/1.1 200 OK |
| 119 | Date: Tue, 28 Jan 2014 15:26:17 GMT |
| 120 | ETag: 0000EADAD57CE41D47F5A8F069A7C24F8003_-1773128220 |
| 121 | Last-Modified: Wed, 15 Jan 2014 14:31:15 GMT |
| 122 | Content-Length: 154 |
| 123 | Server: Jetty(7.3.1.v20110307) |
| 124 | }}} |
| 125 | {{{ |
| 126 | mgalen@cloud-KVM:~$ curl --CApath /etc/grid-security/certificates/ -E /tmp/x509up_u40208 -L -I https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3/nonexisting |
| 127 | HTTP/1.1 404 Not Found |
| 128 | Content-Type: text/html |
| 129 | Transfer-Encoding: chunked |
| 130 | Server: Jetty(7.3.1.v20110307) |
| 131 | }}} |
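Because the curl commands above repeat the same long options, it can be convenient to wrap them in small shell functions. The following is only a sketch: the function names (`srm_url`, `srm_status`, `describe_status`, `srm_get`) and the variables are hypothetical, and the proxy path assumes you copied your proxy to `/tmp/x509up_u40208` as in the examples.

```shell
# Project root on the Grid SRM and curl settings, taken from the examples above.
SRM_BASE="https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/bbmri.nl/RP3"
CA_PATH="/etc/grid-security/certificates/"
PROXY="${PROXY:-/tmp/x509up_u40208}"   # assumed location of your proxy file

# Build the full URL for a path relative to the project root.
srm_url() {
    printf '%s/%s\n' "$SRM_BASE" "${1#/}"
}

# Print only the final HTTP status code of a header-only (-I) request.
srm_status() {
    curl --silent --CApath "$CA_PATH" -E "$PROXY" -L -I \
         --output /dev/null --write-out '%{http_code}' "$(srm_url "$1")"
}

# Translate a status code into a short message.
describe_status() {
    case "$1" in
        200) echo "exists" ;;
        404) echo "not found" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Download remote path $1 to local file $2; --fail makes curl return
# a non-zero exit code on HTTP errors instead of saving an error page.
srm_get() {
    curl --fail --CApath "$CA_PATH" -E "$PROXY" -L \
         --output "$2" "$(srm_url "$1")"
}

# Usage:
#   describe_status "$(srm_status README)"
#   srm_get README ./README
```

Using `--output` instead of shell redirection, together with `--fail`, means a failed request does not leave behind a local file containing an error page.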
| 132 | |
| 133 | |
| 134 | == Data Storage == |
| 135 | |
| 136 | Read [wiki:FgStorageInTheCloud Storage in the cloud] |