Posts

Showing posts from December, 2016

2016 wrap-up

Image
What a rollercoaster. On the good side, we've seen advances in sequencing methods including improvements in Nanopore sequencing and single cell methods becoming more common. We also saw 10x genomics come to the party with an emulsion PCR approach that can be applied to produce synthetic long reads or single cell barcoding. There were major results from larger cohort studies exemplified by ExAC, which is revealing more about human genetic variation, while Blueprint and GTEx are revealing more about determinants of gene regulation. The #openaccess is growing rapidly in the bioinformatics community, along with the growth of preprint popularity which I hope is adopted more widely in the biomedical sciences.

On the not so good side, Illumina has made made no new instrument announcements nor any substantial updates to existing sequencing systems. There were a few papers (example) describing the methylation EPIC array announced in 2015. Prices for Illumina reagents continue to increase,…

MSigDB gene sets for mouse

I recently needed to convert MSigDB gene sets to mouse so I thought I would share.

GO.v5.2.symbols_mouse.gmt
kegg.v5.2.symbols_mouse.gmt
msigdb.v5.2.symbols_mouse.gmt
reactome.v5.2.symbols_mouse.gmt

Below is the code used to do the conversion. It requires an input GMT file of human gene symbols as well as a human-mouse orthology file. You can download the ortholog file here. As the name suggests, it is based on data downloaded from Ensembl Biomart version 87.

Running the program converts all human gmt files. It requres gnu parallel which can be easily installed on Ubuntu with "sudo apt-get install parallel"


#!/bin/bash

conv(){
line=$1
  NAME_DESC=`echo $line | cut -d ' ' -f-2`

  GENES=`echo $line | cut -d ' ' -f3- \
  | tr ' ' '\n' | sort -uk 1b,1 \
  | join -1 1 -2 1 - \
  <(cut -f3,5 mouse2hum_biomart_ens87.txt \
  | sed 1d | awk '$1!="" && $2!=""' \
  | sort -uk 1b,1) | cut -d ' ' -f2 \
  | sort -u…