Useful Tools/Functions

Useful Tools

METAL

METAL is an efficient tool for performing meta-analysis of genome-wide association studies.

regenie

regenie is a useful program for efficiently performing both common- and rare-variant genetic association studies using biobank-scale data.

SAIGE/SAIGE-GENE+

SAIGE is a computationally efficient tool for performing biobank-scale genetic association studies, using generalized mixed models to account for relatedness/population stratification. SAIGE-GENE+ can be used for performing rare variant association studies.

Useful R Packages

arrow

The arrow package is a cross-language platform for reading/writing data. This package can read/write from several formats including parquet, feather, csv, and tsv, as well as cloud-storage like Amazon S3. Parquet in particular is a highly efficient columnar data format. arrow enables users to rapidly query massive datasets (potentially spread across multiple files) that are larger-than-memory.

data.table

The data.table package is a high-performance package for reading and manipulating data.frames. The syntax can be confusing at first, but enables efficient manipulation of large dataframes.

dtplyr

The dtplyr package allows users to use data.table as a backend for dplyr. This combination of tools allows the user to specify dataframe manipulations using the convenient dplyr syntax, while taking advantage of the efficiency of data.table.

GenomicSEM

The GenomicSEM package allows users to perform several tasks, including:

  • Multivariate LD-score regression
  • Create structural equation models using GWAS summary statistics
  • Conduct multi-trait/multivariate GWAS

GWASinspector

The GWASinspector package is useful for performing quality control of genome-wide association studies before performing meta-analysis. The package takes a set of input GWAS and returns a set of cleaned/harmonized files ready for meta-analysis.

ieugwasr

The ieugwasr package allows users to query the MRC-IEU OpenGWAS Project.

locusplotr

The locusplotr package allows users to query the University of Michigan LocusZoom API to create regional association plots from GWAS summary statistics.

patchwork

The patchwork package allows users to combine multiple ggplots into a single graphic. This is useful for generating figures for manuscripts.

tidyverse

The tidyverse is a suite of packages that are generally useful for data science, and includes packages for:

  • dplyr - Manipulating data
  • ggplot2 - Plotting data
  • tidyr - Tidying data
  • readr - Read rectangular data (.csv, .tsv, etc.)

TwoSampleMR

The TwoSampleMR package from the MRC-IEU provides tools for performing Mendelian randomization, including:

  • Querying the MRC-IEU OpenGWAS Project
  • Performing two-sample Mendelian Randomziation
  • Performing MR sensitivity analyses
  • Clumping variants based on linkage disequilibrium and p-value

Useful R Functions

liftover_gwas()

The liftover_gwas() function from the GwasDataImport package is a wrapper for the rtracklayer::liftOver() function. liftover_gwas() can be used to convert genomic positions from an arbitrary dataframe containing columns for chromosome and position.

phewas()

The phewas() function from the ieugwasr can be used to perform a phenome-wide association study for a genetic variant of interest across studies cataloged in the MRC-IEU OpenGWAS Project.