gwastargets: GWAS Meta-Analysis Pipeline Functions • gwastargets

gwastargets provides a suite of functions for building reproducible multi-cohort GWAS meta-analysis pipelines using the targets framework. It supports both binary (case/control) and quantitative traits and handles the full workflow from raw summary statistics through meta-analysis and downstream quality-control.

Installation

# install.packages("pak")
pak::pkg_install("mglev1n/gwastargets")

Overview

The package is organized around two complementary use cases:

1. Generating a targets pipeline — given a cohort manifest and trait metadata, produce the complete _targets.R code for a reproducible analysis:

library(gwastargets)

manifest <- data.frame(
  path     = c("/data/UKBB_EUR.txt.gz", "/data/MVP_AFR.txt.gz", "/data/BBJ_EAS.txt.gz"),
  file     = c("UKBB_EUR.txt.gz",       "MVP_AFR.txt.gz",       "BBJ_EAS.txt.gz"),
  cohort   = c("UKBB",                  "MVP",                  "BBJ"),
  ancestry = c("EUR",                   "AFR",                  "EAS"),
  study    = c("UKBB_EUR",              "MVP_AFR",              "BBJ_EAS"),
  stringsAsFactors = FALSE
)

code <- generate_gwas_meta_pipeline(
  trait       = "CAD",
  trait_type  = "binary",
  n_col       = "EffectiveN",
  manifest_df = manifest
)

cat(code)

2. Using the pipeline functions directly — each stage is available as a standalone exported function for interactive use or integration into existing workflows:

Function	Purpose
`prep_gwas()`	Clean, harmonize, and LDSC-correct one cohort’s summary statistics
`summarize_sumstats()`	Per-cohort QC summary (N, MAF range, median SE, precision)
`meta_analyze_ivw()`	Fixed-effects IVW meta-analysis across cohorts
`extract_loci()`	Clump significant variants into independent loci with gene annotation
`extract_common_variants()`	Identify variants present across all studies (for MR-MEGA)
`calculate_mr_mega_mds()`	Compute MR-MEGA MDS coordinates from cross-cohort effect estimates
`snp_match_munge()`	Match summary statistics to a reference panel and munge for LDSC
`harmonize_sumstats_headers()`	Standardize column names across GWAS file formats

Pipeline Architecture

Raw summary stats (per cohort)
        │
        ▼
   prep_gwas()          ← clean, match HM3, LDSC intercept correction
        │
        ├──── ldsc_h2()            per-cohort heritability (via ldscr)
        │
        ├──── summarize_sumstats() per-cohort QC table
        │
        ├──── meta_analyze_ivw()   per-ancestry IVW meta-analysis
        │         └── extract_loci()
        │
        └──── meta_analyze_ivw()   all-populations IVW meta-analysis
                  ├── extract_loci()
                  └── extract_common_variants()
                            └── calculate_mr_mega_mds()

Manifest requirements

generate_gwas_meta_pipeline() validates the manifest before generating any code. The required columns are:

Column	Description
`path`	Full path to the raw summary statistics file
`file`	Basename of the file (used to join back to per-cohort summaries)
`cohort`	Human-readable cohort label (e.g. `"UKBB"`)
`ancestry`	Ancestry label matching ldscr reference panels (e.g. `"EUR"`, `"AFR"`, `"EAS"`)
`study`	Unique identifier used as the `tar_map` target name — no spaces or special characters

Additional columns are preserved in the serialized output and can be used downstream.

ldscr — LDSC heritability and genetic correlation in R
tidyGWAS — GWAS summary statistics cleaning
gwasRtools — GWAS loci extraction and annotation
targets — reproducible pipeline toolkit

gwastargets

Installation

Overview

Pipeline Architecture

Manifest requirements

Related packages