Prepare and QC GWAS summary statistics — prep

A high-level wrapper that runs the full per-cohort preparation pipeline: (1) cleans the raw file via clean_gwas(), (2) validates trait-type columns via assert_trait_columns(), (3) matches to HapMap3 SNPs via snp_match_munge(), (4) estimates the LDSC intercept via ldscr::ldsc_h2(), and (5) writes a .parquet file with LDSC-adjusted (or unadjusted) SE, Z, and P columns.
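Step (5) can be illustrated with a minimal sketch of an intercept-based correction. This is not taken from the package source: the helper name adjust_for_intercept() is hypothetical, and both the scaling rule (multiply SE by the square root of the intercept) and the choice to leave values unadjusted when the intercept is at or below 1 are assumptions based on common practice.

```r
# Hypothetical helper: a sketch of an LDSC-intercept correction.
# Assumed form, not the package's actual implementation.
adjust_for_intercept <- function(beta, se, intercept) {
  # Assumption: inflate SEs only when the intercept exceeds 1;
  # otherwise the statistics are left unadjusted.
  scale <- if (intercept > 1) sqrt(intercept) else 1
  se_adj <- se * scale
  z <- beta / se_adj                  # recompute Z from the adjusted SE
  p <- 2 * stats::pnorm(-abs(z))      # two-sided p-value from Z
  data.frame(SE = se_adj, Z = z, P = p)
}

# Toy values for illustration only
res <- adjust_for_intercept(
  beta      = c(0.10, -0.05),
  se        = c(0.02, 0.02),
  intercept = 1.21
)
res_unadj <- adjust_for_intercept(beta = 0.10, se = 0.02, intercept = 0.98)
```

With an intercept of 1.21 the standard errors grow by a factor of 1.1, so Z shrinks and P rises accordingly; with an intercept below 1 the inputs pass through unchanged.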

Usage

prep_gwas(
  sumstats_file,
  hm3,
  ancestry,
  output_path,
  trait_type,
  n_col,
  dbsnp_path = "Data/dbSNP155",
  logging_path = "Data/logs",
  column_map = NULL,
  ...
)

Arguments

sumstats_file: Path to a raw GWAS summary statistics file.
hm3: Data frame of HapMap3 SNPs with columns SNP, A1, A2.
ancestry: Ancestry label (e.g. "EUR"). Passed to ldscr::ldsc_h2().
output_path: Directory where the output parquet file is written.
trait_type: "binary" or "quantitative" (required, no default).
n_col: Bare column name for sample size. Use EffectiveN for binary traits and N for quantitative traits (required, no default).
dbsnp_path: Path to the dbSNP155 reference. Default "Data/dbSNP155".
logging_path: Directory for tidyGWAS log files. Default "Data/logs".
column_map: Optional named character vector of per-cohort column renames passed through to clean_gwas() and ultimately to harmonize_sumstats_headers(). Default NULL.
...: Additional arguments forwarded to clean_gwas().
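As a rough illustration of two of the inputs described above, the sketch below builds a toy hm3 data frame with the required columns and computes an effective sample size for a binary trait. The rsIDs are made up, the helper effective_n() is hypothetical, and the formula 4 / (1/n_cases + 1/n_controls) is a commonly used definition that may differ from how a given cohort's EffectiveN column was derived.

```r
# Toy HapMap3-style reference with the three required columns.
# The rsIDs and alleles are invented for illustration; a real table is far larger.
hm3_snps <- data.frame(
  SNP = c("rs0000001", "rs0000002"),
  A1  = c("A", "C"),
  A2  = c("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one common definition of effective sample size
# for a case-control (binary) trait. Assumed, not taken from the package.
effective_n <- function(n_cases, n_controls) {
  4 / (1 / n_cases + 1 / n_controls)
}

n_eff <- effective_n(n_cases = 5000, n_controls = 15000)
```

A cohort file for a binary trait would carry a per-SNP value like n_eff in the column passed as n_col, while a quantitative trait would use the total N instead.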

Value

The file path of the written parquet file (invisibly).

Details

Prepare GWAS summary statistics for meta-analysis

Examples

if (FALSE) { # \dontrun{
out <- prep_gwas(
  sumstats_file = "cohort1.txt.gz",
  hm3           = hm3_snps,
  ancestry      = "EUR",
  output_path   = "Data/prepped",
  trait_type    = "binary",
  n_col         = EffectiveN
)
} # }