
A high-level wrapper that runs the full per-cohort preparation pipeline: (1) cleans the raw file with clean_gwas(), (2) validates the trait-type columns with assert_trait_columns(), (3) matches variants to HapMap3 SNPs with snp_match_munge(), (4) estimates the LDSC intercept with ldscr::ldsc_h2(), and (5) writes a .parquet file with LDSC-adjusted (or unadjusted) SE, Z, and P columns.
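Step (5) can be illustrated with a minimal sketch. It assumes a convention this page does not state: SE is inflated by sqrt(intercept) only when the LDSC intercept exceeds 1 and is otherwise left unadjusted, Z is recomputed as BETA / SE, and P is two-sided. The helper adjust_by_intercept() and the BETA column are hypothetical and are not part of prep_gwas() or its internals.

```r
# Hypothetical helper, not the package's implementation.
# Assumption: inflate SE by sqrt(intercept) only when intercept > 1.
adjust_by_intercept <- function(sumstats, intercept) {
  inflate <- if (intercept > 1) sqrt(intercept) else 1
  sumstats$SE <- sumstats$SE * inflate          # adjusted (or unchanged) SE
  sumstats$Z  <- sumstats$BETA / sumstats$SE    # Z recomputed from BETA and SE
  sumstats$P  <- 2 * pnorm(-abs(sumstats$Z))    # two-sided P from Z
  sumstats
}

ss  <- data.frame(SNP = c("rs1", "rs2"), BETA = c(0.10, -0.05), SE = c(0.02, 0.02),
                  stringsAsFactors = FALSE)
adj <- adjust_by_intercept(ss, intercept = 1.21)
adj$SE  # 0.022 0.022
```

Capping the adjustment at an intercept of 1 keeps the correction conservative, because it never shrinks standard errors. Whether prep_gwas() applies this rule is an assumption.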

Usage

prep_gwas(
  sumstats_file,
  hm3,
  ancestry,
  output_path,
  trait_type,
  n_col,
  dbsnp_path = "Data/dbSNP155",
  logging_path = "Data/logs",
  column_map = NULL,
  ...
)

Arguments

sumstats_file

Path to a raw GWAS summary statistics file.

hm3

Data frame of HapMap3 SNPs with columns SNP, A1, A2.
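The following minimal sketch shows what matching against an hm3 data frame with columns SNP, A1, A2 can look like. It is a simplified illustration and does not describe how snp_match_munge() is implemented. The helper match_to_hm3() is hypothetical, and it assumes the cohort data carries SNP, A1, A2, BETA, and SE columns. The helper keeps SNPs whose alleles match the reference directly or in swapped order, flips the sign of BETA for swapped rows, and does not handle strand flips.

```r
# Hypothetical helper, not snp_match_munge(): allele-order matching only,
# strand flips (complementary alleles) are not handled.
match_to_hm3 <- function(sumstats, hm3) {
  m <- merge(sumstats, hm3, by = "SNP", suffixes = c("", ".ref"))  # inner join on SNP
  same    <- m$A1 == m$A1.ref & m$A2 == m$A2.ref
  swapped <- m$A1 == m$A2.ref & m$A2 == m$A1.ref
  m$BETA[swapped] <- -m$BETA[swapped]  # effect now refers to the reference A1
  m$A1[swapped]   <- m$A1.ref[swapped]
  m$A2[swapped]   <- m$A2.ref[swapped]
  m[same | swapped, c("SNP", "A1", "A2", "BETA", "SE")]
}

hm3 <- data.frame(SNP = c("rs1", "rs2", "rs3"), A1 = c("A", "C", "G"),
                  A2 = c("G", "T", "A"), stringsAsFactors = FALSE)
ss  <- data.frame(SNP = c("rs1", "rs2", "rs4"), A1 = c("A", "T", "C"),
                  A2 = c("G", "C", "T"), BETA = c(0.1, 0.2, 0.3),
                  SE = c(0.01, 0.02, 0.03), stringsAsFactors = FALSE)
matched <- match_to_hm3(ss, hm3)  # rs1 kept, rs2 flipped, rs4 dropped
```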

ancestry

Ancestry label (e.g. "EUR"). Passed to ldscr::ldsc_h2().

output_path

Directory where the output parquet file is written.

trait_type

"binary" or "quantitative" (required, no default).

n_col

Bare (unquoted) column name for the sample size. Use EffectiveN for binary traits and N for quantitative traits (required, no default).
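For binary traits, EffectiveN usually refers to the case-control effective sample size, N_eff = 4 / (1 / N_cases + 1 / N_controls). The minimal sketch below assumes that this standard definition applies. This page does not say how the EffectiveN column is computed, and the helper effective_n() is hypothetical.

```r
# Hypothetical helper; assumes the standard case-control effective N.
effective_n <- function(n_cases, n_controls) {
  4 / (1 / n_cases + 1 / n_controls)
}

effective_n(5000, 5000)  # 10000 (balanced design: equals total N)
effective_n(1000, 9000)  # 3600 (imbalance shrinks the effective N)
```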

dbsnp_path

Path to the dbSNP155 reference. Default "Data/dbSNP155".

logging_path

Directory for tidyGWAS log files. Default "Data/logs".

column_map

Optional named character vector of per-cohort column renames passed through to clean_gwas() and ultimately to harmonize_sumstats_headers(). Default NULL.

...

Additional arguments forwarded to clean_gwas().

Value

The path of the written parquet file, returned invisibly.

Details

Prepare GWAS summary statistics for meta-analysis

Examples

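if (FALSE) { # \dontrun{
# hm3_snps: data frame of HapMap3 SNPs with columns SNP, A1, A2
# Binary trait, so the sample-size column is EffectiveN
out <- prep_gwas(
  sumstats_file = "cohort1.txt.gz",
  hm3           = hm3_snps,
  ancestry      = "EUR",
  output_path   = "Data/prepped",
  trait_type    = "binary",
  n_col         = EffectiveN
)
# For a quantitative trait, use trait_type = "quantitative" and n_col = N
} # }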