A high-level wrapper that runs the full per-cohort preparation pipeline:
(1) cleans the raw file via clean_gwas(); (2) validates the trait-type columns
via assert_trait_columns(); (3) matches variants to HapMap3 SNPs via
snp_match_munge(); (4) estimates the LDSC intercept via
ldscr::ldsc_h2(); and (5) writes a parquet file with LDSC-adjusted
(or unadjusted) SE, Z, and P columns.
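This page does not say how step (5) turns the intercept into adjusted statistics. The sketch below shows one common convention, offered as an assumption and not necessarily what prep_gwas() does: when the LDSC intercept exceeds 1, each SE is inflated by sqrt(intercept), each Z is deflated by the same factor, and P is recomputed from the adjusted Z. An intercept at or below 1 leaves the statistics unchanged. The helper name adjust_by_intercept() is hypothetical.

```r
# Hypothetical helper illustrating an LDSC-intercept correction.
# Assumption: statistics are rescaled only when the intercept is > 1.
adjust_by_intercept <- function(z, se, intercept) {
  # Scale factor is sqrt(intercept), floored at 1 (no deflation)
  scale <- sqrt(max(intercept, 1))
  se_adj <- se * scale                # inflate standard errors
  z_adj <- z / scale                  # shrink Z-scores by the same factor
  p_adj <- 2 * pnorm(-abs(z_adj))     # two-sided P from adjusted Z
  data.frame(SE = se_adj, Z = z_adj, P = p_adj)
}

# Example: an intercept of 1.21 gives a scale factor of 1.1
res <- adjust_by_intercept(z = c(4, -2), se = c(0.01, 0.02), intercept = 1.21)
print(res)
```

Flooring the factor at 1 avoids making association statistics look stronger than reported when the estimated intercept falls slightly below 1 due to sampling noise.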
Usage
prep_gwas(
  sumstats_file,
  hm3,
  ancestry,
  output_path,
  trait_type,
  n_col,
  dbsnp_path = "Data/dbSNP155",
  logging_path = "Data/logs",
  column_map = NULL,
  ...
)
Arguments
- sumstats_file
Path to a raw GWAS summary statistics file.
- hm3
Data frame of HapMap3 SNPs with columns SNP, A1, and A2.
- ancestry
Ancestry label (e.g. "EUR"). Passed to ldscr::ldsc_h2().
- output_path
Directory where the output parquet file is written.
- trait_type
Either "binary" or "quantitative" (required, no default).
- n_col
Bare (unquoted) column name for sample size. Use EffectiveN for binary traits and N for quantitative traits (required, no default).
- dbsnp_path
Path to the dbSNP155 reference. Default "Data/dbSNP155".
- logging_path
Directory for tidyGWAS log files. Default "Data/logs".
- column_map
Optional named character vector of per-cohort column renames, passed through to clean_gwas() and ultimately to harmonize_sumstats_headers(). Default NULL.
- ...
Additional arguments forwarded to clean_gwas().
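For binary traits, n_col should point at an effective sample size column. If a cohort file only reports case and control counts, one widely used definition is Neff = 4 / (1/Ncase + 1/Ncontrol). The sketch below assumes that definition and hypothetical column names N_CAS and N_CON; it is not taken from this package, so check how your cohorts define EffectiveN before relying on it.

```r
# Hypothetical helper: effective sample size for a case-control GWAS,
# using Neff = 4 / (1/Ncase + 1/Ncontrol).
effective_n <- function(n_case, n_control) {
  4 / (1 / n_case + 1 / n_control)
}

# Hypothetical cohort table with per-variant case/control counts
sumstats <- data.frame(
  SNP   = c("rs1", "rs2"),
  N_CAS = c(1000, 500),
  N_CON = c(1000, 4500)
)
sumstats$EffectiveN <- effective_n(sumstats$N_CAS, sumstats$N_CON)
print(sumstats)
```

With such a column present, the call would use trait_type = "binary" together with n_col = EffectiveN (unquoted, since n_col takes a bare column name).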