Generate targets pipeline code for GWAS meta-analysis
Source: R/generate_gwas_meta_pipeline.R
Generates a complete targets / tarchetypes pipeline as a character
string, ready to be pasted into a _targets.R file. The generated code
covers per-cohort preparation, LDSC heritability estimation, per-ancestry
and all-populations inverse-variance-weighted (IVW) meta-analysis, loci extraction, Manhattan plots
(with PDF and high-DPI PNG export targets for fast report rendering),
and MR-MEGA multidimensional scaling (MDS). Validates trait_type, n_col, and manifest_df before
generating any code. The manifest is serialized as a tibble::tribble()
call in the generated output so the pipeline is self-contained.
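Serializing the manifest as a tibble::tribble() call can be pictured with a minimal base-R sketch. manifest_to_tribble_sketch() is a hypothetical helper written for illustration. It is not the serializer that generate_gwas_meta_pipeline() uses internally. It assumes a non-empty data frame of atomic columns and writes each cell as an R literal, so the resulting text can be pasted into _targets.R without referring back to the original object.

```r
# Hypothetical helper for illustration only; not the package's serializer.
# Assumes a non-empty data frame whose columns are atomic vectors.
manifest_to_tribble_sketch <- function(df) {
  # Render one cell as an R literal: NA stays bare, strings are quoted and escaped
  as_literal <- function(x) {
    if (is.factor(x)) x <- as.character(x)
    if (is.na(x)) {
      "NA"
    } else if (is.character(x)) {
      encodeString(x, quote = "\"")
    } else {
      as.character(x)
    }
  }
  # tribble() header: one ~name per column
  header <- paste0("~", names(df), collapse = ", ")
  # One comma-separated line of literals per manifest row
  rows <- vapply(seq_len(nrow(df)), function(i) {
    cells <- vapply(df[i, , drop = FALSE], as_literal, character(1))
    paste(cells, collapse = ", ")
  }, character(1))
  paste0("tibble::tribble(\n  ", header, ",\n  ",
         paste(rows, collapse = ",\n  "), "\n)")
}
```

For example, cat(manifest_to_tribble_sketch(manifest)) prints a tribble() call that can be parsed as R code. encodeString() escapes backslashes and quotes, so Windows-style paths survive the round trip.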
Usage
generate_gwas_meta_pipeline(
trait,
trait_type,
n_col,
manifest_df,
hm3_path,
dbsnp_path,
crew_controller = NULL,
output_base_dir = "Data",
manhattan_width = 16,
manhattan_height = 6,
manhattan_dpi = 300,
output_file = NULL
)
Arguments
- trait
Trait name used in target names (e.g. "CAD").
- trait_type
"binary" or "quantitative" (required, no default).
- n_col
Sample size column: "EffectiveN" for binary traits, "N" for quantitative traits (required, no default).
- manifest_df
A data frame describing the cohort files to include. Required columns: path (full path to the raw summary statistics file), file (basename of the file), cohort (cohort label), ancestry (ancestry label, e.g. "EUR"), and study (unique human-readable identifier; must contain no spaces or special characters; does not need to include the ancestry). A tar_name column ({study}_{ancestry}) is added automatically and used as the tar_map() target name so that ancestry-based target selectors work reliably. Additional columns are preserved in the serialized output.
Optionally, the manifest may include col_* columns to map non-standard column names in each cohort's raw file to harmonized names. These are passed to build_column_map() during per-cohort preparation. Supported columns: col_chr, col_pos, col_rsid, col_effect_allele, col_other_allele, col_beta, col_se, col_p, col_eaf, col_n, col_n_cases, col_n_controls. See build_column_map() for details.
- hm3_path
Path to the LDSC HapMap3 SNP list file (e.g. w_hm3.snplist). Required.
- dbsnp_path
Path to the dbSNP155 reference directory passed to prep_gwas(). Required.
- crew_controller
Name of the crew controller to use for resource-intensive targets (meta_ALL and meta_common_variants_ALL). If NULL (default), no resources block is added to those targets.
- output_base_dir
Base output directory for intermediate files. Default: "Data".
- manhattan_width
Width in inches for Manhattan plot export. Default: 16.
- manhattan_height
Height in inches for Manhattan plot export. Default: 6.
- manhattan_dpi
DPI for high-resolution Manhattan plot PNG export. Default: 300.
- output_file
Optional file path to write the generated code to. When provided, parent directories are created automatically, the code is wrapped in a ```{targets <trait>-pipeline} chunk and written to that file, and the raw pipeline code string is returned invisibly. When NULL (default), the string is returned visibly.
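The trait_type and n_col rules can be expressed as a small check. check_trait_args_sketch() is hypothetical. It assumes that the validation enforces the documented pairing ("EffectiveN" with binary, "N" with quantitative) and requires exact, case-sensitive matches. The package's own checks and error messages may differ.

```r
# Hypothetical validator illustrating the documented rules; not the package's code.
check_trait_args_sketch <- function(trait_type, n_col) {
  # Exact match only: no partial matching, no vectors, no other case
  if (!is.character(trait_type) || length(trait_type) != 1L ||
      !trait_type %in% c("binary", "quantitative")) {
    stop('trait_type must be "binary" or "quantitative"', call. = FALSE)
  }
  # Documented pairing of trait type and sample size column
  expected <- c(binary = "EffectiveN", quantitative = "N")[[trait_type]]
  if (!identical(n_col, expected)) {
    stop(sprintf('n_col must be "%s" for a %s trait', expected, trait_type),
         call. = FALSE)
  }
  invisible(TRUE)
}
```

The arguments have no defaults and are checked before any code is generated, so a mismatched call such as check_trait_args_sketch("binary", "N") fails immediately instead of producing a pipeline that breaks later.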
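The following minimal sketch checks the required manifest_df columns and derives the tar_name column. add_tar_name_sketch() is hypothetical. It interprets "no spaces or special characters" as letters, digits, and underscores only, which is an assumption. The package's actual check may be stricter or looser.

```r
# Hypothetical sketch of the documented manifest rules; not the package's code.
add_tar_name_sketch <- function(manifest_df) {
  required <- c("path", "file", "cohort", "ancestry", "study")
  missing <- setdiff(required, names(manifest_df))
  if (length(missing) > 0L) {
    stop("manifest_df is missing required column(s): ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  # Assumed reading of "no spaces or special characters": [A-Za-z0-9_] only
  bad <- !grepl("^[A-Za-z0-9_]+$", manifest_df$study)
  if (any(bad)) {
    stop("invalid study identifier(s): ",
         paste(manifest_df$study[bad], collapse = ", "), call. = FALSE)
  }
  if (anyDuplicated(manifest_df$study) > 0L) {
    stop("study identifiers must be unique", call. = FALSE)
  }
  # tar_name = {study}_{ancestry}, used as the tar_map() target name
  manifest_df$tar_name <- paste(manifest_df$study, manifest_df$ancestry, sep = "_")
  manifest_df
}
```

Under this rule, the manifest in the Examples section, whose study values already end in the ancestry, would get tar_name values such as "UKBB_EUR_EUR". Valid, but redundant.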
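The col_* columns apply per cohort. Each non-missing value names the raw-file column for one field. This sketch assumes that NA means the cohort's file needs no override, as for MVP_AFR in the Examples. The hypothetical collect_col_overrides_sketch() below only gathers those overrides from a single manifest row. It is not build_column_map(), whose signature and harmonized names are not documented here, so the keys are simply the col_* suffixes.

```r
# Hypothetical: NOT build_column_map(); keys are just the col_* suffixes.
collect_col_overrides_sketch <- function(manifest_row) {
  supported <- c("col_chr", "col_pos", "col_rsid", "col_effect_allele",
                 "col_other_allele", "col_beta", "col_se", "col_p", "col_eaf",
                 "col_n", "col_n_cases", "col_n_controls")
  # Only documented col_* columns that are present in this manifest
  col_fields <- intersect(supported, names(manifest_row))
  values <- vapply(col_fields, function(f) as.character(manifest_row[[f]]),
                   character(1))
  names(values) <- sub("^col_", "", col_fields)
  # Drop NA / empty entries: no override for that field in this cohort
  values[!is.na(values) & nzchar(values)]
}
```

Called on the first row of the Examples manifest, this returns c(beta = "BETA_VAL", eaf = "MY_FREQ"). On the second row, it returns an empty vector.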
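In the targets package, an individual target picks a controller from a crew controller group through tar_resources(crew = tar_resources_crew(controller = ...)). This sketch assumes that the generated code follows that idiom. The exact text emitted by generate_gwas_meta_pipeline() is not documented here. Under that assumption, the crew_controller switch amounts to the hypothetical crew_resources_sketch() below.

```r
# Hypothetical: builds the text of a resources argument; the exact code the
# generator emits for meta_ALL / meta_common_variants_ALL is an assumption.
crew_resources_sketch <- function(crew_controller = NULL) {
  # NULL (default): no resources block at all
  if (is.null(crew_controller)) return(character(0))
  sprintf('resources = tar_resources(crew = tar_resources_crew(controller = "%s"))',
          crew_controller)
}
```

The controller name must match a controller defined in the crew controller group set through tar_option_set(controller = ...) in _targets.R. Otherwise those targets cannot be dispatched.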
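For the PNG export, the pixel dimensions follow from the three Manhattan arguments as inches × DPI. This assumes the export device treats them as physical size and resolution, as ggplot2::ggsave() does. With the defaults, that is 16 × 300 = 4800 by 6 × 300 = 1800 pixels, about 8.6 megapixels. manhattan_png_pixels() is a hypothetical helper for this arithmetic.

```r
# Hypothetical helper: raster size = physical size (inches) x resolution (DPI).
manhattan_png_pixels <- function(width = 16, height = 6, dpi = 300) {
  c(width_px = width * dpi, height_px = height * dpi)
}

manhattan_png_pixels()            # width_px = 4800, height_px = 1800
manhattan_png_pixels(dpi = 150)   # width_px = 2400, height_px = 900
```

Lowering manhattan_dpi shrinks the PNG quadratically in pixel count, which matters when many plots are embedded in one report.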
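The output_file behaviour described above can be sketched as follows. A {targets} chunk is the Target Markdown chunk engine provided by the targets package. write_targets_chunk_sketch() is a hypothetical helper, not the package's code. It creates missing parent directories, writes the wrapped chunk, and returns the unwrapped code invisibly.

```r
# Hypothetical sketch of the documented output_file behaviour; not the package's code.
write_targets_chunk_sketch <- function(code, trait, output_file) {
  # Create missing parent directories, as the documentation describes
  dir.create(dirname(output_file), recursive = TRUE, showWarnings = FALSE)
  fence <- strrep("`", 3)
  # Wrap the pipeline code in a {targets <trait>-pipeline} chunk
  chunk <- c(paste0(fence, "{targets ", trait, "-pipeline}"), code, fence)
  writeLines(chunk, output_file)
  invisible(code)  # the raw code string, not the wrapped chunk
}
```

Returning the raw string rather than the chunk keeps the result usable directly, for example with cat() or by writing it to a plain _targets.R file.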
Examples
if (FALSE) { # \dontrun{
manifest <- data.frame(
path = c("/data/UKBB_EUR.txt.gz", "/data/MVP_AFR.txt.gz"),
file = c("UKBB_EUR.txt.gz", "MVP_AFR.txt.gz"),
cohort = c("UKBB", "MVP"),
ancestry = c("EUR", "AFR"),
study = c("UKBB_EUR", "MVP_AFR"),
# Optional: map non-standard column names per cohort
col_eaf = c("MY_FREQ", NA),
col_beta = c("BETA_VAL", NA),
stringsAsFactors = FALSE
)
cat(generate_gwas_meta_pipeline("CAD", trait_type = "binary",
n_col = "EffectiveN", manifest_df = manifest,
hm3_path = "/path/to/w_hm3.snplist",
dbsnp_path = "/path/to/dbSNP155"))
} # }