R/read_validate_genelist.R
read_validate_genelist.Rd
If 'pvalue' is not among the genelist columns, it is added and set to 1 for visualization purposes. If 'effectsize' is not among the genelist columns, it is added and set to 0 for visualization purposes.
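The defaulting rule above can be sketched in base R as follows. This is a minimal illustration, not goatea's internal code. add_default_columns() is a hypothetical helper name, and the input is assumed to be a data.frame with at least one row.

```r
# Hypothetical helper (not part of goatea): add 'pvalue' and 'effectsize'
# with neutral defaults when they are missing from a gene table.
add_default_columns <- function(genelist) {
  if (!"pvalue" %in% colnames(genelist)) {
    genelist$pvalue <- 1      # default p-value: never significant
  }
  if (!"effectsize" %in% colnames(genelist)) {
    genelist$effectsize <- 0  # default effect size: no change
  }
  genelist                    # existing columns are left untouched
}

# Usage: a table that only has symbol and gene columns
gl <- data.frame(symbol = c("GENE_A", "GENE_B"), gene = c(101L, 102L),
                 stringsAsFactors = FALSE)
gl <- add_default_columns(gl)
print(gl)
```

Because the defaults are a p-value of 1 and an effect size of 0, genes without statistics can still be plotted but never show up as significant or directional.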
read_validate_genelist(
  file,
  remove_non_numerical_ids = TRUE,
  remove_duplicated = TRUE,
  remove_Rik_genes = TRUE,
  remove_Gm_genes = TRUE,
  map_organism = NULL
)
file: full file path to a gene table in .csv, .xlsx, or .tsv format
remove_non_numerical_ids: boolean, default TRUE; if TRUE, rows with non-numerical IDs in the gene column are removed
remove_duplicated: boolean, default TRUE; removes duplicated gene symbols/IDs
remove_Rik_genes: boolean, default TRUE; uses a grepl("Rik$") search to find and remove non-canonical RIKEN mouse genes
remove_Gm_genes: boolean, default TRUE; uses a grepl("^Gm") search to find and remove non-canonical Gm mouse genes
map_organism: default NULL; if a numeric taxid is given, it selects the matching org.Xx.eg.db package, which is used to map gene symbols to the gene column via AnnotationDbi::mapIds(keytype = 'ALIAS'). Genes that map to NA are removed. The org.Xx.eg.db package must be installed manually. Symbols are converted to upper case with toupper() to match formatting. Protein symbols can be used as well. Taxids and their annotation packages:
9606 = Human (Homo sapiens) (org.Hs.eg.db)
9544 = Rhesus monkey (Macaca mulatta) (org.Mmu.eg.db)
10090 = Mouse (Mus musculus) (org.Mm.eg.db)
10116 = Rat (Rattus norvegicus) (org.Rn.eg.db)
7227 = Fruit fly (Drosophila melanogaster) (org.Dm.eg.db)
6239 = Worm (Caenorhabditis elegans) (org.Ce.eg.db)
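The filtering options and the taxid lookup above could look roughly like the sketch below. It is written under assumptions and is not goatea's implementation. filter_genelist(), org_db_for_taxid and map_symbols() are hypothetical names. The table is assumed to be a data.frame with symbol and gene columns. "Non-numerical" is taken to mean "not made only of digits", and a row counts as a duplicate if either its symbol or its gene ID was seen before. Mapping to column = "ENTREZID" is also an assumption, based on the "eg" (Entrez Gene) in the org.Xx.eg.db names. map_symbols() only works if the matching org.Xx.eg.db package is installed.

```r
# Hypothetical sketch (not goatea's code) of the filtering arguments.
filter_genelist <- function(genelist,
                            remove_non_numerical_ids = TRUE,
                            remove_duplicated = TRUE,
                            remove_Rik_genes = TRUE,
                            remove_Gm_genes = TRUE) {
  if (remove_non_numerical_ids) {
    # assumption: keep only gene IDs made entirely of digits
    genelist <- genelist[grepl("^[0-9]+$", genelist$gene), , drop = FALSE]
  }
  if (remove_duplicated) {
    # assumption: drop a row if its symbol or gene ID appeared earlier
    dup <- duplicated(genelist$symbol) | duplicated(genelist$gene)
    genelist <- genelist[!dup, , drop = FALSE]
  }
  if (remove_Rik_genes) {
    genelist <- genelist[!grepl("Rik$", genelist$symbol), , drop = FALSE]
  }
  if (remove_Gm_genes) {
    genelist <- genelist[!grepl("^Gm", genelist$symbol), , drop = FALSE]
  }
  genelist
}

# Taxid -> annotation package, as listed above (hypothetical lookup table)
org_db_for_taxid <- c(
  "9606"  = "org.Hs.eg.db",
  "9544"  = "org.Mmu.eg.db",
  "10090" = "org.Mm.eg.db",
  "10116" = "org.Rn.eg.db",
  "7227"  = "org.Dm.eg.db",
  "6239"  = "org.Ce.eg.db"
)

# Hypothetical mapping step; requires the org.Xx.eg.db package to be installed.
map_symbols <- function(symbols, taxid) {
  pkg <- org_db_for_taxid[[as.character(taxid)]]
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(pkg, " is not installed; install it e.g. with BiocManager::install('", pkg, "')")
  }
  db <- getExportedValue(pkg, pkg)  # the OrgDb object exported by the package
  # symbols are upper-cased with toupper(), as described above;
  # column = "ENTREZID" is an assumption
  AnnotationDbi::mapIds(db, keys = toupper(symbols), column = "ENTREZID",
                        keytype = "ALIAS", multiVals = "first")
}
```

For example, filter_genelist() on a table containing Actb twice, 1700001C19Rik, Gm12345, Trp53, and a row with gene ID "abc" keeps only the first Actb row and Trp53.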
A tibble with the columns symbol (character), gene (integer ID), pvalue (numeric), and effectsize (numeric). The example output below also includes a logical signif column.
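A quick way to confirm that a table has the documented shape is sketched below. check_genelist_columns() is a hypothetical helper, not part of goatea. It only checks the four documented columns and deliberately leaves the type of the gene column unchecked.

```r
# Hypothetical check (not part of goatea): does a table have the
# documented columns with the documented types?
check_genelist_columns <- function(genelist) {
  required <- c("symbol", "gene", "pvalue", "effectsize")
  missing <- setdiff(required, colnames(genelist))
  if (length(missing) > 0) {
    stop("missing columns: ", paste(missing, collapse = ", "))
  }
  stopifnot(
    is.character(genelist$symbol),
    is.numeric(genelist$pvalue),
    is.numeric(genelist$effectsize)
  )
  invisible(TRUE)
}

# Usage on a small hand-made table
example <- data.frame(symbol = "GENE_A", gene = 101L, pvalue = 0.01,
                      effectsize = 1.5, stringsAsFactors = FALSE)
check_genelist_columns(example)
```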
file_path <- system.file("extdata", "example_genelist.csv", package = "goatea")
read_validate_genelist(file = file_path)
#> Checking file format...
#> # A tibble: 100 × 5
#>    symbol   gene  pvalue effectsize signif
#>    <chr>   <int>   <dbl>      <dbl> <lgl>
#>  1 gene_45 11023 0.123          5   FALSE
#>  2 gene_12 12763 0.435          5   FALSE
#>  3 gene_34 16847 0.435          5   FALSE
#>  4 gene_83 12069 0.0148        -4.9 TRUE
#>  5 gene_14 17454 0.0151         4.8 TRUE
#>  6 gene_96 12308 0.375          4.8 FALSE
#>  7 gene_57 19013 0.667         -4.8 FALSE
#>  8 gene_92 11532 0.676          4.7 FALSE
#>  9 gene_31 13749 0.00597       -4.6 TRUE
#> 10 gene_59 10264 0.0410        -4.6 TRUE
#> # ℹ 90 more rows
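As a usage sketch, the significant genes can be split by the sign of their effect size. To keep the snippet self-contained, the ten rows printed above are re-entered by hand. The values are as printed, so they are rounded. With the full result, the same two subsetting lines apply to the object returned by read_validate_genelist().

```r
# The ten rows printed above, re-entered by hand (rounded as printed)
top10 <- data.frame(
  symbol = c("gene_45", "gene_12", "gene_34", "gene_83", "gene_14",
             "gene_96", "gene_57", "gene_92", "gene_31", "gene_59"),
  gene = c(11023L, 12763L, 16847L, 12069L, 17454L,
           12308L, 19013L, 11532L, 13749L, 10264L),
  pvalue = c(0.123, 0.435, 0.435, 0.0148, 0.0151,
             0.375, 0.667, 0.676, 0.00597, 0.0410),
  effectsize = c(5, 5, 5, -4.9, 4.8, 4.8, -4.8, 4.7, -4.6, -4.6),
  signif = c(FALSE, FALSE, FALSE, TRUE, TRUE,
             FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Significant genes, split by the direction of the effect size
up   <- top10$symbol[top10$signif & top10$effectsize > 0]
down <- top10$symbol[top10$signif & top10$effectsize < 0]
up    # "gene_14"
down  # "gene_83" "gene_31" "gene_59"
```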