If 'pvalue' is not among the genelist columns, it is added and defaulted to 1 for visualization purposes. If 'effectsize' is not among the genelist columns, it is added and defaulted to 0 for visualization purposes.

read_validate_genelist(
  file,
  remove_non_numerical_ids = TRUE,
  remove_duplicated = TRUE,
  remove_Rik_genes = TRUE,
  remove_Gm_genes = TRUE,
  map_organism = NULL
)

Arguments

file

full filepath to the gene tibble file in .csv/.xlsx/.tsv format

remove_non_numerical_ids

boolean, default TRUE, removes rows whose value in the gene column is non-numerical

remove_duplicated

boolean, default TRUE, removes duplicated gene symbols/ids

remove_Rik_genes

boolean, default TRUE, grepl("Rik$") search and remove Riken non-canonical mouse genes

remove_Gm_genes

boolean, default TRUE, grepl("^Gm") search and remove Gm non-canonical mouse genes
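
The two filters above can be tried on a toy vector. This is a minimal base-R sketch that assumes the patterns are applied exactly as written in the argument descriptions; the symbols and variable names are made up for illustration and this is not the package's internal code.

```r
# Toy symbols (illustrative only); the patterns are the ones quoted above.
symbols <- c("Actb", "1700001C02Rik", "Gm12345", "Gapdh", "Gmnn")

is_rik <- grepl("Rik$", symbols)  # symbols ending in "Rik"
is_gm  <- grepl("^Gm", symbols)   # symbols starting with "Gm"

# Keep only symbols that match neither pattern.
kept <- symbols[!(is_rik | is_gm)]
print(kept)
```

Note that a bare "^Gm" prefix match would also drop canonical symbols that happen to start with "Gm" (Gmnn in this toy vector), so it may be worth setting remove_Gm_genes = FALSE if such genes matter for an analysis.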

map_organism

default: NULL. If a numeric taxid is given, it is used to select the org.Xx.eg.db package that maps gene symbols to the gene column via AnnotationDbi::mapIds(keytype = 'ALIAS'). Genes that map to NA are removed. The org.Xx.eg.db package has to be downloaded manually! Symbols are converted with toupper() to match formatting. Protein symbols could be used too.

  • 9606 = Human (Homo sapiens) (org.Hs.eg.db)

  • 9544 = Rhesus monkey (Macaca mulatta) (org.Mmu.eg.db)

  • 10090 = Mouse (Mus musculus) (org.Mm.eg.db)

  • 10116 = Rat (Rattus norvegicus) (org.Rn.eg.db)

  • 7227 = Fruit fly (Drosophila melanogaster) (org.Dm.eg.db)

  • 6239 = Worm (Caenorhabditis elegans) (org.Ce.eg.db)
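
The mapping behaviour described for map_organism (upper-casing symbols, looking them up as aliases, dropping anything that maps to NA) can be sketched without an annotation package installed. The named vector below is a hypothetical stand-in for an org.Xx.eg.db lookup; the real function goes through AnnotationDbi::mapIds, which is not reproduced here.

```r
# Hypothetical alias-to-ID table standing in for an org.Xx.eg.db lookup.
alias_to_id <- c(TP53 = "7157", P53 = "7157", BRCA1 = "672")

symbols <- c("tp53", "p53", "Brca1", "not_a_gene")
keys <- toupper(symbols)          # match the upper-case formatting
ids <- unname(alias_to_id[keys])  # NA where no alias matches

mapped <- data.frame(symbol = keys, gene = ids, stringsAsFactors = FALSE)
mapped <- mapped[!is.na(mapped$gene), ]  # unmapped genes are removed
print(mapped)
```

In this toy case two aliases resolve to the same ID, which is the kind of duplicate that remove_duplicated is documented to handle.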

Value

tibble dataframe with columns: symbol (string), gene (string as integer ID), pvalue (numeric), effectsize (numeric)
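
The defaulting of missing pvalue and effectsize columns stated in the description can be illustrated in a few lines of base R. This is only a sketch of the documented behaviour on a made-up data frame, not the function's actual implementation.

```r
# Made-up genelist that lacks both optional columns.
genelist <- data.frame(
  symbol = c("gene_a", "gene_b"),
  gene = c(101L, 102L),
  stringsAsFactors = FALSE
)

# Add the documented defaults only when the column is absent.
if (!"pvalue" %in% colnames(genelist)) genelist$pvalue <- 1
if (!"effectsize" %in% colnames(genelist)) genelist$effectsize <- 0

print(genelist)
```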

Examples

file_path <- system.file("extdata", "example_genelist.csv", package = "goatea")
read_validate_genelist(file = file_path)
#> Checking file format...
#> # A tibble: 100 × 5
#>    symbol   gene     pvalue effectsize signif
#>    <chr>   <int>      <dbl>      <dbl> <lgl> 
#>  1 gene_81 18837 0.00000547     -4.3   TRUE  
#>  2 gene_77 19922 0.000261        2.4   TRUE  
#>  3 gene_17 13765 0.000842       -2.2   TRUE  
#>  4 gene_31 14822 0.00194         2.5   TRUE  
#>  5 gene_38 12392 0.00247         3.3   TRUE  
#>  6 gene_96 19338 0.00335        -3.3   TRUE  
#>  7 gene_70 17254 0.00358         4.2   TRUE  
#>  8 gene_55 12874 0.00360        -1.1   TRUE  
#>  9 gene_72 13860 0.00442        -1.1   TRUE  
#> 10 gene_82 19040 0.00593         0.300 FALSE 
#> # ℹ 90 more rows