If 'pvalue' is not among the genelist columns, it is added and set to 1 for visualization purposes. If 'effectsize' is not among the genelist columns, it is added and set to 0 for visualization purposes.
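
A minimal base R sketch of this defaulting behaviour, not the package's actual implementation. `add_default_columns` and the toy table are hypothetical and introduced here only for illustration:

```r
# Sketch only: add missing 'pvalue' / 'effectsize' columns with the
# documented defaults (1 and 0). `add_default_columns` is a hypothetical
# helper, not a goatea function.
add_default_columns <- function(genelist) {
  if (!"pvalue" %in% colnames(genelist)) {
    # rep() keeps this safe for tables with zero rows
    genelist$pvalue <- rep(1, nrow(genelist))
  }
  if (!"effectsize" %in% colnames(genelist)) {
    genelist$effectsize <- rep(0, nrow(genelist))
  }
  genelist
}

# Toy input with made-up symbols and IDs, lacking both optional columns
genes <- data.frame(
  symbol = c("gene_a", "gene_b"),
  gene = c(1L, 2L),
  stringsAsFactors = FALSE
)
genes <- add_default_columns(genes)
```

Columns that are already present are left untouched, so real p-values and effect sizes are never overwritten.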

read_validate_genelist(
  file,
  remove_non_numerical_ids = TRUE,
  remove_duplicated = TRUE,
  remove_Rik_genes = TRUE,
  remove_Gm_genes = TRUE,
  map_organism = NULL
)

Arguments

file

full file path to the gene table in .csv, .xlsx, or .tsv format

remove_non_numerical_ids

boolean, default TRUE; if TRUE, rows with a non-numerical value in the gene column are removed

remove_duplicated

boolean, default TRUE, removes duplicated gene symbols/ids

remove_Rik_genes

boolean, default TRUE; removes Riken non-canonical mouse genes, matched with grepl("Rik$")

remove_Gm_genes

boolean, default TRUE; removes Gm non-canonical mouse genes, matched with grepl("^Gm")

map_organism

default NULL; if a numeric taxid is given, it selects the org.Xx.eg.db package used to map gene symbols to the gene column via AnnotationDbi::mapIds(keytype = 'ALIAS'). Genes that map to NA are removed. The matching org.Xx.eg.db package must be downloaded manually. Symbols are converted with toupper() to match formatting. Protein symbols can be used too.

  • 9606 = Human (Homo sapiens) (org.Hs.eg.db)

  • 9544 = Rhesus monkey (Macaca mulatta) (org.Mmu.eg.db)

  • 10090 = Mouse (Mus musculus) (org.Mm.eg.db)

  • 10116 = Rat (Rattus norvegicus) (org.Rn.eg.db)

  • 7227 = Fruit fly (Drosophila melanogaster) (org.Dm.eg.db)

  • 6239 = Worm (Caenorhabditis elegans) (org.Ce.eg.db)
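
The four `remove_*` arguments above can be approximated in base R as follows. This is a sketch under assumptions, not the package's code: `filter_genelist` is a hypothetical helper, the order of the steps is a guess, and duplicates are assumed to be checked on both the symbol and gene columns.

```r
# Sketch only: `filter_genelist` is a hypothetical stand-in for the
# filtering steps described by the remove_* arguments.
filter_genelist <- function(genelist,
                            remove_non_numerical_ids = TRUE,
                            remove_duplicated = TRUE,
                            remove_Rik_genes = TRUE,
                            remove_Gm_genes = TRUE) {
  keep <- rep(TRUE, nrow(genelist))
  if (remove_non_numerical_ids) {
    # keep only IDs made up entirely of digits (NA counts as non-numerical)
    keep <- keep & grepl("^[0-9]+$", as.character(genelist$gene))
  }
  if (remove_Rik_genes) {
    keep <- keep & !grepl("Rik$", genelist$symbol)
  }
  if (remove_Gm_genes) {
    keep <- keep & !grepl("^Gm", genelist$symbol)
  }
  genelist <- genelist[keep, , drop = FALSE]
  if (remove_duplicated) {
    # assumption: a row is a duplicate if its symbol or its ID was seen before
    dup <- duplicated(genelist$symbol) | duplicated(genelist$gene)
    genelist <- genelist[!dup, , drop = FALSE]
  }
  genelist
}

# Toy table with made-up symbols and IDs
toy <- data.frame(
  symbol = c("Abc1", "Abc1", "1700001A01Rik", "Gm123", "Xyz2"),
  gene = c("101", "101", "102", "103", "not_an_id"),
  stringsAsFactors = FALSE
)
filtered <- filter_genelist(toy)
```

With all four switches left at TRUE, only the first `Abc1` row of the toy table remains.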

Value

tibble dataframe with columns: symbol (string), gene (string as integer ID), pvalue (numeric), effectsize (numeric)

Examples

file_path <- system.file("extdata", "example_genelist.csv", package = "goatea")
read_validate_genelist(file = file_path)
#> Checking file format...
#> # A tibble: 100 × 5
#>    symbol   gene  pvalue effectsize signif
#>    <chr>   <int>   <dbl>      <dbl> <lgl> 
#>  1 gene_45 11023 0.123          5   FALSE 
#>  2 gene_12 12763 0.435          5   FALSE 
#>  3 gene_34 16847 0.435          5   FALSE 
#>  4 gene_83 12069 0.0148        -4.9 TRUE  
#>  5 gene_14 17454 0.0151         4.8 TRUE  
#>  6 gene_96 12308 0.375          4.8 FALSE 
#>  7 gene_57 19013 0.667         -4.8 FALSE 
#>  8 gene_92 11532 0.676          4.7 FALSE 
#>  9 gene_31 13749 0.00597       -4.6 TRUE  
#> 10 gene_59 10264 0.0410        -4.6 TRUE  
#> # ℹ 90 more rows
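
The lookup that `map_organism` describes can be approximated directly with AnnotationDbi. This is a sketch under assumptions: `map_alias_to_entrez` is a hypothetical helper, `column = "ENTREZID"` is assumed to be the target for the gene column, and human (9606, org.Hs.eg.db) is used as the example organism. It returns NULL when the annotation packages are not installed.

```r
# Sketch only: map gene symbols to IDs through the ALIAS keytype.
# `map_alias_to_entrez` is hypothetical; ENTREZID as the target column
# is an assumption, not confirmed by the documentation above.
map_alias_to_entrez <- function(symbols) {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
    message("AnnotationDbi and org.Hs.eg.db are required for this sketch")
    return(NULL)
  }
  # Note: mapIds() raises an error if none of the keys are valid ALIAS keys
  ids <- AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = toupper(symbols),   # symbols are upper-cased, as documented
    column = "ENTREZID",
    keytype = "ALIAS",
    multiVals = "first"
  )
  mapped <- data.frame(
    symbol = symbols,
    gene = unname(ids),
    stringsAsFactors = FALSE
  )
  # symbols that map to NA are dropped, as described for map_organism
  mapped[!is.na(mapped$gene), , drop = FALSE]
}

mapped <- map_alias_to_entrez(c("tp53", "brca1"))
```

Inside the package the same effect is requested with the `map_organism` argument, for example `read_validate_genelist(file = file_path, map_organism = 9606)`.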