Read NCBI accession2taxid files, keep only the accession and taxa ID columns, and save them as a SQLite database

Usage

read.accession2taxid(
  taxaFiles,
  sqlFile,
  vocal = TRUE,
  extraSqlCommand = "",
  indexTaxa = FALSE,
  overwrite = FALSE
)

Arguments

taxaFiles

a string or vector of strings giving the path(s) to files to be read in

sqlFile

a string giving the path where the output SQLite file should be saved

vocal

if TRUE, output status messages

extraSqlCommand

for advanced use. A string giving a command to be run on the SQLite database before loading data. A potential use:

  • "pragma temp_store = 2;" to keep all SQLite temp files in memory. Don't do this unless you have a lot (>100 GB) of RAM

indexTaxa

if TRUE, add an index on taxa ID. This is only necessary if you want to look up accessions by taxa ID, e.g. with getAccessions

overwrite

if TRUE, delete the accessionTaxa table in the database, if present, and regenerate it

Value

TRUE if successful

Examples

taxa<-c(
  "accession\taccession.version\ttaxid\tgi",
  "Z17427\tZ17427.1\t3702\t16569",
  "Z17428\tZ17428.1\t3702\t16570",
  "Z17429\tZ17429.1\t3702\t16571",
  "Z17430\tZ17430.1\t3702\t16572"
)
inFile<-tempfile()
sqlFile<-tempfile()
writeLines(taxa,inFile)
read.accession2taxid(inFile,sqlFile,vocal=FALSE)
db<-RSQLite::dbConnect(RSQLite::SQLite(),dbname=sqlFile)
RSQLite::dbGetQuery(db,'SELECT * FROM accessionTaxa')
#>     base accession taxa
#> 1 Z17427  Z17427.1 3702
#> 2 Z17428  Z17428.1 3702
#> 3 Z17429  Z17429.1 3702
#> 4 Z17430  Z17430.1 3702
RSQLite::dbDisconnect(db)
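
The indexTaxa and overwrite arguments described above can be combined when an existing database needs to be rebuilt to support lookups by taxa ID. The following is a minimal sketch, not taken from the package's own examples: the two-record input is made up for illustration, and it assumes getAccessions accepts a taxa ID followed by the path to the SQLite file.

```r
library(taxonomizr)

# Made-up two-record input in the tab-separated accession2taxid layout
taxa<-c(
  "accession\taccession.version\ttaxid\tgi",
  "Z17427\tZ17427.1\t3702\t16569",
  "Z17428\tZ17428.1\t3702\t16570"
)
inFile<-tempfile()
sqlFile<-tempfile()
writeLines(taxa,inFile)

# First load: creates the accessionTaxa table without a taxa ID index
read.accession2taxid(inFile,sqlFile,vocal=FALSE)

# Second load: overwrite=TRUE deletes and regenerates the table,
# indexTaxa=TRUE adds the index used for lookups by taxa ID
read.accession2taxid(inFile,sqlFile,vocal=FALSE,indexTaxa=TRUE,overwrite=TRUE)

# Look up accessions for taxa ID 3702 (assumed call form, see note above)
acc<-getAccessions(3702,sqlFile)
print(acc)
```

Without overwrite=TRUE, the second call would leave the existing table in place, so the index would not be added.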