Read NCBI accession2taxid files, keep only the accession and taxa ID columns, and save the result as a SQLite database
Usage
read.accession2taxid(
  taxaFiles,
  sqlFile,
  vocal = TRUE,
  extraSqlCommand = "",
  indexTaxa = FALSE,
  overwrite = FALSE
)
Arguments
- taxaFiles
a string or vector of strings giving the path(s) to files to be read in
- sqlFile
a string giving the path where the output SQLite file should be saved
- vocal
if TRUE, output status messages
- extraSqlCommand
for advanced use. A string giving a command to be run on the SQLite database before loading data. One potential use:
"pragma temp_store = 2;" to keep all SQLite temp files in memory. Don't do this unless you have a lot (>100 GB) of RAM
- indexTaxa
if TRUE, add an index for taxa ID. This is only necessary if you want to look up accessions by taxa ID, e.g. with getAccessions
- overwrite
if TRUE, delete the accessionTaxa table in the database, if present, and regenerate it
Examples
taxa<-c(
  "accession\taccession.version\ttaxid\tgi",
  "Z17427\tZ17427.1\t3702\t16569",
  "Z17428\tZ17428.1\t3702\t16570",
  "Z17429\tZ17429.1\t3702\t16571",
  "Z17430\tZ17430.1\t3702\t16572"
)
inFile<-tempfile()
sqlFile<-tempfile()
writeLines(taxa,inFile)
read.accession2taxid(inFile,sqlFile,vocal=FALSE)
db<-RSQLite::dbConnect(RSQLite::SQLite(),dbname=sqlFile)
RSQLite::dbGetQuery(db,'SELECT * FROM accessionTaxa')
#>     base accession taxa
#> 1 Z17427  Z17427.1 3702
#> 2 Z17428  Z17428.1 3702
#> 3 Z17429  Z17429.1 3702
#> 4 Z17430  Z17430.1 3702
RSQLite::dbDisconnect(db)
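
The indexTaxa and overwrite arguments described above can be combined when rebuilding an existing database. The following is a minimal sketch, not part of the original examples. It assumes the taxonomizr and RSQLite packages are installed, reuses a shortened copy of the toy input, and looks up accessions with a plain SQL query against the base/accession/taxa columns shown in the output above rather than through getAccessions.

```r
library(taxonomizr)

# Toy accession2taxid input (a shortened copy of the example data)
taxa <- c(
  "accession\taccession.version\ttaxid\tgi",
  "Z17427\tZ17427.1\t3702\t16569",
  "Z17428\tZ17428.1\t3702\t16570"
)
inFile <- tempfile()
sqlFile <- tempfile()
writeLines(taxa, inFile)

# First build: no index on taxa ID
read.accession2taxid(inFile, sqlFile, vocal = FALSE)

# Rebuild: drop the existing accessionTaxa table and add a taxa ID index
read.accession2taxid(inFile, sqlFile, vocal = FALSE,
                     indexTaxa = TRUE, overwrite = TRUE)

# Look up accessions by taxa ID, the kind of query the index is meant to speed up
db <- RSQLite::dbConnect(RSQLite::SQLite(), dbname = sqlFile)
hits <- RSQLite::dbGetQuery(
  db,
  "SELECT accession FROM accessionTaxa WHERE taxa = 3702 ORDER BY accession"
)
RSQLite::dbDisconnect(db)
print(hits)
```

On a table this small the index makes no measurable difference. It should only matter for the full NCBI files, where indexing also increases build time and database size.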