Convenience function that performs all necessary preparation: it downloads the names, nodes and accession2taxid data from NCBI and preprocesses them into a SQLite database for downstream use.
Usage
prepareDatabase(
sqlFile = "nameNode.sqlite",
tmpDir = ".",
getAccessions = TRUE,
vocal = TRUE,
...
)
Arguments
- sqlFile
character string giving the file location to store the SQLite database
- tmpDir
location for storing the downloaded files from NCBI. (Note that it may be useful to store these somewhere convenient to avoid redownloading)
- getAccessions
if TRUE, download the very large accession2taxid files necessary to convert accessions to taxonomic IDs
- vocal
if TRUE output messages describing progress
- ...
Arguments passed on to getNamesAndNodes, getAccession2taxid, read.accession2taxid
  - url
  the url where taxdump.tar.gz is located
  - fileNames
  the filenames desired from the tar.gz file
  - protocol
  the protocol to be used for downloading. Probably either 'http' or 'ftp'. Overridden if url is provided directly
  - resume
  if TRUE, attempt to resume downloading an interrupted file without starting over from the beginning
  - baseUrl
  the url of the directory where accession2taxid.gz files are located
  - types
  the types of accession2taxid.gz files desired, where type is the prefix of xxx.accession2taxid.gz. The default is to download all nucl_ accessions. For protein accessions, try types=c('prot').
  - extraSqlCommand
  for advanced use. A string giving a command to be called on the SQLite database before loading data. A potential use: "pragma temp_store = 2;" to keep all SQLite temp files in memory. Don't do this unless you have a lot (>100 Gb) of RAM
  - indexTaxa
  if TRUE, add an index for taxa ID. This is only necessary if you want to look up accessions by taxa ID, e.g. with getAccessions
  - overwrite
  if TRUE, delete the accessionTaxa table in the database if present and regenerate it
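
Example
The following is a minimal sketch of how the arguments above might be combined, not a definitive recipe. The directory name "taxonomy" and the helper names dbArgs and buildDatabase are hypothetical and chosen only for illustration. The call itself is wrapped in a function so that nothing is downloaded until it is invoked explicitly, since the accession2taxid files are very large.

```r
# Hypothetical settings for illustration; adjust the paths for your system.
dbArgs <- list(
  sqlFile = file.path("taxonomy", "nameNode.sqlite"),  # where the SQLite database is written
  tmpDir = "taxonomy",       # downloaded NCBI files are kept here to avoid redownloading
  getAccessions = TRUE,      # also fetch the accession2taxid files
  vocal = TRUE,              # print progress messages
  types = c("prot"),         # passed on through ... : protein accessions only
  indexTaxa = FALSE          # passed on through ... : no extra index on taxa ID
)

# Hypothetical wrapper: checks that the package is available, makes sure the
# download directory exists, then forwards the settings to prepareDatabase().
buildDatabase <- function(args) {
  if (!requireNamespace("taxonomizr", quietly = TRUE)) {
    stop("The taxonomizr package must be installed to build the database.")
  }
  dir.create(args$tmpDir, showWarnings = FALSE, recursive = TRUE)
  do.call(taxonomizr::prepareDatabase, args)
}

# Uncomment to run. This downloads from NCBI and can take a long time.
# buildDatabase(dbArgs)
```

Keeping the settings in a named list makes it easy to rerun the same preparation later, for example with overwrite = TRUE added to regenerate the accessionTaxa table.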