A convenience function to read in a large file piece by piece, process each chunk (hopefully reducing the size, either by summarizing or by removing extra rows or columns), and return the output
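The general pattern is to open the file, read a fixed number of lines at a time, apply a summarizing function to each chunk, and collect the per-chunk results in a list. A minimal sketch of that idea in base R (an illustration only, not the package's actual implementation; readChunks is a hypothetical name and, for simplicity, only a file path is handled, not an already-open connection) could look like:

readChunks <- function(bigFile, n = 1e+06, FUN = identity, ...) {
  # hypothetical sketch of the chunk-process-collect pattern, not streamingRead itself
  handle <- file(bigFile, "r")
  on.exit(close(handle))
  out <- list()
  repeat {
    chunk <- readLines(handle, n = n)          # pull the next n unparsed lines
    if (length(chunk) == 0) break              # stop at end of file
    out[[length(out) + 1]] <- FUN(chunk, ...)  # reduce the chunk before keeping it
  }
  out
}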
Usage
streamingRead(
  bigFile,
  n = 1e+06,
  FUN = function(xx) sub(",.*", "", xx),
  ...,
  vocal = FALSE
)
Arguments
- bigFile
a string giving the path to a file to be read in, or a connection opened in "r" mode
- n
number of lines to read per chunk
- FUN
a function taking the unparsed lines from a chunk of bigFile as a single argument and returning the desired output
- ...
any additional arguments to FUN
- vocal
if TRUE, cat a "." as each chunk is processed
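Note that FUN sees each chunk as a plain character vector of unparsed lines, so any parsing is up to FUN. For example, a hypothetical function that keeps only the first two comma-separated fields of each line (thereby shrinking the data before it is returned) might be written as:

# hypothetical FUN: split each raw line on commas and keep the first two fields
keepFirstTwo <- function(lines) {
  fields <- strsplit(lines, ",", fixed = TRUE)
  do.call(rbind, lapply(fields, function(xx) xx[1:2]))
}
# it could then be passed as e.g. streamingRead(someCsvFile, 1e5, keepFirstTwo)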
Examples
tmpFile<-tempfile()
writeLines(LETTERS,tmpFile)
streamingRead(tmpFile,10,head,1)
#> [[1]]
#> [1] "A"
#>
#> [[2]]
#> [1] "K"
#>
#> [[3]]
#> [1] "U"
#>
writeLines(letters,tmpFile)
streamingRead(tmpFile,2,paste,collapse='',vocal=TRUE)
#> .............
#> [[1]]
#> [1] "ab"
#>
#> [[2]]
#> [1] "cd"
#>
#> [[3]]
#> [1] "ef"
#>
#> [[4]]
#> [1] "gh"
#>
#> [[5]]
#> [1] "ij"
#>
#> [[6]]
#> [1] "kl"
#>
#> [[7]]
#> [1] "mn"
#>
#> [[8]]
#> [1] "op"
#>
#> [[9]]
#> [1] "qr"
#>
#> [[10]]
#> [1] "st"
#>
#> [[11]]
#> [1] "uv"
#>
#> [[12]]
#> [1] "wx"
#>
#> [[13]]
#> [1] "yz"
#>
unlist(streamingRead(tmpFile,2,sample,1))
#> [1] "a" "d" "f" "g" "i" "l" "m" "p" "q" "s" "u" "x" "z"
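The default FUN strips everything after the first comma, so on a comma-separated file it returns the first field of each line. A small illustration (the file contents and the expected result shown here are hypothetical, not captured output):

tmpCsv <- tempfile()
writeLines(c("id,value", "a,1", "b,2", "c,3"), tmpCsv)
unlist(streamingRead(tmpCsv, 2))
# with the default FUN this should give the first field of every line:
# "id" "a" "b" "c"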