| Field | Value |
|---|---|
| Title | Measure Memory and CPU Usage for Parallel R Code |
| Description | Measures memory and CPU usage of R code by regularly taking snapshots of calls to the system command `ps`. The package provides an entry point (albeit coarse) to profile usage of system resources by R code run in parallel. |
| Authors | Simon Couch [aut, cre], Posit Software, PBC [cph, fnd] |
| Maintainer | Simon Couch |
| License | MIT + file LICENSE |
| Version | 0.1.1.9000 |
| Built | 2024-11-15 05:21:49 UTC |
| Source | https://github.com/simonpcouch/syrup |
This function is a wrapper around the system command `ps` that can be used to benchmark (peak) memory and CPU usage of parallel R code. By taking snapshots of the memory usage of R processes at a regular interval, the function dynamically builds up a profile of their usage of system resources.
```r
syrup(expr, interval = 0.5, peak = FALSE, env = caller_env())
```
| Argument | Description |
|---|---|
| `expr` | An expression. |
| `interval` | The interval at which to take snapshots of resource usage. In practice, there's an overhead on top of each of these intervals. |
| `peak` | Whether to return rows for only the "peak" memory usage. Interpreted as the snapshot `id` with the maximum `sum(rss)`. |
| `env` | The environment in which to evaluate `expr`. |
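To make the `peak` argument concrete: if every snapshot carries an `id` and each row carries an `rss` value, "peak" keeps the rows of the snapshot whose summed `rss` is largest. The sketch below reproduces that selection in base R on a small made-up data frame. The helper name `peak_rows()` and the toy values are hypothetical; this illustrates the behavior described above and is not syrup's internal code.

```r
# Hypothetical helper: keep only the rows belonging to the snapshot (`id`)
# with the largest total resident set size, mirroring the idea behind `peak = TRUE`.
peak_rows <- function(snapshots) {
  # total rss across all processes, one row per snapshot id
  totals <- aggregate(rss ~ id, data = snapshots, FUN = sum)
  # id of the snapshot with the highest total (the first one wins on ties)
  peak_id <- totals$id[which.max(totals$rss)]
  snapshots[snapshots$id == peak_id, , drop = FALSE]
}

# Toy data: three snapshots of two processes (rss values are made up)
snapshots <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  pid = c(101, 102, 101, 102, 101, 102),
  rss = c(50, 40, 80, 70, 60, 30)
)

peak_rows(snapshots)
```

Snapshot 2 has the largest total (80 + 70), so only its two rows are returned.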
While much of the verbiage in the package assumes that the supplied expression will be distributed across CPU cores, nothing about this package requires the expression provided to `syrup()` to run in parallel. Said another way, `syrup()` will work just fine with "normal," sequentially-run R code (as in the examples). That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as `Rprofmem()`, the profmem package, the bench package, and packages in the R-prof GitHub organization.
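`Rprofmem()`, the first of the sequential tools listed above, ships with base R (in the utils package) and writes allocation records to a file. A minimal sketch, assuming R was built with memory profiling enabled, which is checked with `capabilities("profmem")`; the 1 MB threshold and the size of the test allocation are arbitrary choices for illustration.

```r
log_file <- tempfile(fileext = ".out")

if (capabilities("profmem")) {
  # report large-vector allocations above roughly 1 MB while profiling is on
  Rprofmem(log_file, threshold = 1e6)
  x <- numeric(1e6)   # a vector of one million doubles, about 8 MB
  Rprofmem(NULL)      # stop profiling
  # each recorded line gives the allocation size in bytes and the call stack
  allocations <- readLines(log_file)
  print(allocations)
} else {
  message("This R build lacks memory profiling support; see ?Rprofmem.")
}
```

Unlike `syrup()`, this records what the current R process allocates rather than sampling the footprint of every R process on the machine.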
Loosely, the function works by:

1. Setting up another R process (call it `sesh`) that queries system information using `ps::ps()` at a regular interval,
2. Evaluating the supplied expression,
3. Reading the queried system information back into the main process from `sesh`,
4. Closing `sesh`, and then
5. Returning the queried system information.
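The five steps can be sketched in base R to show how the pieces fit together. This is a minimal sketch, not syrup's implementation: it assumes a Unix-alike, because `parallel::mcparallel()` forks the current session (it does not work on Windows), and it uses a stop file as a simple signal between processes. `profile_expr()` and its `snapshot` argument are hypothetical names, and the default `snapshot` records only a timestamp; a real profiler would record per-process memory and CPU usage at that point, as syrup does with `ps::ps()`.

```r
# Assumes a Unix-alike: mcparallel() forks, so this sketch fails on Windows.
# `profile_expr()` and `snapshot` are hypothetical, not part of syrup.
library(parallel)

profile_expr <- function(expr, interval = 0.1,
                         snapshot = function() data.frame(time = Sys.time())) {
  stop_file <- tempfile("stop-")

  # 1. start a helper process (the "sesh" role) that takes snapshots
  #    repeatedly; it always takes at least one, then checks for the stop file
  job <- mcparallel({
    snaps <- list()
    repeat {
      snaps[[length(snaps) + 1]] <- snapshot()
      if (file.exists(stop_file)) break
      Sys.sleep(interval)
    }
    do.call(rbind, snaps)
  })

  # 2. evaluate the supplied expression in the main process
  force(expr)

  # 3. signal the helper to stop, then read its snapshots back;
  #    mccollect() waits for the forked process to deliver its result
  file.create(stop_file)
  res <- mccollect(job)[[1]]

  # 4. the helper ends on its own after returning; remove the signal file
  unlink(stop_file)

  # 5. return the snapshots, numbered in the order they were taken
  res$id <- seq_len(nrow(res))
  res
}

res <- profile_expr(Sys.sleep(0.5), interval = 0.1)
res
```

Because `expr` is a lazily evaluated argument, it is only run by `force(expr)` in the main process after the helper has been forked, so the helper is already sampling while the expression executes.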
Note that information on the R process `sesh` is filtered out from the results automatically.
A tibble with columns `id` and `time` and a number of columns from `ps::ps()` output describing memory and CPU usage. Notable columns include the process ID `pid`, the parent process ID `ppid`, percent CPU usage, and the resident set size `rss` (a measure of memory usage).
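Since the result is a data frame, ordinary data-frame tools can summarise it. The sketch below uses a made-up stand-in with only the `id`, `pid`, `ppid`, and `rss` columns, assuming `rss` is stored as a plain number. It keeps the direct children of a "main" session (for example, parallel workers, assuming they are launched as direct children of that session) and then computes each worker's highest `rss` and the workers' combined `rss` per snapshot. The values and `main_pid` are hypothetical; in a live session the main process ID would come from `Sys.getpid()`.

```r
# Toy stand-in for syrup() output with made-up values, reduced to a few of
# the columns described above; real results contain more columns.
res <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  pid  = c(100, 201, 202, 100, 201, 202),
  ppid = c(50, 100, 100, 50, 100, 100),
  rss  = c(300, 60, 55, 310, 95, 90)
)

# the "main" R session in this toy data; in a live session, Sys.getpid()
main_pid <- 100

# keep only direct children of the main session (e.g. parallel workers)
workers <- res[res$ppid == main_pid, ]

# highest rss each worker reached across all snapshots
worker_peaks <- aggregate(rss ~ pid, data = workers, FUN = max)

# combined rss of all workers at each snapshot
worker_totals <- aggregate(rss ~ id, data = workers, FUN = sum)

worker_peaks
worker_totals
```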
```r
library(syrup)

# pass any expression to syrup. first, sequentially:
res_syrup <- syrup({res_output <- Sys.sleep(1)})
res_syrup

# to snapshot memory and CPU information more (or less) often, set `interval`
syrup(Sys.sleep(1), interval = .01)

# use `peak = TRUE` to return only the snapshot with
# the highest memory usage (as `sum(rss)`)
syrup(Sys.sleep(1), interval = .01, peak = TRUE)

# results from syrup are more---or maybe only---useful when
# computations are evaluated in parallel. see package README
# for an example.
```