Generate a function to calculate a wasserstein distance
Source:R/calculate-wasserstein.R
wasserstein_calculator.RdArguments
- obs
A vector of observations
- debias
Should the simulations be shifted to match the mean of the observed data
- bootstraps
Randomly resample from the simulated data points to match the observed size this many times and combine the output by averaging. The alternative, when this is 1 (the default) matches the sizes by selecting and/or repeating the simulated data points in order (deterministically)
Value
a function that takes parameter sim and returns a
length normalised wasserstein distance. This is the average distance an
individual data point must be shifted to match the reference data normalised
by the average distance of the reference data from the mean. The function
also takes a p parameter which can be a progressr progress bar which must
be named.
Details
This function takes reference data in the forms of individual observations in
the form of for example event times and returns a crated function to
calculate the wasserstein distance from simulated data to the observed data.
For large simulations this will be quicker than calculate_wasserstein but
must be used with a crated scorer_fn
In the comparison unequal lengths of the data can be accommodated. The simulated data is recycled, and sampled, until the same length as the observed data before the comparison.
Examples
# example case counts from an exponential growth process
sim = rexpgrowth(1000, 0.05, 40, 0)
obs = rexpgrowth(1000, 0.075, 40, 0)
obs2 = rexpgrowth(1000, 0.05, 40, 0)
calc = wasserstein_calculator(sim)
# obs is a different distribution to sim (larger growth)
calc(obs)
#> [1] 0.3260278
# obs2 is from the same distribution as sim so the RMSE should be lower:
calc(obs2)
#> [1] 0.09023579