
Generate Random Variables From A Probability Distribution

I have extracted some variables from my python data set and I want to generate a larger data set from the distributions I have. The problem is that I am trying to introduce some va

Solution 1:

It sounds like you want to generate data based on the PDF described in the second table. The PDF is something like

    0                  for x <= B
    A*exp(-A*(x-B))    for x > B

A sets the decay rate of your distribution (its width scales as 1/A); the PDF is always normalized to have an area of 1. B is the horizontal offset, which is zero in your case. You can make it an integer distribution by binning with ceil.

The CDF of a normalized decaying exponential is 1 - exp(-A*(x-B)) for x > B. Generally, a simple way to sample from a custom distribution is to generate uniform numbers on [0, 1) and map them through the inverse of the CDF.
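To make that mapping concrete, here is a minimal sketch of inverse-transform sampling for this PDF. Solving u = 1 - exp(-A*(x-B)) for x gives x = B - log(1 - u)/A. The function name and the parameter values are mine, chosen only for illustration.

```python
import numpy as np

def sample_shifted_exponential(a, b, size, rng=None):
    """Sample from the PDF a*exp(-a*(x-b)), x > b, by inverting its CDF."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)  # uniform on [0, 1)
    # Invert u = 1 - exp(-a*(x-b)); log1p(-u) equals log(1-u) and is finite because u < 1.
    return b - np.log1p(-u) / a

# Illustrative values only (not fitted to the question's data).
samples = sample_shifted_exponential(a=2.0, b=0.0, size=100_000,
                                     rng=np.random.default_rng(0))
print(samples.mean())  # should be close to b + 1/a = 0.5
```

Applying np.ceil to these samples would give the integer-valued version described above.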

Fortunately, you won't have to do that, since scipy.stats.expon already provides the implementation you are looking for. All you have to do is fit to the data in your last column to get A (B is clearly zero). You can easily do this with curve_fit. Keep in mind that A maps to 1.0/scale in scipy PDF language.

Here is some sample code. I've added an extra layer of complexity here by computing the integral of the objective function from n-1 to n for integer inputs, taking the binning into account for you when doing the fit.

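import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import expon

def model(x, a):
    # Probability mass of the exponential PDF in the bin (x - 1, x]
    return np.exp(-a * (x - 1)) - np.exp(-a * x)
    # Alternative:
    # return -np.diff(np.exp(-a * np.concatenate(([x[0] - 1], x))))

x = np.arange(1, 16)
p = np.array([0.8815, 0.0755, ..., 0.0010, 0.0005])
# curve_fit returns (optimal parameters, covariance); unpack the single fitted parameter
(a,), _ = curve_fit(model, x, p, p0=[0.01])
# Draw continuous samples, then bin them to integers with ceil
samples = np.ceil(expon.rvs(scale=1 / a, size=2000)).astype(int)
samples[samples == 0] = 1  # guard against an exact 0.0 draw
data = np.bincount(samples)[1:]  # counts for values 1, 2, 3, ...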

Solution 2:

If you have an exponential decay, the underlying discrete probability distribution is a geometric distribution. (It's the discrete counterpart of the continuous exponential distribution.) Such a geometric distribution uses a parameter p with the probability of success of one trial (like a biased coin toss). The distribution describes the number of trials needed to get one success.
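As a quick check of that correspondence: rounding an exponential variable with rate A up to the next integer gives probability exp(-A*(n-1)) - exp(-A*n) = (1-p)^(n-1) * p for the value n, with p = 1 - exp(-A), which is exactly the geometric PMF. The sketch below verifies this numerically; the rate a = 2.0 is an arbitrary value I picked for illustration.

```python
import numpy as np
from scipy.stats import geom

a = 2.0                    # illustrative rate of the continuous exponential
p = 1.0 - np.exp(-a)       # matching success probability of the geometric
n = np.arange(1, 11)

# Mass of the exponential PDF falling in each bin (n - 1, n]
binned = np.exp(-a * (n - 1)) - np.exp(-a * n)

matches = np.allclose(binned, geom.pmf(n, p))
print(matches)
```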

The expected mean of the distribution is 1/p. So, we can calculate the mean of the observations to estimate p.

The function forms part of scipy as scipy.stats.geom. To sample the distribution, use geom.rvs(estimated_p, size=2000).

Here is some code to demonstrate the approach:

from scipy.stats import geom
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict

observation_index = [1, 2, 3, 4, 7, 13]
observation_count = [352, 28, 8, 4, 4, 4]

observed_mean = sum(i * c for i, c in zip(observation_index, observation_count)) / sum(observation_count)

estimated_p = 1 / observed_mean
print('observed_mean:', observed_mean)
print('estimated p:', estimated_p)

generated_values = geom.rvs(estimated_p, size=2000)
generated_dict = defaultdict(int)
for v in generated_values:
    generated_dict[v] += 1
generated_index = sorted(generated_dict.keys())
generated_count = [generated_dict[i] for i in generated_index]
print(generated_index)
print(generated_count)

Output:

observed_mean: 1.32
estimated p: 0.7575757575757576
new random sample:
    [1, 2, 3, 4, 5, 7]
    [1516, 365, 86, 26, 6, 1]
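If you want to see how well the fitted geometric distribution describes the original observations, one option is to compare the observed counts with the counts the model expects at the same values. This is only a sketch of such a check, reusing the observation lists from the code above; the variable names total and expected_count are mine.

```python
import numpy as np
from scipy.stats import geom

observation_index = np.array([1, 2, 3, 4, 7, 13])
observation_count = np.array([352, 28, 8, 4, 4, 4])

total = observation_count.sum()
# Same estimate as above: p = 1 / observed mean
estimated_p = total / (observation_index * observation_count).sum()

# Counts the fitted model expects at each observed value
expected_count = total * geom.pmf(observation_index, estimated_p)

for value, seen, expected in zip(observation_index, observation_count, expected_count):
    print(f"value {value:2d}: observed {seen:3d}, expected {expected:7.2f}")
```

Large gaps between the two columns would suggest that a single geometric distribution is only a rough description of the data.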
