random - How to draw without replacement to fill in a data set


I am generating a data set where I first want to randomly draw a value for each observation from a discrete distribution and fill in var1 with these values. Next, I want to draw another value from the distribution for each row, but the catch is that the value already in var1 for that observation is no longer eligible to be drawn. I want to repeat this a relatively large number of times.

To make this clearer, suppose that I start with:

    id
    1
    2
    3
    ...
    999
    1000

Suppose that the distribution I have is ["a", "b", "c", "d", "e"], whose values occur with probability [.2, .3, .1, .15, .25].

I would first like to randomly draw from the distribution to fill in var1. Suppose that the result of this is:

    id    var1
    1     e
    2     e
    3     c
    ...
    999   b
    1000  a

Now e is not eligible to be drawn for observations 1 and 2. Likewise, c, b, and a are ineligible for observations 3, 999, and 1000, respectively.

After all the columns are filled in, we may end up with this:

    id    var1  var2  var3  var4  var5
    1     e     c     b     a     d
    2     e     a     b     d     c
    3     c     b     a     e     d
    ...
    999   b     d     c     a     e
    1000  a     e     b     c     d

I am not sure how to approach this in Stata. One way to fill in var1 is to write something like:

    gen random1 = runiform()
    replace var1 = "a" if random1<.2
    replace var1 = "b" if random1>=.2 & random1<.5
    etc....

Note that sticking to the (scaled) probabilities after creating var1 is desirable, but not required for me.

Here's a solution that works in long form and selects from the distribution. As values are selected, they are flagged as done, and the next selection is made from the groups that contain the remaining values. The probabilities are rescaled at each pass.

    version 14
    set seed 3241234

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte ip str1 y double p
    1 "a"  .2
    2 "b"  .3
    3 "c"  .1
    4 "d" .15
    5 "e" .25
    end

    local nval = _N

    * the following should be true
    isid y

    expand 1000
    bysort y: gen id = _n
    sort id ip

    gen done = 0

    forvalues i = 1/`nval' {

        // scale probabilities
        bysort id done (ip): gen double ptot = sum(p)   // running sum
        by id done: gen double phigh = sum(p / ptot[_N])
        by id done: gen double plow = cond(_n == 1, 0, phigh[_n-1])

        // random number in the range of (0,1) for the group
        bysort id done (ip): gen double x = runiform()

        // pick from the not done group; choose first x to represent group
        by id done: gen pick = !done & inrange(x[1], plow, phigh)

        // put the picked obs at the end and create new var
        bysort id (pick ip): gen v`i' = y[_N]

        // done with the obs that was picked
        bysort id: replace done = 1 if _n == _N

        drop x pick ptot phigh plow
    }

    bysort id: keep if _n == 1
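To see what the rescaling does in a single pass, here is a minimal sketch for one id. It assumes, purely for illustration, that "e" was drawn first. The variable pscaled is a name made up for this example and is not part of the solution above.

```stata
clear
input str1 y double p
"a"  .2
"b"  .3
"c"  .1
"d" .15
"e" .25
end

* Assumption for illustration: "e" was picked in the first pass
gen byte done = (y == "e")

* Rescale the probabilities of the values that are still available
egen double ptot = total(p) if !done
gen double pscaled = p / ptot if !done

* Cumulative interval bounds that map a uniform draw to a value
sort done y
gen double phigh = sum(pscaled) if !done
gen double plow = cond(_n == 1, 0, phigh[_n-1]) if !done

list y p pscaled plow phigh, noobs
```

The remaining probabilities sum to .75, so each one is divided by .75. A uniform draw then falls into exactly one of the [plow, phigh] intervals, which together cover (0,1) for the values that have not been picked yet.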
