Sometimes you collect more records than a survey needs and want to randomly remove the surplus, or you simply want a random subset of records to work with. This post shows you how to select a random sample of respondents in your data set based on a variable. The sample can be drawn from the whole data set or from a filtered group.

Suppose you have completed fieldwork for an important survey and gone over quota for males, but you don’t want to simply delete the most recent records. Instead, you want to randomly select 30 male records to remove. You have opened the data set in Q but don’t know how to proceed. Don’t worry! The solution is to use some JavaScript to generate the random allocation.

Create a filter

First, we need to create a filter for males. Bring up a table for Gender in the Outputs tab, right-click the percentage field for the ‘Male’ row and click Create filter. The Question name for filter should default to the selected category (in this case “Male”), but you can alter it as needed. You may also wish to deselect Apply to the current table before pressing OK. This creates a new variable at the top of the Variables and Questions tab. In the Name column, change the randomly allocated name to Males to make it easier to remember for later.

Use a JavaScript formula

Now we want to add the code to randomly select respondents based on this filter. Right-click on any row in the Variables and Questions tab and select Insert Variable(s) > JavaScript Formula > Numeric.

You will be presented with an Expression box to paste JavaScript code into. Paste in the below code:

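var _filter = Males; // Enter the variable name of the filter variable here
var _sample_number = 30; // Enter the number of records to randomly select

// Build an array of row indices for the whole data set (N is the number of records)
var _index = [];

for (var i = 0; i < N; i++)
    _index.push(i);

// Pair each index with a random number; records outside the filter get 0
var _rand = _index.map(function (_x) {
    return { _ind: _x, _val: Math.random() * _filter[_x] };
});

// Sort in descending order so that filtered records come first, in random order
_rand = _rand.sort(function (a, b) { return b._val - a._val; });

var _max_vals = _rand.map(function (_x) { return _x._ind; });

// Keep only the indices of the first _sample_number records
_max_vals = _max_vals.slice(0, _sample_number);

// Return 1 for the selected records and 0 for the rest
var _results = _index.map(function (x) { return _max_vals.indexOf(x) > -1 ? 1 : 0; });

_results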

In this code:

  1. We specify the name of the filter variable and the random sample size
  2. We create an array of indices for the whole data set called _index
  3. We create a second array called _rand, made up of objects that pair each index with a random number; the number is 0 for any record not included in the filter variable
  4. We sort _rand in descending order and extract the indices so that we can select the first 30 records using the slice function
  5. Finally, we map these 30 records to _results, returning a 1 for the selected records and 0 for the rest
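
If you would like to check the logic outside Q, the steps above can be sketched as a small standalone function. This is only an illustration and not something to paste into Q: the function name randomSample and the filter array are made up for the example, with the array standing in for the Males variable (1 for male, 0 otherwise).

```javascript
// Standalone sketch of the steps above; runs in Node or a browser console, not in Q.
// `filter` is a hypothetical stand-in for the Males variable: 1 = male, 0 = not male.
function randomSample(filter, sampleNumber) {
    // Step 2: indices for the whole data set
    var index = filter.map(function (_, i) { return i; });

    // Step 3: pair each index with a random number; records outside the filter get 0
    var rand = index.map(function (i) {
        return { ind: i, val: Math.random() * filter[i] };
    });

    // Step 4: sort in descending order and keep the indices of the first records
    rand.sort(function (a, b) { return b.val - a.val; });
    var selected = rand.slice(0, sampleNumber).map(function (r) { return r.ind; });

    // Step 5: 1 for the selected records, 0 for the rest
    return index.map(function (i) { return selected.indexOf(i) > -1 ? 1 : 0; });
}

var filter = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]; // 6 males among 10 records
var result = randomSample(filter, 3);
console.log(result); // three 1s, each at a position where filter is 1
```

Note that if the filter contains fewer records than the requested sample size, the remaining places are filled with records from outside the filter, so it is worth checking the size of the group first.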

Once the code has been pasted in, click Access all data rows (advanced) on the right. Then, in the top-left of this dialogue box, assign an appropriate Name and Label and press OK.

Hard-code the random selection

Remember that this code is dynamic: unless we fix the output, it re-evaluates and the random selection can change. To fix it, first ensure your unique ID variable is selected in the Case IDs drop-down at the top of your Data tab. Then return to the Variables and Questions tab, highlight the random sample variable and select Copy and Paste Variable(s) > As Values to hard-code the random selection so that it doesn’t re-calculate in the future. You can then change the Variable Type to Categorical and click the F in the Tags column for that row to turn this into a filter.
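
To see why this step matters, here is a rough sketch, again outside Q, of the difference between a selection that is recomputed and one that has been copied as plain values. The helper pickRandom and the filter array are hypothetical and exist only for this illustration.

```javascript
// Sketch of why the selection is hard-coded; runs outside Q.
// pickRandom is a hypothetical helper standing in for the JavaScript formula variable.
function pickRandom(filter, n) {
    var scored = filter.map(function (f, i) {
        return { ind: i, val: Math.random() * f };
    });
    scored.sort(function (a, b) { return b.val - a.val; });
    var chosen = scored.slice(0, n).map(function (s) { return s.ind; });
    return filter.map(function (_, i) { return chosen.indexOf(i) > -1 ? 1 : 0; });
}

var filter = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0];

var live = pickRandom(filter, 3); // like the formula: recomputed on every evaluation
var frozen = live.slice();        // like pasting as values: a plain copy of the numbers

live = pickRandom(filter, 3);     // a re-evaluation will usually pick different records

console.log(frozen); // still the first selection
console.log(live);   // the new selection, which may differ from the first
```

The copied array keeps the original selection no matter how often the random version is recomputed, which is the behaviour we want before deleting any records.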

Now that we have this filter, we can remove these respondents. Apply the filter to the Data tab, then right-click any row and select Delete Rows Matching Filter (Green).