
Assigning Respondents to Clusters/Segments in New Data Files in Q

Once you have created segments or clusters, you may find it useful to assign people in other data sets to the segments (a process also known as segment tagging and scoring). For example, you may want to tag a customer database with predicted segment memberships. Alternatively, you may want to assign respondents in a tracker to segments. When doing this, there are two basic approaches:
- You can assign people to segments in the new data file using the same variables that were used to form the segments, or
- You can predict segment membership based on a different set of variables.
Before proceeding with either approach, for risk management purposes, remember to take a copy of your project and make your changes in the copy.
The basic principle underlying both approaches is to create a model in one data set and then import a revised data set. You need to ensure that the model does not update to reflect the new data. You then use the existing model to make predictions in the new data set, with the variables in the new data as inputs.
Assigning people to segments in the new data file using the same variables
The best way to do this depends on whether we have used latent class analysis or k-means cluster analysis.
Segments formed using latent class analysis
A three-segment latent class solution, based on a sample size of 400, is shown below. To allocate people in a new data file using these segments:
- Select File > Data Sets > Update and select the data file used to create the analysis.
- Choose the new data file and press OK.
- Click on the latent class output, which will be shown as having an error, and press Ignore.
- The variable in the project that shows segment membership has now automatically updated, allocating people in the new data file to the segments.
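Q performs this allocation for you, and its internal computation is not shown in this article. Purely to illustrate the idea, here is a minimal sketch of the standard latent class assignment rule: the segment sizes and response probabilities are held fixed, and each new respondent goes to the segment with the highest posterior probability. Everything in it (class_sizes, p_yes, the three yes/no items and the new_file rows) is invented for illustration and is not taken from the solution described above.

```r
# Sketch of the standard latent class assignment rule with frozen
# parameters. All numbers are invented; this is not Q's own code.
class_sizes <- c(0.5, 0.3, 0.2)   # assumed share of each segment

# Assumed P(answer "yes" | segment) for three hypothetical yes/no items.
p_yes <- rbind(c(0.9, 0.8, 0.7),  # segment 1
               c(0.2, 0.9, 0.3),  # segment 2
               c(0.1, 0.2, 0.1))  # segment 3

# Posterior probability of each segment for each respondent.
# responses: 0/1 matrix, one row per respondent, one column per item.
posterior <- function(responses) {
  lik <- sapply(seq_along(class_sizes), function(k) {
    p <- matrix(p_yes[k, ], nrow(responses), ncol(responses), byrow = TRUE)
    # Segment size times the likelihood of the observed answers.
    class_sizes[k] * apply(p^responses * (1 - p)^(1 - responses), 1, prod)
  })
  lik <- matrix(lik, nrow = nrow(responses))
  lik / rowSums(lik)  # normalise so that each row sums to 1
}

# Three hypothetical respondents from a new data file.
new_file <- rbind(c(1, 1, 1),
                  c(0, 1, 0),
                  c(0, 0, 0))

post <- posterior(new_file)
segment <- max.col(post, ties.method = "first")  # most probable segment
```

The point of the sketch is that the parameters are never re-estimated. Only the new respondents' answers change, which is why the segment definitions stay the same from one data file to the next.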
Segments formed using k-means
A three-cluster k-means solution is shown above. To allocate people in a new data file using these segments:
- Click on the k-means output and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector, on the right of the screen).
- Take a copy of line 2 of the code. In my example, it looks like this:
kmeans = KMeans(data.frame(understand, shop, key, value, interested),
- Go to the Variables and Questions tab.
- Right-click on the first variable and select Insert Variable(s) > R Variable.
- Paste in the copied code, and modify it so that it looks like the following code. The key bits to retain from your pasted code are kmeans (or whatever name you assigned to the model) and the variable names.
predict(kmeans, newdata = data.frame(understand, shop, key, value, interested))
- Press the play button (the blue triangle).
- Enter a Question name.
- Press Add R Variable.
- Change the Question Type of the newly-created variable to Pick One.
- Press the Values button and enter any labels you desire and press OK.
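To see what the predict step is doing conceptually, here is a minimal sketch in base R. It assumes that scoring a k-means solution amounts to assigning each new respondent to the nearest stored cluster centre. Base R's kmeans() is used as a stand-in for Q's KMeans(), the helper assign_to_clusters() is hypothetical, and the data are simulated, so treat it as an illustration rather than the code Q runs.

```r
# Sketch: score new respondents against a frozen k-means solution.
# Base R's kmeans() stands in for Q's KMeans(); all data are simulated.
set.seed(1)

# Original data: three well-separated groups on two variables.
original <- data.frame(
  understand = c(rnorm(30, 1, 0.3), rnorm(30, 5, 0.3), rnorm(30, 9, 0.3)),
  shop       = c(rnorm(30, 1, 0.3), rnorm(30, 5, 0.3), rnorm(30, 9, 0.3))
)

# Fit once. From here on only the stored centres are used.
fit <- kmeans(original, centers = 3, nstart = 10)

# Hypothetical helper: assign each row of newdata to the nearest centre.
assign_to_clusters <- function(centres, newdata) {
  x <- as.matrix(newdata[, colnames(centres), drop = FALSE])
  # Squared Euclidean distance from every row to every centre.
  d <- vapply(seq_len(nrow(centres)),
              function(k) rowSums(sweep(x, 2, centres[k, ])^2),
              numeric(nrow(x)))
  d <- matrix(d, nrow = nrow(x))  # keep a matrix even for a single row
  max.col(-d, ties.method = "first")
}

# Respondents from a new data file, measured on the same variables.
new_respondents <- data.frame(understand = c(1.1, 5.2, 8.8),
                              shop       = c(0.9, 4.9, 9.1))
new_segments <- assign_to_clusters(fit$centers, new_respondents)
```

Because the centres come from the original fit and are never re-estimated, a respondent with a given set of answers receives the same cluster no matter which data file they appear in.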
Predict segment membership using a different set of variables
In this scenario, segments have been formed and then a predictive model is used to predict segment membership on either:
- A completely different set of variables (e.g., demographics, or some other data available in a customer database).
- A subset of the variables used to create the segments. (Tip: if you are building a predictive model based on exactly the same variables as used to create segments, you are making a mistake, and should instead use the approach described in the previous section).
The output above is from a multinomial logit (MNL) model (in Q5: Create > Regression > Multinomial Logit) predicting segment membership based on firmographics. The goal now is to predict segment membership in a new data file that contains the same predictor variables.
- Click on the model output and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector).
- Take a copy of the line of code that looks similar to this (with different variable names):
glm = Regression(segmentsGXVYS ~ q1 + q2 + q3 + q4 + q5,
- Go to the Variables and Questions tab.
- Right-click on the first variable and select Insert Variable(s) > R Variable.
- Paste in the copied code, and modify it so that it looks like the following code. The key bits to retain from your pasted code are glm (or whatever the model has been renamed to) and the variable names.
predict(glm, newdata = data.frame(q1, q2, q3, q4, q5))
- Press the play button (the blue triangle).
- Enter a Question name.
- Press Add R Variable.
- Change the Question Type of the newly-created variable to Pick One.
- Press the Values button and enter any labels you desire and press OK.
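The same fit-once, predict-later pattern can be tried outside Q. The sketch below uses multinom() from the nnet package (assumed to be installed, as it ships with standard R distributions) as a stand-in for Q's Regression() function. The simulated data, the predictor names q1 and q2, and the segment labels are all made up for illustration.

```r
# Sketch: predict segment membership in new data from a frozen
# multinomial logit. nnet::multinom() stands in for Q's Regression();
# the data and the names q1, q2 and segment are simulated.
library(nnet)
set.seed(42)

n <- 300
training <- data.frame(q1 = rnorm(n), q2 = rnorm(n))

# Hypothetical segments, driven mainly by q1 plus some noise.
training$segment <- cut(training$q1 + rnorm(n, sd = 0.3),
                        breaks = c(-Inf, -0.5, 0.5, Inf),
                        labels = c("Segment 1", "Segment 2", "Segment 3"))

# Fit the model once, on the original data only.
model <- multinom(segment ~ q1 + q2, data = training, trace = FALSE)

# A new data file containing the same predictors but no segment variable.
new_file <- data.frame(q1 = c(-2, 0, 2), q2 = c(0, 0, 0))

# Most likely segment, and the full set of membership probabilities.
predicted     <- predict(model, newdata = new_file, type = "class")
probabilities <- predict(model, newdata = new_file, type = "probs")
```

Looking at the probabilities as well as the predicted class is a useful check: respondents whose highest probability is only slightly above the others are the ones most likely to be assigned to the wrong segment.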