
How to Split Text Strings in Q

String splitting is the process of breaking up a text string in a systematic way, so that the individual parts of the text can be processed. Survey data can include text of various kinds which may require some processing before it can be useful to your analysis. Examples include timestamps, or variables where multiple values are stored separated by commas. While it is usually better to ask your data provider to give you data in as good a format as possible, sometimes it can be necessary to process data of this kind yourself. This article will outline the procedure for splitting text strings in Q, using both JavaScript and R.
Example
In this post we consider a simple example in which people’s responses to an awareness question on soft drinks have been stored in a single variable, with commas separating the 1st mention, 2nd mention, and so on. The raw data looks like this:
Respondent #16 has three responses to the awareness question. Splitting that response at each comma produces three separate pieces of information, which can then be stored or processed separately.
Splitting Text Strings in Q with JavaScript
A new variable can be added to your existing data set using a JavaScript formula variable. To add a variable like this, use the following steps:
- Select Create > Variables and Questions > Variable(s) > JavaScript Formula
- Choose Text if you want the output to be another text variable, or Numeric if you want the output variable to have numeric values.
- Enter your Expression and click OK.
The most important part is the Expression. This is the JavaScript code that will split your text and return a new value. Using the default settings, this expression runs once for each row (respondent) in the data file and should be written to return a single new value. Below, we use a simple example to see how it works.
The most straightforward way to split strings in JavaScript is the string's split method. The syntax for this is:
string.split(separator)
Here, string is replaced with the variable name for your string input and separator tells the code which character(s) to look for when splitting the text. This function returns a JavaScript array containing the separate strings.
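As a quick illustration outside Q (for example in Node.js), the sketch below runs split on a hypothetical awareness string. The brand names are made up and not taken from the data set. Note that split keeps any spaces that follow the separator, so if your raw data uses ", " rather than ",", trimming each piece may be useful.

```javascript
// Hypothetical raw response; in Q this value would come from the awareness variable.
var response = "Coke, Pepsi, Fanta";

// split returns an array of the pieces found between the separators.
var pieces = response.split(",");
console.log(JSON.stringify(pieces)); // ["Coke"," Pepsi"," Fanta"]

// Spaces after the commas are kept, so trim each piece if needed.
var trimmed = pieces.map(function (piece) { return piece.trim(); });
console.log(JSON.stringify(trimmed)); // ["Coke","Pepsi","Fanta"]

// The number of mentions is simply the length of the array.
console.log(trimmed.length); // 3
```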
To create a new variable which has the 1st mention for each respondent we would use the following expression:
var string = awareness;
var split_string = string.split(",");
split_string[0];
The Expression and Preview of results show us how the formula is working:
The first line finds the variable named awareness in the data set and assigns the value to a JavaScript variable called string. The second line splits the string according to commas, creating an array. The final line returns the first element in the array (JavaScript arrays start at zero rather than one).
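In plain JavaScript, reading past the end of an array returns undefined, so a respondent with only one mention has no split_string[1]. The sketch below wraps the same logic in a hypothetical helper, nthMention, that returns an empty string in that case, and applies it to a few made-up responses to mimic the expression being evaluated once per row. It is meant to run in Node.js rather than in Q; inside Q you would keep only the body of the logic and reference your own variable name.

```javascript
// Hypothetical helper: returns the nth mention (1-based) from a
// comma-separated response, or "" if the respondent gave fewer mentions.
function nthMention(response, n) {
  if (response === null || response === undefined) {
    return ""; // treat a missing response as having no mentions
  }
  var split_string = String(response).split(",");
  var value = split_string[n - 1]; // arrays are zero-based
  return value === undefined ? "" : value.trim();
}

// Made-up responses standing in for the rows of the data file.
var rows = ["Coke,Pepsi,Fanta", "Pepsi", "", "Sprite,Coke"];

// Mimic per-row evaluation to build a "2nd mention" variable.
var secondMention = rows.map(function (row) { return nthMention(row, 2); });
console.log(JSON.stringify(secondMention)); // ["Pepsi","","","Coke"]
```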
Splitting Text Strings with R
To split data in R, the strsplit() function is used. The syntax for this is:
strsplit(x, split)
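where x is the character vector or string you want to split and split is the character or regular expression to use as the separator (strsplit() treats split as a regular expression unless you set fixed = TRUE). The function returns a list with one element per input string, each element holding that string's pieces.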
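To see why the result is a list rather than a table, it may help to look at the same operation in JavaScript: splitting every string in a vector gives one array per string, and those arrays need not have the same length. The sample strings below are hypothetical.

```javascript
// Hypothetical vector of responses (one string per respondent).
var responses = ["Coke,Pepsi,Fanta", "Pepsi", "Sprite,Coke"];

// Like strsplit(), this produces one result per input string,
// and each result can hold a different number of pieces.
var splitData = responses.map(function (s) { return s.split(","); });

splitData.forEach(function (parts, i) {
  console.log("Respondent " + (i + 1) + ": " + parts.length + " piece(s)");
});
// Respondent 1: 3 piece(s)
// Respondent 2: 1 piece(s)
// Respondent 3: 2 piece(s)
```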
To generate the R output in Q, we must create a new R variable. This can be done by following the Create > Variables and Questions > Variable(s) > R Variable path from the menu bar, or by right-clicking in the Variables and Questions tab and choosing Insert Variable(s) > R Variable.
In the Edit R Variable window, give your new variable a name in the Question Name field at the bottom of the window and build your R code in the R CODE section.
To split the text, pass strsplit() the variable whose text you want to split, followed by the separator. For example, to split a text variable named TextVar1 at each space:
Splitdata <- strsplit(TextVar1, " ")
This produces an object in R called a list. Q cannot interpret lists as variables. The list must be converted into a data frame or matrix.
When the text strings all contain the same number of segments (for example, timestamps in the form hh:mm:ss always have three), you can use the rbind function to combine them into a matrix, with one column per segment, in a single step:
do.call(rbind, Splitdata)
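When every string has the same number of segments, the split results line up directly as rows of a table, which is what do.call(rbind, Splitdata) produces in R. A JavaScript sketch of the same idea, using made-up hh:mm:ss timestamps and converting each segment to a number:

```javascript
// Made-up timestamps, each with exactly three segments.
var timestamps = ["09:15:02", "13:40:59", "23:05:00"];

// Each split gives [hh, mm, ss]; stacking the arrays gives one row per
// timestamp and one column per segment, like do.call(rbind, ...) in R.
var rows = timestamps.map(function (t) { return t.split(":"); });

// Convert each segment to a number so the columns can be used numerically.
var numericRows = rows.map(function (r) { return r.map(Number); });
console.log(JSON.stringify(numericRows)); // [[9,15,2],[13,40,59],[23,5,0]]
```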
When the text strings do not all contain the same number of segments, as in our awareness example, it takes a bit more work to organize the data as a set of columns. The following code works out the largest number of segments in any string and pads every shorter result to that length:
x <- strsplit(awareness, ",")
# Get the maximum number of segments
n <- max(sapply(x, length))
# Pad each element to length n (R fills the new positions with NA)
for (j in seq_along(x))
  length(x[[j]]) <- n
z <- do.call(rbind, x)
z
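The padding loop can be mirrored in JavaScript to make the idea concrete: find the longest result, then extend every shorter array to that length. Here null stands in for R's NA (an analogy, not an exact equivalent), and the responses are hypothetical.

```javascript
// Hypothetical responses with different numbers of mentions.
var responses = ["Coke,Pepsi,Fanta", "Pepsi", "Sprite,Coke"];
var x = responses.map(function (s) { return s.split(","); });

// Get the maximum number of pieces across all respondents.
var n = Math.max.apply(null, x.map(function (parts) { return parts.length; }));

// Pad each array with null (standing in for NA) until it has n entries.
var z = x.map(function (parts) {
  var padded = parts.slice(); // copy so the original split results are untouched
  while (padded.length < n) {
    padded.push(null);
  }
  return padded;
});

console.log(JSON.stringify(z));
// [["Coke","Pepsi","Fanta"],["Pepsi",null,null],["Sprite","Coke",null]]
```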
This output shows our data split and organized into columns, but with “NA” in the cells where a respondent gave fewer mentions. We can clean this up by adding one line just before the final line:
x <- strsplit(awareness, ",")
# Get the maximum number of segments
n <- max(sapply(x, length))
# Pad each element to length n (R fills the new positions with NA)
for (j in seq_along(x))
  length(x[[j]]) <- n
z <- do.call(rbind, x)
z[is.na(z)] <- "" # Replace NAs with blanks
z
Press the Play button to verify your output. Once verified, click the Add R variable button to complete the process.
Finally, you can label the new variables by using the colnames() function to name the columns of the matrix:
x <- strsplit(awareness, ",")
# Get the maximum number of segments
n <- max(sapply(x, length))
# Pad each element to length n (R fills the new positions with NA)
for (j in seq_along(x))
  length(x[[j]]) <- n
z <- do.call(rbind, x)
z[is.na(z)] <- "" # Replace NAs with blanks
colnames(z) <- paste0("Mention: ", 1:ncol(z)) # Label the columns
z
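Putting the last steps together, the sketch below pads with empty strings rather than NA-like values and builds column labels of the form "Mention: 1", "Mention: 2", and so on, mirroring the colnames() line. It is a JavaScript analog for illustration only; the function name splitToTable and the sample responses are hypothetical.

```javascript
// Hypothetical helper: split each response on a separator, pad every row
// with "" to the same width, and label the columns "Mention: 1", "Mention: 2", ...
function splitToTable(responses, separator) {
  var x = responses.map(function (s) { return s.split(separator); });
  var n = Math.max.apply(null, x.map(function (parts) { return parts.length; }));

  var rows = x.map(function (parts) {
    var padded = parts.slice();
    while (padded.length < n) {
      padded.push(""); // blank instead of a missing value
    }
    return padded;
  });

  var columnNames = [];
  for (var i = 1; i <= n; i++) {
    columnNames.push("Mention: " + i);
  }
  return { columnNames: columnNames, rows: rows };
}

var table = splitToTable(["Coke,Pepsi,Fanta", "Pepsi", "Sprite,Coke"], ",");
console.log(JSON.stringify(table.columnNames)); // ["Mention: 1","Mention: 2","Mention: 3"]
console.log(JSON.stringify(table.rows));
// [["Coke","Pepsi","Fanta"],["Pepsi","",""],["Sprite","Coke",""]]
```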
We hope you found this article helpful! To discover how you can do more in Q, head on over to our blog!