Friday, November 8, 2013
The machine analogy clarifies how we need predictors that are manipulated (as in an experiment) or at least measured independently of each other and the output. Unfortunately, we seldom get either conceptually or statistically independent variables from our customer satisfaction questionnaires. In my last post, I reviewed a considerable amount of evidence that customer performance ratings often are confounded with each other and with measures of overall satisfaction. In fact, all the ratings of both the "input" and the "output" appear to be manifestations of a single underlying affect dimension. No one is arguing that the delivery of features and services does not impact satisfaction, only that customer ratings do not provide the data needed to estimate those effects. I am not the first to reach such a conclusion.
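This halo pattern is easy to reproduce. Below is a minimal simulation in R (the data and variable names are entirely hypothetical, not anyone's actual survey) in which six driver ratings and one overall satisfaction rating are all generated from a single latent affect score. A principal components analysis then recovers essentially one dimension rather than seven.

```r
set.seed(1)
n <- 500
affect <- rnorm(n)  # the single underlying affect dimension

# six "input" ratings and one "output" rating, all driven by the same factor
ratings <- sapply(1:7, function(i) 0.8 * affect + 0.6 * rnorm(n))
colnames(ratings) <- c(paste0("driver", 1:6), "satisfaction")

# every pairwise correlation is sizable and positive
round(cor(ratings), 2)

# the first principal component dominates: one dimension, not seven
pca <- prcomp(ratings, scale. = TRUE)
summary(pca)$importance["Proportion of Variance", ]
```

With ratings built this way, the first component typically accounts for well over half of the total variance, which is the signature of a single affect dimension masquerading as seven separate measures.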
Several examples of the input-output model are provided in a Marketing Science paper by Buschken, Otter and Allenby (the last author is associated with the R package bayesm). In each of these examples there is an overall or global evaluation accompanied by a list of what they call partial evaluations. In one study the dependent variable was "this hospital would be my first choice" and then a list of independent variables asking about perceptions of the hospital as a safe place, conveniently located, having skilled doctors, and so on. Everything was rated on a nine-point agreement scale.
The authors note that these product and service perceptions were intended to be conceptually independent. That is, the perceptions can be correlated because they reflect real co-variation as hospitals seek to provide a consistent level of service, but it is still reasonable to estimate the effect of any one driver controlling for the other drivers (e.g., having skilled doctors controlling for convenient location and safe place). In fact, the key driver analyst is trying to include all the important determinants without being redundant. Consequently, one expects that the set of input variables would have dimension close to the number of drivers because each predictor is measuring something unique. One would not expect to find multicollinearity among the drivers or endogeneity between the input and the output as if all the ratings "effectively ask about total satisfaction" (p. 2).
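If the drivers really were conceptually independent, their variance inflation factors (VIFs) would sit near 1. A quick base-R sketch (again with simulated data and illustrative names, not the hospital study's items) shows what happens instead when every rating reflects the same halo:

```r
set.seed(2)
n <- 500
affect <- rnorm(n)  # one halo factor behind all the "independent" drivers
inputs <- data.frame(sapply(1:5, function(i) 0.95 * affect + 0.3 * rnorm(n)))
names(inputs) <- paste0("driver", 1:5)

# variance inflation factor computed by hand: 1 / (1 - R^2), where R^2
# comes from regressing each driver on the remaining drivers
vif <- sapply(names(inputs), function(v) {
  others <- setdiff(names(inputs), v)
  r2 <- summary(lm(reformulate(others, v), data = inputs))$r.squared
  1 / (1 - r2)
})
round(vif, 1)  # well above the usual rule-of-thumb cutoff of 5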
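```
High VIFs like these mean the regression cannot separate the drivers' effects, no matter how carefully the items were worded to be distinct.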
Nevertheless, the Marketing Science article does find evidence for a single dimension underlying the inputs. In response, the paper suggests a Bayesian mixture model to separate those respondents who are "able and motivated to provide separate evaluations for each component" from those who "may fail to think clearly about the component evaluations" (p. 3). Although there are likely to be individual differences, the discussion section of the paper points to the type of survey item as the likely cause of low-dimensional predictors. They conclude that regression-based driver analysis is inadequate "when survey items are not specific enough to engage respondents beyond an overall sentiment" (p. 30). As support for this conclusion, the paper distinguishes between the one-dimensional general items from the Florida vacation data set (e.g., fun, exciting, interesting, and enjoyable) and the multidimensional concrete items from the Smartphone data set (e.g., weight, toggle or navigation wheel, and battery stamina).
Trying to Save the Input-Output Model or Moving On to Network Analysis
The Buschken, Otter and Allenby paper can be seen as an attempt to salvage the input-output model by separating respondents into clusters that did and did not provide conceptually independent ratings. Their limited success, however, was restricted to a rating scale that asked about specific details and required respondents to recall actual usage experiences. Yet, the problem of endogeneity still remains with the global evaluation working backwards and "coloring" the performance ratings for the components. More importantly, such "fixes" are not models of the associative processes accounting for rating data.
The good news is that R offers several ways of displaying and representing associative processes in network structures. In fact, my first post introduced such a network visualization of key driver analysis. As shown below, the nodes are the rating items and the lines represent correlations. Higher correlations between the ratings result in thicker green lines. All the lines are green because all the correlations are positive. The nodes were given different colors to suggest item groupings. Thus, the "output" ratings of overall satisfaction, likelihood to recommend and willingness to fly again are all shown with the same purple color. In addition, the "inputs" have been given different colors to show three groupings: the red booking process, the green aircraft features, and the aqua customer service.
I would argue that such a network is a more accurate representation than a regression equation. All the ratings have equal status. There are no inputs and outputs. For example, an announcement at the end of the flight reminding passengers about the frequent flyer miles they earned and inviting them back for their next flight may have an impact on some respondents' willingness to fly again. Unlike a regression model, the network suggests that drawing attention to frequent flyer miles might alter passengers' memories and their ratings of the most highly associated items. Just thinking about a free trip or upgrade might improve my perceptions of seat comfort and roominess and lead me to remember the service as better than it was. If this is a halo effect, it is real and exploited by the airlines.
Borsboom and Cramer have taken this approach further in their network analysis of psychopathology. The symptoms of a disorder are causally interconnected with worry causing insomnia which causes fatigue and then feeds back to increase worry. Their paper includes R code for the package that I used to produce the above figure (qgraph) and a graphical modeling package for causal inference (pcalg).
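For readers who want to try this themselves, here is a minimal sketch of that kind of correlation network built with qgraph. The ratings are simulated and the item names are placeholders, not the airline survey's actual wording, and the plotting call is guarded so the snippet still runs if qgraph is not installed.

```r
set.seed(3)
n <- 300
affect <- rnorm(n)
ratings <- data.frame(
  booking   = 0.7 * affect + 0.7 * rnorm(n),
  seat      = 0.7 * affect + 0.7 * rnorm(n),
  service   = 0.7 * affect + 0.7 * rnorm(n),
  satisfied = 0.8 * affect + 0.6 * rnorm(n),
  recommend = 0.8 * affect + 0.6 * rnorm(n)
)
r <- cor(ratings)

# qgraph draws the items as nodes and the correlations as edges;
# positive correlations are drawn green, with thicker lines for higher values
if (requireNamespace("qgraph", quietly = TRUE)) {
  qgraph::qgraph(r, layout = "spring",
                 groups = list(Inputs = 1:3, Outputs = 4:5))
}
```

Because every edge is a correlation, nothing in the picture forces a choice of dependent variable; the input/output distinction is something we impose afterward, if at all.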
Model Selection Makes a Difference
Much of our thinking seems to be dominated by causal modeling, with or without latent variables (e.g., structural equations). Consequently, we tend not to see the feedback loops and associations among all our perceptions and ratings. Continuing with our airline example, I can minimize confounding by asking about very specific events (e.g., "Were you asked if you wanted something to drink?"). However, more general questions (e.g., "Were the flight attendants helpful?") can be answered by remembering our impression without having to recall every interaction with the flight attendants. In such cases we tend to find high correlations among all the ratings as if we were measuring a single underlying affect dimension. Regression analysis is simply not appropriate or informative under these conditions.
The network model, on the other hand, encourages us to treat all the ratings as associations without needing to separate them into inputs and outputs. For example, nudge theory suggests that small changes can have big impacts. Not only can I increase your retention likelihood by offering you a discount on your next purchase, I might also be able to increase your satisfaction with your current purchase. In marketing research we are often asked to report the one thing that will have the greatest impact. That one thing is to stop thinking that there is one leverage point with maximum return when there are multiple leverage points that need our attention.