Social decision-making often involves evaluating not only the accuracy or performance of others but also their similarity to oneself. Understanding how individuals balance these two competing factors, performance and similarity, is central to theories of social learning, influence, and preference formation. Although a previous study (Zhang & Gläscher, 2020) built powerful models, the preference in their models depends merely on the sequence in which participants reveal information. In this study, we investigate social preference by modeling individual differences in how participants value informational sources that vary in both objective performance and perceived similarity. Using computational modeling, we fit participant-level parameters to quantify the relative weight assigned to each dimension, revealing the cognitive strategies that underlie social preferences.
We used data from Zhang, L., & Gläscher, J. (2020). The original data can be found here.
To keep the model simple, our model builds on the m2b model from the original study. Specifically, we assume that participants update both the chosen and the unchosen option values. In addition, the coplayers’ choices affect choice switching according to each participant’s preference toward them. The model is defined as follows:
Choice 1 is accounted for by the option values of options A and B:
\[ \mathbb{V}_{t} =[V_{t}(A),V_{t}(B)] \]
where \(\mathbb{V}_{t}\) is a two-element vector consisting of the option values of A and B on trial \(t\). Values are then converted into action probabilities using a Softmax function. On trial \(t\), the action probability of choosing option A (over B) is defined as follows
\[ P_{t}(A)=\frac{1}{1+e^{-(V_{t}(A)-V_{t}(B))}}. \]
For Choice 2, we model it as a “switch” (coded as 1) or a “stay” (coded as 0) using logistic regression. On trial \(t\), the probability of switching given the switch value is defined as follows
\[ P_{t}(\text{switch} )=\Phi (V_{t}(\text{switch} )) \]
where \(\Phi\) is the inverse-logit link function
\[ \Phi (x)=\frac{1}{1+e^{-x}} \]
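As a minimal illustration of these two choice rules, the Python sketch below implements the two-option Softmax for Choice 1 and the inverse-logit switch probability for Choice 2. The function and variable names are our own and are not taken from the original code.

```python
import numpy as np

def inv_logit(x):
    """Inverse logit (logistic) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def p_choice1_A(V_A, V_B):
    """P_t(A): probability of choosing option A on Choice 1,
    a two-option Softmax without a temperature parameter."""
    return inv_logit(V_A - V_B)

def p_switch(V_switch):
    """P_t(switch): probability of switching on Choice 2."""
    return inv_logit(V_switch)

# Example: V_t(A) = 0.6, V_t(B) = 0.2 gives P_t(A) ~ 0.60
print(p_choice1_A(0.6, 0.2))
```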
Following the original study, we didn’t include the inverse Softmax temperature parameter \(\tau\). For the learning process, we have
\[ \begin{gathered}\delta_{\text{chosen,C2} ,t} =R_{t}-V_{\text{chosen,C2} ,t}\\ \delta_{\text{unchosen,C2} ,t} =-R_{t}-V_{\text{unchosen,C2} ,t}\\ V_{\text{chosen,C2} ,t+1}=V_{\text{chosen,C2} ,t}+\alpha \delta_{\text{chosen,C2} ,t}\\ V_{\text{unchosen,C2} ,t+1}=V_{\text{unchosen,C2} ,t}+\alpha \delta_{\text{unchosen,C2} ,t}\end{gathered} \] The reward prediction error is denoted as \(\delta\), and \(R_{t}\) is the reward on trial \(t\). The learning rate \(\alpha\) is a free parameter.
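A minimal sketch of this update rule, assuming option values are kept in a small dictionary per participant (the names are illustrative):

```python
def update_values(V, chosen, unchosen, reward, alpha):
    """Update chosen and unchosen option values after Choice 2.

    V        : dict of option values, e.g. {"A": 0.0, "B": 0.0}
    chosen   : key of the option chosen on Choice 2
    unchosen : key of the unchosen option
    reward   : R_t, the reward on trial t
    alpha    : learning rate, a free parameter
    """
    delta_chosen = reward - V[chosen]        # prediction error for the chosen option
    delta_unchosen = -reward - V[unchosen]   # fictive prediction error for the unchosen option
    V[chosen] += alpha * delta_chosen
    V[unchosen] += alpha * delta_unchosen
    return V
```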
The instantaneous social influence on Choice 2 is also modeled as
\[ \begin{gathered}V_{t}(\text{switch} )=\beta_{\text{bias}_{\text{C2}}} +\beta_{\text{vdiff}_{\text{C2}}} (V_{\text{chosen,C1} ,t}-V_{\text{unchosen,C1} ,t})\\ +\beta_{\text{against}} w.\text{N}_{\text{against} ,t}\end{gathered} \]
where \(w.\text{N}_{\text{against} ,t}\) denotes the preference-weighted amount of dissenting social information relative to each participant’s Choice 1 on trial \(t\). \(\beta_{\text{bias}_{\text{C2}}}\) represents the bias to switch on the second choice, while \(\beta_{\text{vdiff}_{\text{C2}}}\) and \(\beta_{\text{against}}\) are free parameters that determine the influence of the value difference and of the social information on Choice 2, respectively. For the social influence, we have
\[ w.\text{N}_{\text{against,} t} =\frac{\sum_{s\in \mathcal{D}_{t}} w_{s,t}}{\sum_{s=1}^{4} w_{s,t}} ,\quad K=|\mathcal{D}_{t}|\in \{ 0,1,...,4\} \]
where \(s\in \{ 1,2,3,4\}\) indexes the coplayers, \(\mathcal{D}_{t}\) is the set of coplayers whose choices on trial \(t\) went against the participant’s Choice 1, \(K=|\mathcal{D}_{t}|\) is the number of such opposite choices, and \(w_{s,t}\) is the participant’s trial-by-trial preference weight toward coplayer \(s\).
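To show how these pieces fit together, the sketch below computes \(w.\text{N}_{\text{against} ,t}\) and \(V_{t}(\text{switch})\); the switch probability is then obtained by passing the result through the inverse logit defined earlier. All names, and the guard for an all-zero weight vector, are assumptions made for illustration.

```python
import numpy as np

def switch_value(w, against, V_chosen_C1, V_unchosen_C1,
                 beta_bias, beta_vdiff, beta_against):
    """V_t(switch) from the Choice-1 value difference and the
    preference-weighted dissent.

    w       : array of 4 preference weights w_{s,t}, one per coplayer
    against : boolean array of length 4, True where coplayer s chose
              against the participant's Choice 1 on this trial
    """
    total = w.sum()
    # w.N_against: weights of dissenting coplayers, normalised by the sum of all weights
    # (guard against an all-zero weight vector, e.g. on the very first trial)
    wN_against = w[against].sum() / total if total > 0 else 0.0
    return (beta_bias
            + beta_vdiff * (V_chosen_C1 - V_unchosen_C1)
            + beta_against * wN_against)

# Example: two of four coplayers dissent
w = np.array([0.8, 0.5, 0.6, 0.3])
against = np.array([True, False, True, False])
V_sw = switch_value(w, against, 0.6, 0.2, -0.5, -1.0, 2.0)
```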
In contrast to the original study, we calculate the accuracy and consistency of each group member over the previous three trials. Therefore, we have
\[ \begin{gathered}\text{acc}_{s,t} =\frac{1}{3} \sum_{\tau =t-3}^{t-1} \text{win}_{s,\tau} ,t>3\\ \text{con}_{s,t} =\frac{1}{3} \sum_{\tau =t-3}^{t-1} \text{sam}_{s,\tau} ,t>3\\ \text{win}_{s,t} =\begin{cases}1&\text{if} \ R_{s,t}>0\\ 0&\text{if} \ R_{s,t}<0\end{cases}\\ \text{sam}_{s,t} =\begin{cases}1&\text{if} \ \text{C2}_{s,t} =\text{C2}_{t}\\ 0&\text{if} \ \text{C2}_{s,t} \neq \text{C2}_{t}\end{cases}\end{gathered} \]
where \(\text{C2}_{s,t}\) indicates the Choice 2 of coplayer \(s\) on trial \(t\), and \(\text{C2}_{t}\) indicates the participant’s Choice 2 on trial \(t\). For the first three trials, we set
\[ \begin{gathered}\text{acc}_{s,1} =0\\ \text{con}_{s,1} =0\\ \text{acc}_{s,t} =\frac{1}{t-1} \sum_{\tau =1}^{t-1} \text{win}_{s,\tau} ,1<t<4\\ \text{con}_{s,t} =\frac{1}{t-1} \sum_{\tau =1}^{t-1} \text{sam}_{s,\tau} ,1<t<4\end{gathered} \]
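A sketch of how the rolling three-trial accuracy and consistency can be computed for one coplayer, covering the first-trial and early-trial cases as well (trial indices are 1-based; the helper name is ours):

```python
import numpy as np

def acc_con(wins, sams, t, window=3):
    """acc_{s,t} and con_{s,t} for one coplayer before trial t (1-indexed).

    wins : array of win_{s,tau} indicators (1 if coplayer s won on trial tau)
    sams : array of sam_{s,tau} indicators (1 if coplayer s's Choice 2
           matched the participant's Choice 2 on trial tau)
    """
    if t == 1:                                    # no history yet
        return 0.0, 0.0
    if t <= window:                               # fewer than three past trials: average what exists
        past, n = slice(0, t - 1), t - 1
    else:                                         # average over the previous three trials only
        past, n = slice(t - 1 - window, t - 1), window
    return np.sum(wins[past]) / n, np.sum(sams[past]) / n
```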
Now, we define the preference weight \(w_{s,t}\) as follows
\[ w_{s,t}=\omega \text{acc}_{s,t} +(1-\omega )\text{con}_{s,t} ,\omega \in [0,1] \]
where \(\omega\) is the weight parameter that determines the relative importance of accuracy and consistency in the participant’s preference.
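Continuing the sketch above, the preference weight simply mixes accuracy and consistency with the free parameter \(\omega\):

```python
def preference_weight(acc, con, omega):
    """w_{s,t} = omega * acc_{s,t} + (1 - omega) * con_{s,t}, with omega in [0, 1]."""
    return omega * acc + (1.0 - omega) * con

# Example: a coplayer with accuracy 2/3 and consistency 1/3, omega = 0.7
w_st = preference_weight(2 / 3, 1 / 3, 0.7)   # ~0.57
```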
There isn’t any revision to the bet model. However, the weights used in the bet model are the same as those defined above.
Code and figures can be found here.