<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Models | Ciencia Cognitiva &amp; Economía</title><link>https://santiago-alonso-diaz.netlify.app/model/</link><atom:link href="https://santiago-alonso-diaz.netlify.app/model/index.xml" rel="self" type="application/rss+xml"/><description>Models</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://santiago-alonso-diaz.netlify.app/images/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_2.png</url><title>Models</title><link>https://santiago-alonso-diaz.netlify.app/model/</link></image><item><title>Whole number bias</title><link>https://santiago-alonso-diaz.netlify.app/model/whole_number_bias/</link><pubDate>Wed, 28 Apr 2021 08:51:25 -0500</pubDate><guid>https://santiago-alonso-diaz.netlify.app/model/whole_number_bias/</guid><description>&lt;p>(see English translation below)&lt;/p>
&lt;p>Este paper tiene al final el código bayesiano en PyMC3 para estimar el intrinsic whole-number bias &lt;a href="http://colala.berkeley.edu/papers/alonsodiaz2018intrinsic.pdf" target="_blank" rel="noopener">(Alonso-Díaz, Piantadosi, Hayden, &amp;amp; Cantlon, 2018)&lt;/a>&lt;/p>
&lt;p>Es usual encontrar, en toma de decisiones y en ambientes académicos, que la gente tiende a usar el numerador cuando compara proporciones (e.g. probabilidades en el trabajo, fraccionarios en clase, índices financieros). Es un problema presente en la mayoría, incluso en matemáticos de alto nivel.&lt;/p>
&lt;p>Hay muchas &lt;a href="http://illusionoftheyear.com/" target="_blank" rel="noopener">ilusiones cognitivas&lt;/a> automáticas (impenetrables cognitivamente). En esta entrada resumiré dos ideas que acercan el sesgo hacia numeradores altos a una categoría de ilusión, más que a un error al calcular fracciones: 1) la gente tiene la ilusión de que opciones con numeradores altos tienen proporciones mayores, y 2) esto ocurre aún si la persona computa el valor de las proporciones (el sesgo es intrínseco/automático).&lt;/p>
&lt;p>Les preguntamos a varios participantes que escogieran repetidamente entre dos loterías, una a la izquierda y otra a la derecha. Cada lotería se mostró con imágenes de bolas azules (ganadoras) y naranjas (perdedoras). Tenían que escoger una lotería/bolsa, con la condición de que si sacaban una bola azul ganaban (todo era hipotético). Esto durante varios turnos. En algunos turnos, la bolsa con la mejor proporción tenía más bolas ganadoras (condición congruente) y en otros menos (condición incongruente).&lt;/p>
&lt;p>Nuestros participantes, como en literatura previa, mostraron una fuerte preferencia por loterías con mayor número de ganadoras (azules).&lt;/p>
&lt;p>Para determinar si este efecto comportamental es una ilusión intrínseca, es decir, que ocurre aún cuando se computa el ratio, ajustamos el siguiente modelo (ver diagrama abajo). Los cuadros grises son datos y los círculos blancos son parámetros latentes/no-observables. $L$ y $W$ son el número de perdedoras (Losers) y ganadoras (Winners) que el participante ve en cada bolsa. $Choice$ es la decisión que tomó el participante en el turno $i$. $\Phi$ denota la percepción de la numerosidad de las bolas &lt;a href="https://www.semanticscholar.org/paper/Symbols-and-quantities-in-parietal-cortex%3A-elements-Dehaene/ec2f20f9cc05f2f2c009ac7fb42e591405b42ece?p2df" target="_blank" rel="noopener">(Dehaene, 2007)&lt;/a>. La decisión es estocástica y sigue una probabilidad softmax ($pSM$).&lt;/p>
&lt;center>&lt;img src="model_WNB.svg" width = "600" height = '600'>&lt;/center>
&lt;p>En el modelo la decisión es una combinación lineal de la percepción del número de ganadoras (azules), perdedoras, y la proporción de azules (ver la fórmula para $f_{ir}$ en el diagrama). El posterior inferido del peso para la proporción de azules ($\beta_{ratios}$) es mayor a cero. Los participantes usaban la proporción de azules en su decisión (gráfica de abajo). El modelo hace un buen trabajo en replicar las decisiones promedio (gráfico más abajo; línea punteada: datos humanos; línea sólida: promedio del modelo, i.e. del posterior predictive; verde: turnos congruentes; rojo: turnos incongruentes).&lt;/p>
&lt;center>&lt;img src="posterior_weights_WNB.svg">&lt;/center>
&lt;center>&lt;img src="ppc_WNB.svg" width = "600" height = '600'>&lt;/center>
&lt;p>El modelo permitió inferir que el sesgo por numeradores altos no es una simple falla en percibir/calcular valores fraccionales: los pesos de decisión para las proporciones de azules no son cero (más detalles en Alonso-Díaz et al., 2018). El whole-number bias es más cercano a una ilusión cognitiva: aún cuando sabemos que hay algo mal, es inevitable caer en ella. Por eso es observable en todos los niveles educativos e incluso en matemáticos, estadísticos, financistas, y científicos experimentados.&lt;/p>
&lt;h1 id="english-by-google-translate-with-some-edits">English (by Google Translate with some edits)&lt;/h1>
&lt;p>This paper has at the end the Bayesian code in PyMC3 to estimate the intrinsic whole-number bias &lt;a href="http://colala.berkeley.edu/papers/alonsodiaz2018intrinsic.pdf" target="_blank" rel="noopener">(Alonso-Díaz, Piantadosi, Hayden, &amp;amp; Cantlon, 2018)&lt;/a>&lt;/p>
&lt;p>It is common to find in decision-making and academic settings that people tend to use the numerator when comparing proportions (e.g. probabilities at work, fractions in class, financial indices). It is a problem present in most people, even high-level mathematicians.&lt;/p>
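As a minimal illustration of the bias (the two lotteries here are made-up numbers, not stimuli from the paper), picking by numerator alone can select the objectively worse option:

```python
# Two hypothetical lotteries as (winners, total balls)
a = (9, 20)  # ratio 0.45, large numerator
b = (3, 5)   # ratio 0.60, small numerator

best_by_ratio = max([a, b], key=lambda lot: lot[0] / lot[1])
best_by_numerator = max([a, b], key=lambda lot: lot[0])

print(best_by_ratio)      # (3, 5): the objectively better bag
print(best_by_numerator)  # (9, 20): what the whole-number bias favors
```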
&lt;p>There are many automatic &lt;a href="http://illusionoftheyear.com/" target="_blank" rel="noopener">cognitive illusions&lt;/a> (cognitively impenetrable). In this post I will summarize two ideas that bring the bias towards high numerators closer to a category of illusion, rather than an error in calculating fractions: 1) people are under the illusion that options with high numerators have larger proportions, and 2) this occurs even if the person computes the value of the proportions (the bias is intrinsic/automatic).&lt;/p>
&lt;p>We asked several participants to repeatedly choose between two lotteries, one on the left and one on the right. Each lottery was shown with images of blue (winner) and orange (loser) balls. They had to choose a lottery/bag, on the condition that if they drew a blue ball they won (it was all hypothetical). This went on for several trials. In some trials, the bag with the best proportion had more winning balls (congruent condition) and in others fewer (incongruent condition).&lt;/p>
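The congruent/incongruent distinction can be sketched with a small helper (`trial_type` is a hypothetical function of my own, not from the paper's code; ties are ignored for simplicity):

```python
def trial_type(win1, total1, win2, total2):
    # Congruent: the bag with the better winning proportion also shows more winner balls.
    # Incongruent: the better-proportion bag shows fewer winner balls.
    best_is_bag1 = win1 / total1 > win2 / total2
    more_winners_in_bag1 = win1 > win2
    return "congruent" if best_is_bag1 == more_winners_in_bag1 else "incongruent"

print(trial_type(4, 8, 3, 9))   # bag 1 has the better ratio AND more winners
print(trial_type(3, 6, 5, 15))  # bag 1 has the better ratio but FEWER winners
```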
&lt;p>Our participants, as in previous literature, showed a strong preference for lotteries with a higher number of winners (blue).&lt;/p>
&lt;p>To determine if this behavioral effect is an intrinsic illusion, that is, one that occurs even when the ratio is computed, we fitted the following model (see diagram below). The gray boxes are data and the white circles are latent/unobservable parameters. $L$ and $W$ are the number of losers and winners that the participant sees in each bag. $Choice$ is the decision made by the participant in trial $i$. $\Phi$ denotes the perceived numerosity of the balls &lt;a href="https://www.semanticscholar.org/paper/Symbols-and-quantities-in-parietal-cortex%3A-elements-Dehaene/ec2f20f9cc05f2f2c009ac7fb42e591405b42ece?p2df" target="_blank" rel="noopener">(Dehaene, 2007)&lt;/a>. The decision is stochastic and follows a softmax probability ($pSM$).&lt;/p>
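The softmax rule can be written in a couple of lines (a standalone sketch with made-up subjective values, not the fitted PyMC3 model shown later):

```python
import numpy as np

def p_side2(f_side1, f_side2):
    # Probability of choosing side 2 given the subjective value of each side
    return np.exp(f_side2) / (np.exp(f_side1) + np.exp(f_side2))

print(round(p_side2(0.0, 0.0), 3))  # 0.5: equal values, indifference
print(round(p_side2(1.2, 0.8), 3))  # below 0.5: side 1 looks better, but choice stays stochastic
```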
&lt;center>&lt;img src="model_WNB.svg" width = "600" height = '600'>&lt;/center>
&lt;p>In the model the decision is a linear combination of the perception of the number of winners (blues), losers, and the proportion of blues (see the formula for $f_{ir}$ in the diagram). The inferred posterior of the weight for the proportion of blues ($\beta_{ratios}$) is greater than zero. Participants used the proportion of blues in their decision (graph below). The model does a good job of replicating average decisions (graph further below; dotted line: human data; solid line: model average, i.e. from the posterior predictive; green: congruent trials; red: incongruent trials).&lt;/p>
&lt;center>&lt;img src="posterior_weights_WNB.svg">&lt;/center>
&lt;center>&lt;img src="ppc_WNB.svg" width = "600" height = '600'>&lt;/center>
&lt;p>The model allowed us to infer that the bias towards high numerators is not a simple failure to perceive/calculate fractional values: the decision weights for the proportions of blue are not zero (more details in Alonso-Díaz et al., 2018). The whole-number bias is closer to a cognitive illusion: even when we know something is wrong, falling for it is inevitable. That is why it is observable at all educational levels and even in experienced mathematicians, statisticians, financiers, and scientists.&lt;/p>
&lt;h1 id="python">Python&lt;/h1>
&lt;p>El material, los datos necesarios, y la implementación en PyMC3 también se pueden encontrar en este &lt;a href="https://github.com/santiagoalonso/Cognicion-Bayesiana/tree/main/Notebooks_Slides" target="_blank" rel="noopener">github&lt;/a>. Es el notebook 8_Numerosity.ipynb. Los datos están en data/8_CB/WNB.csv.&lt;/p>
&lt;p>The material, the necessary data, and the implementation in PyMC3 can also be found in this &lt;a href="https://github.com/santiagoalonso/Cognicion-Bayesiana/tree/main/Notebooks_Slides" target="_blank" rel="noopener">github&lt;/a>. It&amp;rsquo;s the 8_Numerosity.ipynb notebook. The data is in data/8_CB/WNB.csv.&lt;/p>
&lt;pre>&lt;code class="language-python">#Load data
#Performance: 0 wrong, 1 correct
#RT: response time in secs
#ProbRatio: small ratio / large ratio
#NumRatio: small numerator / large numerator
#DenRatio: small denominator / large denominator
#AreaCtl: dots across bags have 1: equal dot size, 2: equal cumulative area
#WinSide1: number of winners left bag
#WinSide2: number of winners right bag
#DenSide1: total balls left bag
#DenSide2: total balls right bag
#ProbSide1: probability of win left bag
#ProbSide2: probability of win right bag
#sideR: side of response; 1 left, 2 right, 0 no response.
#subID: subject identifier
WNB_all = pd.read_csv('data/8_CB/WNB.csv')
WNB_all['ProbDistance'] = np.abs(WNB_all['ProbSide1']-WNB_all['ProbSide2'])
WNB_all = WNB_all.loc[WNB_all['sideR']&amp;gt;0,:].reset_index(drop=True)
idx1 = WNB_all['ProbSide1']&amp;gt;=WNB_all['ProbSide2']
WNB_all['WinSmallRatio'] = int(0)
WNB_all['DenSmallRatio'] = int(0)
WNB_all['WinBigRatio'] = int(0)
WNB_all['DenBigRatio'] = int(0)
for i in range(WNB_all.shape[0]):
    if idx1[i]:
        WNB_all.loc[i,'WinSmallRatio'] = WNB_all.loc[i,'WinSide2']
        WNB_all.loc[i,'DenSmallRatio'] = WNB_all.loc[i,'DenSide2']
        WNB_all.loc[i,'WinBigRatio'] = WNB_all.loc[i,'WinSide1']
        WNB_all.loc[i,'DenBigRatio'] = WNB_all.loc[i,'DenSide1']
    else:
        WNB_all.loc[i,'WinSmallRatio'] = WNB_all.loc[i,'WinSide1']
        WNB_all.loc[i,'DenSmallRatio'] = WNB_all.loc[i,'DenSide1']
        WNB_all.loc[i,'WinBigRatio'] = WNB_all.loc[i,'WinSide2']
        WNB_all.loc[i,'DenBigRatio'] = WNB_all.loc[i,'DenSide2']
sID = WNB_all['subID'].unique()
subj_to_model = -1 #0 to 20; -1 for all
WNB = WNB_all
if subj_to_model&amp;gt;=0:
    WNB = WNB_all.loc[WNB_all['subID']==sID[subj_to_model],:].reset_index(drop=True)
weber = 0.286679553540291 #mean value of participants (see paper)
winners_s = np.sort(WNB['WinSmallRatio'].unique())
winners_b = np.sort(WNB['WinBigRatio'].unique())
winners = np.sort(pd.concat([pd.Series(winners_s), pd.Series(winners_b)]).unique())
losers_s = np.sort((WNB['DenSmallRatio']-WNB['WinSmallRatio']).unique())
losers_b = np.sort((WNB['DenBigRatio']-WNB['WinBigRatio']).unique())
losers = np.sort(pd.concat([pd.Series(losers_s), pd.Series(losers_b)]).unique())
sn = np.array(WNB['WinSmallRatio'], dtype = str)
sd = np.array(WNB['DenSmallRatio'], dtype = str)
r = []
for idx, ele in enumerate(sn):
    r.append(ele + &amp;quot;_&amp;quot; + sd[idx])
bn = np.array(WNB['WinBigRatio'], dtype = str)
bd = np.array(WNB['DenBigRatio'], dtype = str)
for idx, ele in enumerate(bn):
    r.append(ele + &amp;quot;_&amp;quot; + bd[idx])
r = pd.Series(r).unique()
ratios = np.zeros((r.shape[0],3))
for idx, ele in enumerate(r):
    temp = np.array(ele.split(&amp;quot;_&amp;quot;), dtype = int)
    ratios[idx,0] = temp[0] #numerator
    ratios[idx,1] = temp[1] #denominator
    ratios[idx,2] = temp[0]/temp[1] #ratio
print(winners.shape, losers.shape, ratios.shape)
#Indices (for vectors with unique values)
side1 = np.zeros((WNB.shape[0],3)) #column order: index for winners, losers, ratios
side2 = np.zeros((WNB.shape[0],3))
for i in range(WNB.shape[0]):
    #side 1
    w = WNB.loc[i, 'WinSmallRatio']
    den = WNB.loc[i, 'DenSmallRatio']
    l = den - w
    side1[i,0] = np.where(winners == w)[0][0]
    side1[i,1] = np.where(losers == l)[0][0]
    side1[i,2] = np.where((ratios[:,0] == w) &amp;amp; (ratios[:,1] == den))[0][0]
    #side 2
    w = WNB.loc[i, 'WinBigRatio']
    den = WNB.loc[i, 'DenBigRatio']
    l = den - w
    side2[i,0] = np.where(winners == w)[0][0]
    side2[i,1] = np.where(losers == l)[0][0]
    side2[i,2] = np.where((ratios[:,0] == w) &amp;amp; (ratios[:,1] == den))[0][0]
side1 = side1.astype(int)
side2 = side2.astype(int)
#choice data
idx1 = WNB['ProbSide1']&amp;gt;=WNB['ProbSide2'] #recomputed on the (possibly subject-filtered) dataframe
idx2 = WNB['sideR'] == 1
WNB['correct'] = np.array((idx1 &amp;amp; idx2) | (~idx1 &amp;amp; ~idx2), dtype = int)
choice = WNB['correct'] #0: incorrect; 1: correct
#WNB
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">#Model
with pm.Model() as WNB_model:
    #priors
    #percepts of winners and losers assumed different e.g. due to loss aversion
    Winners = pm.Normal('percept_winners',
                        mu = winners, sd = weber*winners, shape = winners.shape)
    Losers = pm.Normal('percept_losers',
                       mu = losers, sd = weber*losers, shape = losers.shape)
    Ratios = pm.Beta('percept_ratios',
                     alpha = ratios[:,0] + 1,
                     beta = ratios[:,1] - ratios[:,0] + 1, shape = ratios.shape[0])
    Weight_win = pm.Uniform('weight_win', lower = -5, upper = 5)
    Weight_lose = pm.Uniform('weight_lose', lower = -5, upper = 5)
    Weight_ratio = pm.Uniform('weight_ratio', lower = 0, upper = 5)
    #likelihood
    f_side1 = Weight_ratio*Ratios[side1[:,2]] + Weight_win*Winners[side1[:,0]] + Weight_lose*Losers[side1[:,1]]
    f_side2 = Weight_ratio*Ratios[side2[:,2]] + Weight_win*Winners[side2[:,0]] + Weight_lose*Losers[side2[:,1]]
    softmax = tt.exp(f_side2)/(tt.exp(f_side1) + tt.exp(f_side2)) #prob. of picking side 2
    choice_LH = pm.Bernoulli('choice', p = softmax, observed = choice)
    #sampling
    trace = pm.sample(1000, init = 'adapt_diag', tune=1500)
    ppc = pm.sample_posterior_predictive(trace, samples=5000)
    data = az.from_pymc3(trace=trace)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">#Weights for numbers and ratios.
#They are not zero, both explain human behavior.
#Even computing ratios, we still use raw numerosity information
fig, ax = plt.subplots(1,2, figsize = [15,6])
az.plot_density(
    [trace['weight_win'], trace['weight_lose']],
    data_labels=[&amp;quot;$winners$&amp;quot;, &amp;quot;$losers$&amp;quot;],
    shade=.1, ax = ax[0], hdi_prob=.95,
)
az.plot_density(
    [trace['weight_ratio']], hdi_prob=.95,
    data_labels=[&amp;quot;$\\beta_{ratio}$&amp;quot;], outline=True,
    shade=.25, ax = ax[1], colors = 'purple',
)
ax[0].set_title('$\\beta$', fontsize = 20)
ax[1].set_title('$\\beta_{ratios}$', fontsize = 20)
ax[0].legend(loc='upper right');
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">#Posterior predictive (dotted humans; solid model)
idx_cong = WNB['WinBigRatio']&amp;gt;WNB['WinSmallRatio'] #Congruent trial
idx_incong = ~idx_cong #Incongruent trial
ppc_cong = pd.concat([pd.DataFrame(ppc['choice'].mean(axis=0)[idx_cong], columns = ['choice_model']),
                      WNB.loc[idx_cong,:].reset_index(drop=True)], axis = 1)
ppc_incong = pd.concat([pd.DataFrame(ppc['choice'].mean(axis=0)[idx_incong], columns = ['choice_model']),
                        WNB.loc[idx_incong,:].reset_index(drop=True)], axis = 1)
toplot_cong = ppc_cong.groupby(['ProbDistance']).mean()[['choice_model','correct']].reset_index()
toplot_incong = ppc_incong.groupby(['ProbDistance']).mean()[['choice_model','correct']].reset_index()
idx1 = toplot_cong['ProbDistance']==0
idx2 = toplot_incong['ProbDistance']==0
mean0 = (toplot_cong.loc[idx1,'correct'] + toplot_incong.loc[idx2,'correct'])/2
toplot_cong.loc[idx1,'correct'] = mean0 #at prob. distance 0 the congruent/incongruent distinction doesn't apply
toplot_incong.loc[idx2,'correct'] = mean0
mean0 = (toplot_cong.loc[idx1,'choice_model'] + toplot_incong.loc[idx2,'choice_model'])/2
toplot_cong.loc[idx1,'choice_model'] = mean0 #same averaging at prob. distance 0
toplot_incong.loc[idx2,'choice_model'] = mean0
fig = plt.figure(figsize=[9,7])
plt.plot(toplot_cong['ProbDistance'], toplot_cong['correct'], color = 'forestgreen', linestyle = ':')
plt.scatter(toplot_cong['ProbDistance'], toplot_cong['correct'], color = 'forestgreen', linestyle = ':')
plt.plot(toplot_incong['ProbDistance'], toplot_incong['correct'], color = 'red', linestyle = ':')
plt.scatter(toplot_incong['ProbDistance'], toplot_incong['correct'], color = 'red', linestyle = ':')
plt.plot(toplot_cong['ProbDistance'], toplot_cong['choice_model'], color = 'forestgreen')
plt.scatter(toplot_cong['ProbDistance'], toplot_cong['choice_model'], color = 'forestgreen')
plt.plot(toplot_incong['ProbDistance'], toplot_incong['choice_model'], color = 'red')
plt.scatter(toplot_incong['ProbDistance'], toplot_incong['choice_model'], color = 'red');
plt.ylim([0.25,1])
plt.xlabel('Prob. distance between the bags', fontsize = 20)
plt.ylabel('Accuracy\n%correct ', fontsize = 20);
&lt;/code>&lt;/pre>
&lt;h2 id="referencias">Referencias:&lt;/h2>
&lt;p>Alonso-Díaz, S., Piantadosi, S. T., Hayden, B. Y., &amp;amp; Cantlon, J. F. (2018). Intrinsic whole number bias in humans. &lt;em>Journal of Experimental Psychology: Human Perception and Performance&lt;/em>, &lt;em>44&lt;/em>(9), 1472.&lt;/p>
&lt;p>Dehaene, S. (2007). Symbols and quantities in parietal cortex: Elements of a mathematical theory of number representation and manipulation. &lt;em>Sensorimotor foundations of higher cognition&lt;/em>, &lt;em>22&lt;/em>, 527-574.&lt;/p></description></item><item><title>Tallying Heuristic (PyMC3)</title><link>https://santiago-alonso-diaz.netlify.app/model/tally_heuristic/</link><pubDate>Sun, 04 Apr 2021 08:51:25 -0500</pubDate><guid>https://santiago-alonso-diaz.netlify.app/model/tally_heuristic/</guid><description>&lt;p>(see English translation below)&lt;/p>
&lt;p>Esta entrada tiene al final el código en PyMC3 para la estimación bayesiana de los parámetros de la heurística tallying. Específicamente, los pesos de los cues y la fuerza del prior que penaliza el uso de toda la información (adaptado de la versión en R de &lt;a href="https://www.sciencedirect.com/science/article/pii/S0010028517303286" target="_blank" rel="noopener">Parpart et al., 2018&lt;/a>). A continuación, una explicación sucinta de la heurística y un resultado interesante de Parpart et al., 2018.&lt;/p>
&lt;p>Una teoría influyente sobre toma de decisiones es que los humanos usamos heurísticas, es decir, atajos para escoger entre opciones. Una heurística famosa es tallying. Parpart et al., 2018 ponen un ejemplo intuitivo usando dos equipos de fútbol. Si a usted le preguntan qué equipo va a ganar, Barcelona vs. Atlético de Madrid, su respuesta con certeza se va a basar en características de cada equipo (no es fan de ninguno). Digamos que utiliza cuatro: a) posición en la liga, b) resultado del último partido, c) local o visitante y d) número de goles. Si usted fuera una inteligencia artificial, tal vez usaría una regresión para determinar la importancia de cada característica usando data histórica. Sin embargo, la teoría de heurísticas afirma que las personas usamos estrategias más simples. En este caso, tallying se refiere a darle igual peso a todas las características, sumar, y escoger la opción que tenga mayor valor.
$$
Barcelona = Pos(20-3) + Ultimo(0) + Local(0) + Goles(44)
$$&lt;/p>
&lt;p>$$
Atletico = Pos(20-0) + Ultimo(0) + Local(1) + Goles(38)
$$&lt;/p>
&lt;p>En este ejemplo, Barcelona suma 61 y el Atlético de Madrid 59 (dentro de los paréntesis están los valores). Por lo tanto, la persona dice que gana el Barcelona. La clave es notar que en tallying todas las características tienen un peso idéntico en la decisión.&lt;/p>
&lt;p>Gerd Gigerenzer y colegas han dedicado buena parte de su vida académica a demostrar que este tipo de estrategias no solo son fáciles para la mente humana sino que también pueden ser óptimas. Un principio importante que han derivado es el de que menos información es mejor (less-is-more). En el caso de tallying, la información de cuánto ponderar cada característica no se necesita, pues es más flexible no poner pesos (i.e. en muchos contextos de decisión el bias-variance trade-off favorece reducir la varianza).&lt;/p>
&lt;p>Una forma de pensar el principio less-is-more es con la actividad de dibujar con lápiz vs. con colores. Pintar con colores puede acercarse más a la realidad, pero los colores son difíciles de borrar. El lápiz, por su parte, limita a expresarse en blanco y negro, pero da más flexibilidad pues se puede borrar. La metáfora es solo para generar una intuición, pues el argumento de Gigerenzer y colegas es más profundo y tiene que ver con el bias-variance trade-off, donde modelos simples como tallying tienen buen poder predictivo, incluso mejor que una regresión lineal simple con muestras pequeñas (e.g. reducen problemas de overfit).&lt;/p>
&lt;blockquote>
&lt;p>“There is a point where too much information and too much information processing can hurt” Gigerenzer and Todd (1999) (p. 21)&lt;/p>
&lt;/blockquote>
&lt;p>Sin embargo, recientemente Paula Parpart y colegas (2018) reflexionaron que less-is-more es un principio extraño. Proponen que el principio opuesto tiene sentido en muchos otros dominios: más información es mejor. Para demostrarlo, pusieron a prueba tallying vs. regresión en un marco bayesiano común. Los detalles se pueden encontrar en su paper, pero la idea básica es simple: si hay una probabilidad grande de que las ponderaciones sean iguales, es posible detectarlo con una aproximación bayesiana. En particular, se puede testear un prior sobre las ponderaciones de las características (e.g. las 4 del ejemplo de fútbol; pero Parpart et al. evaluaron 20 dominios diferentes, no solo fútbol) que en un extremo ponga todas las ponderaciones en una misma constante y en el otro extremo haga una regresión tradicional. El prior que Parpart probó fue lo suficientemente flexible para encontrar posibilidades en cualquiera de estos dos extremos.&lt;/p>
&lt;p>En la gráfica de abajo está la estimación bayesiana que hice de este prior para un dominio particular (horas de sueño de varios mamíferos; ver código en PyMC3 abajo y explicación conceptual en Parpart et al., 2018). Sin entrar en detalles, se puede observar que no es cero; de hecho, está centrado alrededor de 0.7. Esto significa que la versión extrema de less-is-more no es correcta (less-is-more puro implicaría que eta fuera cero o muy cercano a cero). Es bueno utilizar información. Pero, por otro lado, eta no es infinito (regresión pura sería eta infinito). Esto quiere decir que, dada la estructura del dominio, lo mejor es un intermedio entre tallying y regresión.&lt;/p>
&lt;p>Lo impactante del paper de Paula Parpart y colegas es que nos permite entender mejor por qué los humanos usamos heurísticas: surgen cuando hay un prior fuerte de que los pesos sean iguales (eta=0), y esto depende de las características del ambiente. No solo eso: el paper de Parpart nos permite alejarnos de dicotomías extremas (se usan o no se usan heurísticas). Al contrario, la cognición humana se ubica en un continuo de un mismo proceso: usar información de forma apropiada.&lt;/p>
&lt;center>&lt;img src="eta_parpart.svg" width = "500" height = '500'>&lt;/center>
&lt;h1 id="english-by-google-translate-with-some-edits">English (by Google Translate with some edits)&lt;/h1>
&lt;p>This entry has at the end the code in PyMC3 for the Bayesian estimation of the parameters of the tallying heuristic. Specifically, the weights of the cues and the strength of the prior penalizing the use of all the information (adapted from the R version of &lt;a href="https://www.sciencedirect.com/science/article/pii/S0010028517303286" target="_blank" rel="noopener">Parpart, et al, 2018&lt;/a>). Here is a succinct explanation of the heuristics and an interesting result from Parpart, et al, 2018.&lt;/p>
&lt;p>An influential theory about decision making is that humans use heuristics, that is, shortcuts to choose between options. A famous heuristic is tallying. Parpart et al., 2018 give an intuitive example using two soccer teams. If you are asked which team is going to win, Barcelona vs. Atlético de Madrid, your answer will certainly be based on the characteristics of each team (you are not a fan of either). Let&amp;rsquo;s say you use four: a) league position, b) last game result, c) home or away, and d) number of goals. If you were an artificial intelligence, you might use a regression to determine the importance of each characteristic from historical data. However, the theory of heuristics states that people use simpler strategies. In this case, tallying refers to giving equal weight to all the characteristics, adding them up, and choosing the option with the highest value.
$$
Barcelona = Pos(20-3) + Last(0) + Local(0) + Goals(44)
$$&lt;/p>
&lt;p>$$
Atletico = Pos(20-0) + Last(0) + Local(1) + Goals(38)
$$&lt;/p>
&lt;p>In this example, Barcelona totals 61 and Atlético Madrid 59 (the values are inside the parentheses). Therefore, the person says that Barcelona wins. The key is to note that in tallying all the characteristics have an identical weight in the decision.&lt;/p>
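The two sums above can be reproduced directly (a small sketch using the article's numbers; the cue names are my own labels):

```python
def tally(cues):
    # Tallying: every cue gets the same unit weight; just add the cue values
    return sum(cues.values())

teams = {
    "Barcelona": {"position": 20 - 3, "last_game": 0, "home": 0, "goals": 44},
    "Atletico":  {"position": 20 - 0, "last_game": 0, "home": 1, "goals": 38},
}
scores = {name: tally(cues) for name, cues in teams.items()}
print(scores)                       # {'Barcelona': 61, 'Atletico': 59}
print(max(scores, key=scores.get))  # Barcelona
```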
&lt;p>Gerd Gigerenzer and colleagues have spent much of their academic life demonstrating that these types of strategies are not only easy for the human mind but can also be optimal. An important principle they have derived is that less information is better (less-is-more). In the case of tallying, the information on how much to weight each characteristic is not needed, since it is more flexible not to set weights (i.e. in many decision contexts the bias-variance trade-off favors reducing variance).&lt;/p>
&lt;p>One way to think about the less-is-more principle is the activity of drawing with a pencil vs. with colors. Painting with colors can be closer to reality, but colors are difficult to erase. A pencil, on the other hand, limits you to black and white, but there is more flexibility since it can be erased. The metaphor is only meant to generate an intuition, since the argument of Gigerenzer and colleagues is deeper and has to do with the bias-variance trade-off, where simple models such as tallying have good predictive power, even better than simple linear regression with small samples (e.g. they reduce overfitting problems).&lt;/p>
&lt;blockquote>
&lt;p>“There is a point where too much information and too much information processing can hurt” Gigerenzer and Todd (1999) (p. 21)&lt;/p>
&lt;/blockquote>
&lt;p>However, recently Paula Parpart and colleagues (2018) reflected that less-is-more is a strange principle. They propose that the opposite principle makes sense in many other domains: more information is better. To demonstrate this, they tested tallying vs. regression in a common Bayesian framework. The details can be found in their paper, but the basic idea is simple: if there is a high probability that the weights are equal, it is possible to detect it with a Bayesian approach. In particular, we can test a prior on the weights of the characteristics (e.g. the four of the soccer example; but Parpart et al. evaluated 20 different domains, not just soccer) such that at one extreme the prior sets all the weights to the same constant (tallying) and at the other extreme the prior is sufficiently diffuse that we recover a traditional regression. The prior Parpart et al. tested was flexible enough to land at either extreme or anywhere in between.&lt;/p>
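As a loose analogy (a sketch of my own, not Parpart et al.'s exact half-ridge model), a Gaussian prior that shrinks regression weights toward their common mean interpolates between the two extremes. Here `eta` plays the role of a prior variance and is only loosely analogous to the paper's parameter, and the data are simulated:

```python
import numpy as np

def map_weights(X, y, eta):
    # MAP estimate with a penalty on how much weights deviate from their common mean.
    # eta -> 0: weights forced to be equal (tallying-like)
    # eta -> infinity: penalty vanishes, ordinary least squares (traditional regression)
    p = X.shape[1]
    C = np.eye(p) - np.ones((p, p)) / p  # measures spread of weights around their mean
    return np.linalg.solve(X.T @ X + C / eta, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, 1.0, 0.5, 0.25]) + rng.normal(scale=0.1, size=200)

print(np.round(map_weights(X, y, 1e6), 2))   # diffuse prior: close to the true, unequal weights
print(np.round(map_weights(X, y, 1e-6), 2))  # tight prior: all weights nearly identical
```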
&lt;p>In the graph below is the Bayesian estimate I made of this prior for a particular domain (sleep hours of various mammals; see the PyMC3 code below and the conceptual explanation in Parpart et al., 2018). Without going into details, it can be seen that it is not zero; in fact, it is centered around 0.7. This means that the extreme version of less-is-more is not correct (pure less-is-more would mean eta is zero or very close to zero). It is good to use information. But, on the other hand, eta is not infinite (pure regression would be eta equal to infinity). This means that, given the structure of the domain, the best strategy is an intermediate between tallying and regression.&lt;/p>
&lt;p>What is striking about the paper by Paula Parpart and colleagues is that it allows us to better understand why humans use heuristics: they arise when there is a strong prior that the weights are equal (eta = 0), and this depends on the structure of the environment. Not only that: Parpart&amp;rsquo;s paper allows us to move away from extreme dichotomies (heuristics are used or not used). On the contrary, human cognition sits on a continuum of a single process: using information appropriately.&lt;/p>
&lt;center>&lt;img src="eta_parpart.svg" width = "500" height = '500'>&lt;/center>
&lt;h1 id="python">Python&lt;/h1>
&lt;p>El material de Parpart et al. (2018), los datos necesarios, y la implementación en PyMC3 también se pueden encontrar en este &lt;a href="https://github.com/santiagoalonso/Cognicion-Bayesiana/tree/main/Notebooks_Slides" target="_blank" rel="noopener">link&lt;/a>. Es el notebook 7_Adaptive_Toolbox.ipynb. Los datos están en la carpeta data/7_CB/Parpart 2018/Data.&lt;/p>
&lt;p>The material from Parpart et al. (2018), the necessary data, and the implementation in PyMC3 can also be found in this &lt;a href="https://github.com/santiagoalonso/Cognicion-Bayesiana/tree/main/Notebooks_Slides" target="_blank" rel="noopener">link&lt;/a>. It&amp;rsquo;s the 7_Adaptive_Toolbox.ipynb notebook. The data is in the folder data/7_CB/Parpart 2018/Data.&lt;/p>
&lt;pre>&lt;code class="language-python">import os
import copy
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
import theano
from theano.compile.ops import as_op
import arviz as az
from itertools import combinations
from sklearn.model_selection import train_test_split
def sign(num):
    #num: 1D numpy array with numbers
    signs = copy.deepcopy(num)
    signs[num&amp;lt;0] = -1
    signs[num&amp;gt;0] = 1
    signs[num==0] = 0
    return signs.astype('int')
def data_setup_parpart(idx_data, training_samples):
#idx_data: scalar, data index in ALL_DATA
#training_samples: scalar, number of training samples
#ALL_DATA: ordered as Fig. 3 of Parpart et al 2018.
ALL_DATA = [&amp;quot;house.world&amp;quot;,&amp;quot;mortality&amp;quot;,&amp;quot;cit.world&amp;quot;,&amp;quot;prf.world&amp;quot;,&amp;quot;bodyfat.world&amp;quot;, &amp;quot;car.world&amp;quot;,&amp;quot;cloud&amp;quot;,
&amp;quot;dropout&amp;quot;,&amp;quot;fat.world&amp;quot;, &amp;quot;fuel.world&amp;quot;, &amp;quot;glps&amp;quot;,
&amp;quot;homeless.world&amp;quot;, &amp;quot;landrent.world&amp;quot;, &amp;quot;mammal.world&amp;quot;, &amp;quot;oxidants&amp;quot;,
&amp;quot;attractiveness.men&amp;quot;, &amp;quot;attractiveness.women&amp;quot;, &amp;quot;fish.fertility&amp;quot;,&amp;quot;oxygen&amp;quot;, &amp;quot;ozone&amp;quot;]
y_pos = 2 #column position of dependent variable (criterion) i.e. correct answer to question is based on this
#in dataset, 0 below median, 1 above median.
#Gigerenzer et al, converted continuous variables to median splits
dataset = pd.read_table(&amp;quot;data/7_CB/Parpart 2018/Data/&amp;quot; + ALL_DATA[idx_data] + &amp;quot;.txt&amp;quot;)
idx = list(range(4,dataset.shape[1])) #make sure all cues start at column 4
col_cues = np.array(idx) #idx of columns with cues
labels_cues = dataset.columns[idx]
Predictors = len(labels_cues) #number of cues
N = dataset.shape[0] # number of objects e.g cities
#k = 100 #number of partitions for cross validations
#Create Paired Data (ALL binary comparisons of objects e.g. cities)
comb = np.array(list(combinations(list(range(N)), 2)))
idx = np.stack([np.random.choice([0,1], 2, replace = False) for rep in range(comb.shape[0])]) #[0,1] shuffled many times
comb = np.transpose(np.stack([comb[i,ele] for i,ele in enumerate(idx)])) #columns shuffled
y = np.repeat(np.nan, comb.shape[1]) # correct classification; A(+1) or B(-1)
difference = np.repeat(np.nan, comb.shape[1])
bdata_diff = pd.DataFrame(np.nan, index=np.arange(comb.shape[1]), columns=labels_cues)
for i in range(comb.shape[1]):
# takes out only the 2 rows from dataset that are compared at step i
binary = dataset.loc[comb[:,i],:].reset_index(drop=True) #2 random rows
if i == 0:
comparisons = binary
else:
comparisons = pd.concat([comparisons, binary])
## always compare row 1 with row 2 (no matter which one has the higher criterion value): upper row - lower row
if binary.iloc[0,y_pos] &amp;gt; binary.iloc[1,y_pos]:
y[i] = 1 #(A)
else:
y[i] = - 1 #(B)
## cue values (row 1) - cue values (row 2)
bdata_diff.loc[i,:] = binary.loc[0,labels_cues] - binary.loc[1,labels_cues] #
bdata_diff['dependent'] = y
paired_data = copy.deepcopy(bdata_diff)
dataset = copy.deepcopy(paired_data)
# Assess paired_data cue validities and order as v= R/R+W ------R:right, W:wrong
cue_validities_raw = np.repeat(np.nan, Predictors)
cue_validities = np.repeat(np.nan, Predictors) #between 0 (does not predict which is better) and 1 (always predicts which is better)
for c in range(Predictors):
condition = (paired_data.iloc[:,c]==paired_data.loc[:,'dependent']).sum() == 0
if condition: # stays 0 now if it was 0
cue_validities[c] = 0
else:
cue_validities_raw[c] = (paired_data.iloc[:,c]==paired_data.loc[:, 'dependent']).sum()/((paired_data.iloc[:,c]==1).sum()+(paired_data.iloc[:,c]==-1).sum())
cue_validities[c] = cue_validities_raw[c] - 0.5 #the 0.5 is to make a 0.5 validity 0. Parpart's code says that this brings back to same scale as regression weights as otherwise order can be different!
cue_order = np.argsort(-abs(cue_validities))
# number of objects (e.g. paired cities comparisons) after evening out
N = dataset.shape[0]
#Partitions for cross-validation
percent_training = training_samples/N
# Generate the cross-validation partitions:
percent =(1 - percent_training) #### Hold the testset (distinct from random training set)
training_sample_size = percent_training*N
trainset, testset = train_test_split(dataset, test_size=percent)
trainset = trainset.reset_index(drop=True)
testset = testset.reset_index(drop=True)
Predictors = trainset.shape[1]-1
#Re-shuffling zero variance cases (incompatible with COR model)
cov_mat = trainset[labels_cues].corr()
# NA cases = zero variance cases, get resampled now until one is found without any zero variance cases
max_while = 1000000
mm = 0
while cov_mat.isna().any(axis = None) and mm&amp;lt;=max_while:
trainset, testset = train_test_split(dataset, test_size=percent)
trainset = trainset.reset_index(drop=True)
testset = testset.reset_index(drop=True)
cov_mat = trainset[labels_cues].corr()
mm += 1
if mm &amp;gt; max_while:
raise NameError('Reshuffling zero variance cases took too long')
return trainset, testset, Predictors, labels_cues, ALL_DATA[idx_data], cue_validities, cue_order, pd.read_table(&amp;quot;data/7_CB/Parpart 2018/Data/&amp;quot; + ALL_DATA[idx_data] + &amp;quot;.txt&amp;quot;)
#PyMC sampling
idx_data = 13 #There are 20 data sets i.e. between 0 and 19
#Problematic sets (e.g in sampling; or in partitions that take too long): 1, 6, 8,10,14,19 ... predictions for 7 are off
training_samples = 50 #for some datasets more than 20 won't work
trainset, testset, Predictors, labels_cues, dataused, cue_validities, cue_order, dataset_original = data_setup_parpart(idx_data, training_samples)
cue_order2 = np.argsort(-cue_validities)
x = trainset.loc[:,labels_cues].iloc[:,cue_order2] # assumption that cue directionalities are known in advance (Dawes, 1979)
y = trainset['dependent']
criterion = dataset_original.columns[2] #theme of the question e.g. higher price house for data set &amp;quot;house.world&amp;quot;
y[y&amp;lt;0] = 0 #changes dummy coding (option B:0; option A:1)
dirweights = sign(cue_validities[cue_order2]) # assumption that cue directionalities are known in advance (Dawes, 1979)
col_pos = (dirweights &amp;gt; 0)*1
mixed_cues = any(col_pos==0) and any(col_pos==1)
with pm.Model() as Half_Ridge2:
#Priors
#eta
eta_Ridge = pm.Uniform('eta', lower = 0.001, upper = 10)
#Weights
posNormal = pm.Bound(pm.Normal, lower=0.0)
negNormal = pm.Bound(pm.Normal, upper=0.0)
if mixed_cues:#some cues are positively and other negatively related to the criterion
weight_pos = posNormal('weights_pos', mu=0, sigma=eta_Ridge, shape = (col_pos==1).sum())
weight_neg = negNormal('weights_neg', mu=0, sigma=eta_Ridge, shape = (col_pos==0).sum())
weights = pm.Deterministic('weights', tt.concatenate([weight_pos, weight_neg]))
elif any(col_pos==1): #all cues are positively related to the criterion
weight_pos = posNormal('weights_pos', mu=0, sigma=eta_Ridge, shape = (col_pos==1).sum())
weights = pm.Deterministic('weights', weight_pos)
elif any(col_pos==0): #all cues are negatively related to the criterion
weight_neg = negNormal('weights_neg', mu=0, sigma=eta_Ridge, shape = (col_pos==0).sum())
weights = pm.Deterministic('weights', weight_pos)
print(weights.tag.test_value.shape)
#Likelihood
mu = weights*x #rows stimulus, columns: cues, cells: cue*weight
print(mu.tag.test_value.shape)
theta = pm.Deterministic('theta', pm.math.sigmoid(tt.sum(mu, axis=1)))
print(theta.tag.test_value.shape)
y_1 = pm.Bernoulli('y_1', p=theta, observed=y)
#Sampling
trace = pm.sample(1000, init = 'adapt_diag', tune=1500, target_accept = 0.95)
#ppc = pm.sample_posterior_predictive(trace, samples=5000)
data = az.from_pymc3(trace=trace)
az.plot_density(data, var_names=['eta']);
&lt;/code>&lt;/pre>
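The cue-validity computation in the code above (v = R/(R+W), then shifted by 0.5 so that chance-level performance maps to 0) can be sketched in isolation. A minimal version, where the example inputs are hypothetical paired-comparison codes (+1/-1/0), not data from Parpart et al.:

```python
import numpy as np

def cue_validity(cue_diffs, dependent):
    # Validity v = R/(R+W): share of discriminating comparisons (cue value +1 or -1)
    # where the cue's sign matches the correct answer, shifted by 0.5 as in the
    # code above so that a chance-level cue maps to 0
    cue_diffs = np.asarray(cue_diffs)
    dependent = np.asarray(dependent)
    right = np.sum(cue_diffs == dependent)
    discriminating = np.sum(cue_diffs == 1) + np.sum(cue_diffs == -1)
    return right / discriminating - 0.5

# A cue that always points to the correct option gets 0.5 on this shifted scale
print(cue_validity([1, -1, 1, 0], [1, -1, -1, 1]))
```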
&lt;h2 id="referencias">Referencias:&lt;/h2>
&lt;p>Parpart, P., Jones, M., &amp;amp; Love, B. C. (2018). Heuristics as Bayesian inference under extreme priors. &lt;em>Cognitive Psychology&lt;/em>, &lt;em>102&lt;/em>, 127-144.&lt;/p></description></item><item><title>Cumulative Prospect Theory (PyMC3)</title><link>https://santiago-alonso-diaz.netlify.app/model/prospect_theory/</link><pubDate>Sat, 03 Oct 2020 08:51:25 -0500</pubDate><guid>https://santiago-alonso-diaz.netlify.app/model/prospect_theory/</guid><description>&lt;p>(see English translation below)&lt;/p>
&lt;p>Esta entrada tiene al final el código en PyMC3 para la estimación Bayesiana de los parámetros de teoría de prospectos (adaptado de la versión winbugs de &lt;a href="https://www.researchgate.net/publication/222825338_Hierarchical_Bayesian_parameter_estimation_for_cumulative_prospect_theory" target="_blank" rel="noopener">Nilsson, et al, 2011&lt;/a>).&lt;/p>
&lt;p>Una de las teorías más conocidas en economía del comportamiento es prospect theory. Es una teoría tradicional de valor esperado (V) donde el agente combina valor y probabilidad (V = vp) para decidir. El valor (v) y la probabilidad (p) que se observan en la realidad se transforman/perciben de forma no lineal y se juzgan a partir de una referencia. La imagen de abajo muestra las dos funciones propuestas en teoría de prospectos (líneas punteadas).&lt;/p>
&lt;center>&lt;img src="PT.png" width = "420" height = '250'>&lt;/center>
&lt;p>Las fórmulas de estas gráficas son:&lt;/p>
&lt;p>Valor:
$$
v(x)=
\begin{cases}
x^\alpha &amp;amp; x\ge0 \\
-\lambda(-x)^\beta &amp;amp; x&amp;lt;0
\end{cases}
$$&lt;/p>
&lt;p>$$
\alpha, \; \beta \; \text{miden actitudes de riesgo}
$$&lt;/p>
&lt;p>Probabilidades:
$$
w(p_x) = \frac{p_x^c}{(p_x^c + (1-p_x)^c)^{1/c}}
$$&lt;/p>
&lt;p>$$
c = \gamma \text{, si ganancia, } c = \delta \text{, si pérdida.}
$$&lt;/p>
&lt;p>El agente decide en función del valor esperado V.
$$
V(x) = v(x)w(p_x)
$$
Por ejemplo, Nilsson, et al (2011) proponen una decisión estocástica entre el valor esperado de dos opciones A y B.
$$
p(A) = \frac{1}{1+e^{\phi(V(B)-V(A))}}
$$&lt;/p>
&lt;p>$$
\phi \; \text{importancia del valor para escoger A (o B)}
$$&lt;/p>
&lt;p>En suma, son 6 parámetros a estimar
$$
\alpha, \beta \text{ actitudes al riesgo}
$$&lt;/p>
&lt;p>$$
\lambda \text{ aversión a las pérdidas}
$$&lt;/p>
&lt;p>$$
\gamma, \delta \text{ percepción de probabilidad}
$$&lt;/p>
&lt;p>$$
\phi \text{ temperatura}
$$&lt;/p>
&lt;p>La versión de prospect theory propuesta por Nilsson, et al (2011) es jerárquica (ver diagrama abajo). Leer su paper para mayores detalles. En general, al hacer una versión jerárquica se aprovecha la información de todos los sujetos para obtener mayor precisión en la estimación de los parámetros por sujeto.&lt;/p>
&lt;center>&lt;img src="model_CPT.svg" width = "600" height = '600'>&lt;/center>
&lt;h1 id="english-by-google-translate-with-some-edits">English (by Google Translate with some edits)&lt;/h1>
&lt;p>This post has at the end the PyMC3 code for the Bayesian estimation of prospect theory parameters (adapted from the winbugs version of &lt;a href="https://www.researchgate.net/publication/222825338_Hierarchical_Bayesian_parameter_estimation_for_cumulative_prospect_theory" target="_blank" rel="noopener">Nilsson, et al, 2011&lt;/a>).&lt;/p>
&lt;p>One of the best-known theories in behavioral economics is prospect theory. It is a traditional expected-value (V) theory in which the agent combines value and probability (V = vp) to decide. The observed value (v) and probability (p) are transformed/perceived non-linearly and judged relative to a reference point. The image below shows the two functions proposed by prospect theory (dotted lines).&lt;/p>
&lt;center>&lt;img src="PT.png" width = "420" height = '250'>&lt;/center>
&lt;p>The formulas are:&lt;/p>
&lt;p>Value:
$$
v(x)=
\begin{cases}
x^\alpha &amp;amp; x\ge0 \\
-\lambda(-x)^\beta &amp;amp; x&amp;lt;0
\end{cases}
$$&lt;/p>
&lt;p>$$
\alpha, \; \beta \; \text{risk attitudes}
$$&lt;/p>
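A minimal numpy sketch of this value function. The parameter values (0.88, 2.25) are illustrative defaults from the prospect-theory literature, not estimates from this post:

```python
import numpy as np

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Piecewise power value function: x**alpha for gains,
    # -lam * (-x)**beta for losses (lam is the loss-aversion weight);
    # np.signbit selects the negative entries
    x = np.asarray(x, dtype=float)
    gains = np.abs(x) ** alpha
    losses = -lam * np.abs(x) ** beta
    return np.where(np.signbit(x), losses, gains)

# A loss looms larger than an equally sized gain
print(value([100.0, -100.0]))
```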
&lt;p>Probabilities:
$$
w(p_x) = \frac{p_x^c}{(p_x^c + (1-p_x)^c)^{1/c}}
$$&lt;/p>
&lt;p>$$
c = \gamma \text{, if gain, } c = \delta \text{, if loss.}
$$&lt;/p>
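A standalone numpy sketch of the probability weighting as it is implemented in the PyMC3 code below (the denominator raises the sum of the two transformed outcome probabilities to 1/c); c = 0.61 is an illustrative value, not an estimate from this post:

```python
import numpy as np

def weight(p, c):
    # Probability weighting with curvature c (gamma for gains, delta for losses);
    # the denominator matches the normalizer used in the PyMC3 code below
    p = np.asarray(p, dtype=float)
    return p ** c / (p ** c + (1.0 - p) ** c) ** (1.0 / c)

# Inverse-S shape: small probabilities are overweighted, large ones underweighted
print(weight(np.array([0.1, 0.5, 0.9]), c=0.61))
```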
&lt;p>The agent decides based on expected value (V).
$$
V(x) = v(x)w(p_x)
$$
For instance, Nilsson, et al (2011) used a stochastic choice rule over the expected values of two options A and B
$$
p(A) = \frac{1}{1+e^{\phi(V(B)-V(A))}}
$$&lt;/p>
&lt;p>$$
\phi \; \text{relevance of V to pick A (or B)}
$$&lt;/p>
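The choice rule can be sketched directly from the formula above; phi = 1 and the example values are arbitrary:

```python
import numpy as np

def p_choose_A(V_A, V_B, phi=1.0):
    # Logistic choice rule: p(A) = 1 / (1 + exp(phi * (V(B) - V(A))));
    # phi scales how strongly the value difference drives the choice
    return 1.0 / (1.0 + np.exp(phi * (V_B - V_A)))

print(p_choose_A(2.0, 2.0))           # equal values: indifference
print(p_choose_A(3.0, 1.0, phi=2.0))  # higher V_A pushes p(A) toward 1
```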
&lt;p>In brief, there are 6 parameters
$$
\alpha, \beta \text{ risk attitudes}
$$&lt;/p>
&lt;p>$$
\lambda \text{ loss aversion}
$$&lt;/p>
&lt;p>$$
\gamma, \delta \text{ probability perception}
$$&lt;/p>
&lt;p>$$
\phi \text{ temperature}
$$&lt;/p>
&lt;p>Nilsson, et al (2011) propose a hierarchical version of prospect theory (diagram below). Details in their paper. In general, when making a hierarchical version, the information from all the subjects is used to obtain greater precision in the estimation of the parameters per subject.&lt;/p>
&lt;center>&lt;img src="model_CPT.svg" width = "600" height = '600'>&lt;/center>
&lt;h1 id="python">Python&lt;/h1>
&lt;p>El material de Nilsson et al (2011), los datos necesarios, y la implementación en PyMC3 también se pueden encontrar en este &lt;a href="https://github.com/santiagoalonso/Cognitive-models/tree/master/Prospect%20Theory" target="_blank" rel="noopener">link&lt;/a>.&lt;/p>
&lt;p>The material from Nilsson et al (2011), the necessary data, and the implementation in PyMC3 can also be found in this &lt;a href="https://github.com/santiagoalonso/Cognitive-models/tree/master/Prospect%20Theory" target="_blank" rel="noopener">link&lt;/a>.&lt;/p>
&lt;pre>&lt;code class="language-python">#Libraries and functions
import pymc3 as pm
import theano.tensor as tt #NOTE: theano will change to tensorflow in PyMC4
import numpy as np
import pandas as pd
def norm_cdf(x, mean=0, std=1):
return (1.0 + tt.erf((x-mean) / tt.sqrt(2.0*(std**2)))) / 2.0 #cdf; (x is a normal sample)
# Load data
gambles_A = pd.read_table(&amp;quot;GambleA.txt&amp;quot;, header=None)
gambles_A.columns = ['Reward_1', 'Prob_1', 'Reward_2', 'Prob_2']
gambles_A_win = gambles_A.loc[0:59,:].copy()
gambles_A_loss = gambles_A.loc[60:119,:].copy()
gambles_A_mix = gambles_A.loc[120:179,:].copy()
gambles_B = pd.read_table(&amp;quot;GambleB.txt&amp;quot;, header=None)
gambles_B.columns = ['Reward_1', 'Prob_1', 'Reward_2', 'Prob_2']
gambles_B_win = gambles_B.loc[0:59,:].copy()
gambles_B_loss = gambles_B.loc[60:119,:].copy()
gambles_B_mix = gambles_B.loc[120:179,:].copy()
Rieskamp_data = pd.read_table('Rieskamp_data.txt', header=None)
# 0: choice gamble A
# 1: choice gamble B
Rieskamp_data_win = Rieskamp_data.loc[0:59,:].copy()
Rieskamp_data_loss = Rieskamp_data.loc[60:119,:].copy()
Rieskamp_data_mix = Rieskamp_data.loc[120:179,:].copy()
ntrials = Rieskamp_data.shape[0]
ntrials_by_type = int(ntrials/3)
nsubj = Rieskamp_data.shape[1]
#PyMC3 model
with pm.Model() as CPT:
# Here priors for the hyperdistributions are defined:
### alpha (risk attitude win)
mu_alpha_N = pm.Normal('mu_alpha_N', 0, 1)
sigma_alpha_N = pm.Uniform('sigma_alpha_N', 0, 10)
### beta (risk attitude lose)
mu_beta_N = pm.Normal('mu_beta_N', 0, 1)
sigma_beta_N = pm.Uniform('sigma_beta_N', 0, 10)
### gamma (non-linearity in prob. win)
mu_gamma_N = pm.Normal('mu_gamma_N', 0, 1)
sigma_gamma_N = pm.Uniform('sigma_gamma_N', 0, 10)
### delta (non-linearity in prob. lose)
mu_delta_N = pm.Normal('mu_delta_N', 0, 1)
sigma_delta_N = pm.Uniform('sigma_delta_N', 0, 10)
### lambda (loss aversion)
mu_l_lambda_N = pm.Uniform('mu_l_lambda_N', -2.3, 1.61)
sigma_l_lambda_N = pm.Uniform('sigma_l_lambda_N', 0, 1.13)
### luce (temperature of softmax)
mu_l_luce_N = pm.Uniform('mu_l_luce_N', -2.3, 1.61)
sigma_l_luce_N = pm.Uniform('sigma_l_luce_N', 0, 1.13)
## We put group-level normal's on the individual parameters.
## This models alpha, beta, gamma, and delta as probitized parameters.
## That is, it models parameters on the probit scale and then
## puts them back to the range 0-1 with the CDF.
## Lambda and luce are positive and modeled in log scale.
## Each participant has unique parameter-values:
## alpha, beta, gamma, delta, lambda, and luce
alpha_N = pm.TruncatedNormal('alpha_N', mu_alpha_N, sigma_alpha_N,
lower = -3, upper = 3,
shape = nsubj)
beta_N = pm.TruncatedNormal('beta_N', mu_beta_N, sigma_beta_N,
lower = -3, upper = 3,
shape = nsubj)
gamma_N = pm.TruncatedNormal('gamma_N', mu_gamma_N, sigma_gamma_N,
lower = -3, upper = 3,
shape = nsubj)
delta_N = pm.TruncatedNormal('delta_N', mu_delta_N, sigma_delta_N,
lower = -3, upper = 3,
shape = nsubj)
lambda_N = pm.Normal('lambda_N', mu_l_lambda_N, sigma_l_lambda_N,
shape = nsubj)
luce_N = pm.Normal('luce_N', mu_l_luce_N, sigma_l_luce_N,
shape = nsubj)
### Put everything in the desired scale
## We use cdf to bound probitized parameters to be in 0-1
alpha = pm.Deterministic('alpha', norm_cdf(alpha_N))
beta = pm.Deterministic('beta', norm_cdf(beta_N))
gamma = pm.Deterministic('gamma', norm_cdf(gamma_N))
delta = pm.Deterministic('delta', norm_cdf(delta_N))
## We exp because we assume a log. scale
lambd = pm.Deterministic('lambbda', tt.exp(lambda_N))
luce = pm.Deterministic('luce', tt.exp(luce_N))
# It is now time to define how the model should be fit to data.
############ WIN TRIALS ############
gambless_A = gambles_A_win
gambless_B = gambles_B_win
##GAMBLE A
## subjective value of outcomes x &amp;amp; y in gamble A
reward_1 = np.tile(np.array(gambless_A['Reward_1']),(nsubj,1)).transpose()
reward_2 = np.tile(np.array(gambless_A['Reward_2']),(nsubj,1)).transpose()
v_x_a = pm.Deterministic('v_x_a', reward_1**tt.tile(alpha,(ntrials_by_type,1)))
v_y_a = pm.Deterministic('v_y_a', reward_2**tt.tile(alpha,(ntrials_by_type,1)))
## subjective prob. of outcomes x &amp;amp; y in gamble A
prob_1 = np.tile(np.array(gambless_A['Prob_1']),(nsubj,1)).transpose()
prob_2 = np.tile(np.array(gambless_A['Prob_2']),(nsubj,1)).transpose()
z_a = pm.Deterministic('z_a', prob_1**tt.tile(gamma,(ntrials_by_type,1)) + prob_2**tt.tile(gamma,(ntrials_by_type,1)))
den_a = pm.Deterministic('den_a', z_a**(1/tt.tile(gamma,(ntrials_by_type,1))))
num_x_a = pm.Deterministic('num_x_a', prob_1**tt.tile(gamma,(ntrials_by_type,1)))
w_x_a = pm.Deterministic('w_x_a', num_x_a / den_a)
num_y_a = pm.Deterministic('num_y_a', prob_2**tt.tile(gamma,(ntrials_by_type,1)))
w_y_a = pm.Deterministic('w_y_a', num_y_a / den_a)
##subjective value of gamble A
Vf_a = pm.Deterministic('Vf_a', w_x_a * v_x_a + w_y_a * v_y_a)
#GAMBLE B
## subjective value of outcomes x &amp;amp; y in gamble B
reward_1 = np.tile(np.array(gambless_B['Reward_1']),(nsubj,1)).transpose()
reward_2 = np.tile(np.array(gambless_B['Reward_2']),(nsubj,1)).transpose()
v_x_b = pm.Deterministic('v_x_b', reward_1**tt.tile(alpha,(ntrials_by_type,1)))
v_y_b = pm.Deterministic('v_y_b', reward_2**tt.tile(alpha,(ntrials_by_type,1)))
## subjective prob. of outcomes x &amp;amp; y in gamble B
prob_1 = np.tile(np.array(gambless_B['Prob_1']),(nsubj,1)).transpose()
prob_2 = np.tile(np.array(gambless_B['Prob_2']),(nsubj,1)).transpose()
z_b = pm.Deterministic('z_b', prob_1**tt.tile(gamma,(ntrials_by_type,1)) + prob_2**tt.tile(gamma,(ntrials_by_type,1)))
den_b = pm.Deterministic('den_b', z_b**(1/tt.tile(gamma,(ntrials_by_type,1))))
num_x_b = pm.Deterministic('num_x_b', prob_1**tt.tile(gamma,(ntrials_by_type,1)))
w_x_b = pm.Deterministic('w_x_b', num_x_b / den_b)
num_y_b = pm.Deterministic('num_y_b', prob_2**tt.tile(gamma,(ntrials_by_type,1)))
w_y_b = pm.Deterministic('w_y_b', num_y_b / den_b)
##subjective value of gamble B
Vf_b = pm.Deterministic('Vf_b', w_x_b * v_x_b + w_y_b * v_y_b)
## Difference in value
#print(den)
dv = pm.Deterministic('D', (Vf_a - Vf_b))
##likelihood
## choice for gamble-pair is a Bernoulli-distribution
## with p = binval
## binval is luce's choice rule (akin to a softmax)
binval = pm.Deterministic('binval', 1/(1+tt.exp((tt.tile(luce,(ntrials_by_type,1))) * (dv)))) #prob. of B
datta = np.array(Rieskamp_data_win)
win_obs = pm.Bernoulli('win_obs', p = binval, observed = datta)
############ LOSS TRIALS ############
gambless_A = gambles_A_loss
gambless_B = gambles_B_loss
##GAMBLE A
## subjective value of outcomes x &amp;amp; y in gamble A
reward_1 = np.tile(np.array(gambless_A['Reward_1']),(nsubj,1)).transpose()
reward_2 = np.tile(np.array(gambless_A['Reward_2']),(nsubj,1)).transpose()
v_x_a_l = pm.Deterministic('v_x_a_l', (-1)*((-reward_1)**tt.tile(beta,(ntrials_by_type,1))))
v_y_a_l = pm.Deterministic('v_y_a_l', (-1)*((-reward_2)**tt.tile(beta,(ntrials_by_type,1))))
## subjective prob. of outcomes x &amp;amp; y in gamble A
prob_1 = np.tile(np.array(gambless_A['Prob_1']),(nsubj,1)).transpose()
prob_2 = np.tile(np.array(gambless_A['Prob_2']),(nsubj,1)).transpose()
z_a_l = pm.Deterministic('z_a_l', prob_1**tt.tile(delta,(ntrials_by_type,1)) + prob_2**tt.tile(delta,(ntrials_by_type,1)))
den_a_l = pm.Deterministic('den_a_l', z_a_l**(1/tt.tile(delta,(ntrials_by_type,1))))
num_x_a_l = pm.Deterministic('num_x_a_l', prob_1**tt.tile(delta,(ntrials_by_type,1)))
w_x_a_l = pm.Deterministic('w_x_a_l', num_x_a_l / den_a_l)
num_y_a_l = pm.Deterministic('num_y_a_l', prob_2**tt.tile(delta,(ntrials_by_type,1)))
w_y_a_l = pm.Deterministic('w_y_a_l', num_y_a_l / den_a_l)
##subjective value of gamble A
Vf_a_l = pm.Deterministic('Vf_a_l', w_x_a_l * v_x_a_l + w_y_a_l * v_y_a_l)
#GAMBLE B
## subjective value of outcomes x &amp;amp; y in gamble B
reward_1 = np.tile(np.array(gambless_B['Reward_1']),(nsubj,1)).transpose()
reward_2 = np.tile(np.array(gambless_B['Reward_2']),(nsubj,1)).transpose()
v_x_b_l = pm.Deterministic('v_x_b_l', (-1)*((-reward_1)**tt.tile(beta,(ntrials_by_type,1))))
v_y_b_l = pm.Deterministic('v_y_b_l', (-1)*((-reward_2)**tt.tile(beta,(ntrials_by_type,1))))
## subjective prob. of outcomes x &amp;amp; y in gamble B
prob_1 = np.tile(np.array(gambless_B['Prob_1']),(nsubj,1)).transpose()
prob_2 = np.tile(np.array(gambless_B['Prob_2']),(nsubj,1)).transpose()
z_b_l = pm.Deterministic('z_b_l', prob_1**tt.tile(delta,(ntrials_by_type,1)) + prob_2**tt.tile(delta,(ntrials_by_type,1)))
den_b_l = pm.Deterministic('den_b_l', z_b_l**(1/tt.tile(delta, (ntrials_by_type,1))))
num_x_b_l = pm.Deterministic('num_x_b_l', prob_1**tt.tile(delta, (ntrials_by_type,1)))
w_x_b_l = pm.Deterministic('w_x_b_l', num_x_b_l / den_b_l)
num_y_b_l = pm.Deterministic('num_y_b_l', prob_2**tt.tile(delta, (ntrials_by_type,1)))
w_y_b_l = pm.Deterministic('w_y_b_l', num_y_b_l / den_b_l)
##subjective value of gamble B
Vf_b_l = pm.Deterministic('Vf_b_l', w_x_b_l * v_x_b_l + w_y_b_l * v_y_b_l)
## Difference in value
#print(den)
dv_l = pm.Deterministic('D_l', (Vf_a_l - Vf_b_l))
##likelihood
## choice for gamble-pair is a Bernoulli-distribution
## with p = binval
## binval is luce's choice rule (akin to a softmax)
binval_l = pm.Deterministic('binval_l', 1/(1+tt.exp((tt.tile(luce,(ntrials_by_type,1))) * (dv_l)))) #prob. of B
datta = np.array(Rieskamp_data_loss)
loss_obs = pm.Bernoulli('loss_obs', p = binval_l, observed = datta)
############ MIX TRIALS ############
gambless_A = gambles_A_mix
gambless_B = gambles_B_mix
##GAMBLE A
## subjective value of outcomes x &amp;amp; y in gamble A
reward_1 = np.tile(np.array(gambless_A['Reward_1']),(nsubj,1)).transpose()
reward_2 = np.tile(np.array(gambless_A['Reward_2']),(nsubj,1)).transpose()
v_x_a_m = pm.Deterministic('v_x_a_m', reward_1**tt.tile(alpha,(ntrials_by_type,1)))
v_y_a_m = pm.Deterministic('v_y_a_m', (-1*tt.tile(lambd,(ntrials_by_type,1)))*((-reward_2)**tt.tile(beta,(ntrials_by_type,1))))
## subjective prob. of outcomes x &amp;amp; y in gamble A
prob_1 = np.tile(np.array(gambless_A['Prob_1']),(nsubj,1)).transpose()
prob_2 = np.tile(np.array(gambless_A['Prob_2']),(nsubj,1)).transpose()
z_a_m = pm.Deterministic('z_a_m', prob_1**tt.tile(gamma,(ntrials_by_type,1)) + prob_2**tt.tile(delta,(ntrials_by_type,1)))
den_a1_m = pm.Deterministic('den_a1_m', z_a_m**(1/tt.tile(gamma,(ntrials_by_type,1))))
den_a2_m = pm.Deterministic('den_a2_m', z_a_m**(1/tt.tile(delta,(ntrials_by_type,1))))
num_x_a_m = pm.Deterministic('num_x_a_m', prob_1**tt.tile(gamma,(ntrials_by_type,1)))
w_x_a_m = pm.Deterministic('w_x_a_m', num_x_a_m / den_a1_m)
num_y_a_m = pm.Deterministic('num_y_a_m', prob_2**tt.tile(delta,(ntrials_by_type,1)))
w_y_a_m = pm.Deterministic('w_y_a_m', num_y_a_m / den_a2_m)
##subjective value of gamble A
Vf_a_m = pm.Deterministic('Vf_a_m', w_x_a_m * v_x_a_m + w_y_a_m * v_y_a_m)
##GAMBLE B
## subjective value of outcomes x &amp;amp; y in gamble B
reward_1 = np.tile(np.array(gambless_B['Reward_1']),(nsubj,1)).transpose()
reward_2 = np.tile(np.array(gambless_B['Reward_2']),(nsubj,1)).transpose()
v_x_b_m = pm.Deterministic('v_x_b_m', reward_1**tt.tile(alpha,(ntrials_by_type,1)))
v_y_b_m = pm.Deterministic('v_y_b_m', (-1*tt.tile(lambd,(ntrials_by_type,1)))*((-reward_2)**tt.tile(beta,(ntrials_by_type,1))))
## subjective prob. of outcomes x &amp;amp; y in gamble B
prob_1 = np.tile(np.array(gambless_B['Prob_1']),(nsubj,1)).transpose()
prob_2 = np.tile(np.array(gambless_B['Prob_2']),(nsubj,1)).transpose()
z_b_m = pm.Deterministic('z_b_m', prob_1**tt.tile(gamma,(ntrials_by_type,1)) + prob_2**tt.tile(delta,(ntrials_by_type,1)))
den_b1_m = pm.Deterministic('den_b1_m', z_b_m**(1/tt.tile(gamma,(ntrials_by_type,1))))
den_b2_m = pm.Deterministic('den_b2_m', z_b_m**(1/tt.tile(delta,(ntrials_by_type,1))))
num_x_b_m = pm.Deterministic('num_x_b_m', prob_1**tt.tile(gamma,(ntrials_by_type,1)))
w_x_b_m = pm.Deterministic('w_x_b_m', num_x_b_m / den_b1_m)
num_y_b_m = pm.Deterministic('num_y_b_m', prob_2**tt.tile(delta,(ntrials_by_type,1)))
w_y_b_m = pm.Deterministic('w_y_b_m', num_y_b_m / den_b2_m)
##subjective value of gamble B
Vf_b_m = pm.Deterministic('Vf_b_m', w_x_b_m * v_x_b_m + w_y_b_m * v_y_b_m)
## Difference in value
#print(den)
dv_m = pm.Deterministic('D_m', (Vf_a_m - Vf_b_m))
##likelihood
## choice for gamble-pair is a Bernoulli-distribution
## with p = binval
## binval is luce's choice rule (akin to a softmax)
binval_m = pm.Deterministic('binval_m', 1/(1+tt.exp((tt.tile(luce,(ntrials_by_type,1))) * (dv_m)))) #prob. of B
datta = np.array(Rieskamp_data_mix)
mix_obs = pm.Bernoulli('mix_obs', p = binval_m, observed = datta)
############## Sampling ##############
trace = pm.sample(1000, tune = 1500, init='adapt_diag', target_accept = 0.95)
rhat = pm.rhat(trace, var_names = ['alpha', 'beta', 'gamma', 'delta', 'lambbda', 'luce'])
&lt;/code>&lt;/pre>
&lt;h2 id="referencias">Referencias:&lt;/h2>
&lt;p>Nilsson, H., Rieskamp, J., &amp;amp; Wagenmakers, E. J. (2011). Hierarchical Bayesian parameter estimation for cumulative prospect theory. &lt;em>Journal of Mathematical Psychology&lt;/em>, &lt;em>55&lt;/em>(1), 84-93&lt;/p>
&lt;h1 id="heading">&lt;/h1></description></item><item><title>Model Name</title><link>https://santiago-alonso-diaz.netlify.app/model/model-1/</link><pubDate>Sat, 03 Oct 2020 08:51:25 -0500</pubDate><guid>https://santiago-alonso-diaz.netlify.app/model/model-1/</guid><description>&lt;h1 id="diagram-1">Diagram 1&lt;/h1>
&lt;pre>&lt;code class="language-mermaid">graph TD
A --&amp;gt; B
&lt;/code>&lt;/pre>
&lt;h1 id="diagram-2">Diagram 2&lt;/h1>
&lt;pre>&lt;code class="language-mermaid">flowchart TB
c1--&amp;gt;a2
subgraph one
a1--&amp;gt;a2
end
subgraph two
b1--&amp;gt;b2
end
subgraph three
c1--&amp;gt;c2
end
one --&amp;gt; two
three --&amp;gt; two
two --&amp;gt; c2
&lt;/code>&lt;/pre>
&lt;h1 id="diagram-3">Diagram 3&lt;/h1>
&lt;pre>&lt;code class="language-mermaid"> graph TB
A &amp;amp; B--&amp;gt; C &amp;amp; D
&lt;/code>&lt;/pre>
&lt;h1 id="diagram-4">Diagram 4&lt;/h1>
&lt;pre>&lt;code class="language-mermaid"> graph LR
a --&amp;gt; b &amp;amp; c--&amp;gt; d
&lt;/code>&lt;/pre>
&lt;h1 id="charts">Charts&lt;/h1>
&lt;pre>&lt;code class="language-mermaid"> pie title Choices
&amp;quot;Risky&amp;quot;: 50
&amp;quot;Safe&amp;quot;: 40
&amp;quot;Indifferent&amp;quot;: 10
&lt;/code>&lt;/pre>
&lt;h1 id="python-code">Python Code&lt;/h1>
&lt;pre>&lt;code class="language-python">for i in range(8):
print(i)
&lt;/code>&lt;/pre></description></item></channel></rss>