4/11/2018
$ y_i = \begin{cases} 0 \quad &\textrm{with probability $\varphi_i$} \\ g(y_i) \quad &\textrm{with probability 1 - $\varphi_i$} \end{cases} $
$ P\left(Y_i = y_i \textrm{ } \big| \textrm{ } x_i, z_i \right) = \begin{cases} \varphi_i + (1 - \varphi_i) \cdot \exp(-\mu_i) \quad &\textrm{if $y_i = 0$} \\ (1 - \varphi_i) \cdot \exp(-\mu_i) \cdot \mu_i^{y_i} \big/ y_i! \quad &\textrm{if $y_i > 0$} \end{cases} $
$$ \mathcal{L} = \sum_{i=1}^n w_i \log \left[ P\left( y_i \textrm{ } \big| \textrm{ } x_i, z_i \right) \right] $$
from zeroinfl import *
import numpy as np
import pandas as pd
from pandas.core import datetools
df = pd.read_csv('DebTrivedi.csv',index_col = [0])
sel = np.array([1, 6, 7, 8, 13, 15, 18])-1
df = df.iloc[:,sel]
formula_str = 'ofp ~ hosp + health + numchron + gender + school + privins | health'
import matplotlib.pyplot as plt
data_ofp = df.iloc[:,0].values.astype(int)
plt.hist(data_ofp, bins=np.arange(data_ofp.min(), data_ofp.max()), align = 'left', ec='black')
plt.xlabel('Number of physician office visits');plt.ylabel('Frequency')
plt.show()
ZeroInflated(formula_str, df).fit().printModel()
Call: ZeroInflated(formula_str='ofp ~ hosp + health + numchron + gender + school + privins | health', data=df, dist='poisson', offsetx=None, offsetz=None, link='logit', weights=None, missing='none') Count model cefficients (poisson log link): Intercept health[T.excellent] health[T.poor] gender[T.male] privins[T.yes] \ 1.39367 -0.30775 0.25420 -0.06486 0.08544 hosp numchron school 0.15913 0.10328 0.01959 Zero-inflation model coefficients (binomial with logit link): Intercept health[T.excellent] health[T.poor] -1.73364 0.47491 -0.39916
> model <- pscl::zeroinfl(ofp ~ health+gender+privins+hosp+numchron+school| health, data = dt, dist = 'poisson') > print(model) Call: pscl::zeroinfl(formula = ofp ~ health + gender + privins + hosp + numchron + school | health, data = dt, dist = "poisson") Count model coefficients (poisson with log link): (Intercept) healthpoor healthexcellent gendermale privinsyes hosp 1.39367 0.25420 -0.30775 -0.06486 0.08544 0.15913 numchron school 0.10328 0.01959 Zero-inflation model coefficients (binomial with logit link): (Intercept) healthpoor healthexcellent -1.7336 -0.3992 0.4749
ZeroInflated(formula_str, df).fit().summary()
Call: ZeroInflated(formula_str='ofp ~ hosp + health + numchron + gender + school + privins | health', data=df, dist='poisson', offsetx=None, offsetz=None, link='logit', weights=None, missing='none') Pearson residuals: Min 1Q Median 3Q Max -28.81532 -3.64394 -1.42507 1.97142 72.65782 Count model cefficients (poisson log link): | Estimate|Std. Error| z value| Pr(>|z|) ---------------------------------------------------------------- Intercept| 1.3937| 0.0245| 56.86| <2e-16 health[T.excellent]| -0.3078| 0.0314| -9.791| <2e-16 health[T.poor]| 0.2542| 0.0177| 14.339| <2e-16 gender[T.male]| -0.0649| 0.0131| -4.946| 7.5608e-7 privins[T.yes]| 0.0854| 0.0173| 4.936| 7.9923e-7 hosp| 0.1591| 0.0061| 26.262| <2e-16 numchron| 0.1033| 0.0047| 21.805| <2e-16 school| 0.0196| 0.0019| 10.393| <2e-16 Zero-inflation model coefficients (binomial with logit link): | Estimate|Std. Error| z value| Pr(>|z|) ---------------------------------------------------------------- Intercept| -1.7336| 0.0481| -36.04| <2e-16 health[T.excellent]| 0.4749| 0.1455| 3.265| 1.0960e-3 health[T.poor]| -0.3992| 0.1466| -2.724| 6.4570e-3 --- Number of iterations in BFGS optimization: 13 Log-likelihood: -1.629e+4 on 11 Df.
> summary(model) Call: pscl::zeroinfl(formula = ofp ~ health + gender + privins + hosp + numchron + school | health, data = dt, dist = "poisson") Pearson residuals: Min 1Q Median 3Q Max -2.5417 -1.2132 -0.4612 0.5889 25.1526 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 1.393672 0.024506 56.870 < 2e-16 *** healthpoor 0.254201 0.017726 14.340 < 2e-16 *** healthexcellent -0.307754 0.031428 -9.792 < 2e-16 *** gendermale -0.064859 0.013111 -4.947 7.54e-07 *** privinsyes 0.085441 0.017310 4.936 7.97e-07 *** hosp 0.159127 0.006059 26.262 < 2e-16 *** numchron 0.103280 0.004736 21.806 < 2e-16 *** school 0.019591 0.001885 10.395 < 2e-16 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -1.73364 0.04809 -36.050 < 2e-16 *** healthpoor -0.39916 0.14655 -2.724 0.00646 ** healthexcellent 0.47490 0.14542 3.266 0.00109 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 17 Log-likelihood: -1.629e+04 on 11 Df
Count: | Python | R |
---|---|---|
Intercept | 1.39367 | 1.39367 |
health[T.excellent] | -0.30775 | -0.30775 |
health[T.poor] | 0.25420 | 0.25420 |
gender[T.male] | -0.06486 | -0.06486 |
privins[T.yes] | 0.08544 | 0.08544 |
hosp | 0.15913 | 0.15913 |
numchron | 0.10328 | 0.10328 |
school | 0.01959 | 0.01959 |
Zero: | Python | R |
---|---|---|
Intercept | -1.73364 | -1.73364 |
health[T.excellent] | 0.47490 | 0.47489 |
health[T.poor] | -0.39916 | -0.39916 |
Count: | Python | R |
---|---|---|
Intercept | 0.92523 | 0.92523 |
health[T.excellent] | -0.34160 | -0.34160 |
health[T.poor] | 0.31351 | 0.31352 |
gender[T.male] | -0.12678 | -0.12678 |
privins[T.yes] | 0.22483 | 0.22483 |
hosp | 0.22057 | 0.22057 |
numchron | 0.17603 | 0.17603 |
school | 0.02689 | 0.02689 |
Zero: | Python | R |
---|---|---|
Intercept | -24.34361 | -24.33868 |
health[T.excellent] | 10.00308 | 9.99858 |
health[T.poor] | 19.44489 | 19.43996 |
Count: | Python | R |
---|---|---|
Intercept | 0.94181 | 0.94206 |
health[T.excellent] | -0.32944 | -0.32945 |
health[T.poor] | 0.32891 | 0.32884 |
gender[T.male] | -0.12468 | -0.12465 |
privins[T.yes] | 0.21839 | 0.21832 |
hosp | 0.22252 | 0.22250 |
numchron | 0.17366 | 0.17363 |
school | 0.02667 | 0.02667 |
theta | 1.25876 | 1.25925 |
Zero: | Python | R |
---|---|---|
Intercept | -5.04317 | -5.02321 |
health[T.excellent] | 1.08234 | 1.06984 |
health[T.poor] | 1.61952 | 1.60087 |