PHP Script for Calculating Confidence Intervals

Posted on Tuesday, 29 November 2011

Some time ago I wrote about a Shell Script for calculating confidence intervals. Since I also do a lot of number crunching with PHP (hey, it’s web-based and works great with an SQL server for the data!) I wrote a PHP script to get the job done. Here’s how.

I stuff all these handy functions in a utility file called utils.php, which I include whenever needed.

You then use it like this:

include("utils.php");
...
$vals = array(3, 4, 5, 7, 8);
//get the stddev:
$std=stddev($vals);
//get the std error:
$stderr = $std / sqrt(count($vals)); 
//now get the CI:
$CI = $stderr*ttable((count($vals)-1),0.025); 

Note that the first argument of ttable() is the degrees of freedom (the number of repetitions, observations or samples, minus one). The second argument is the ‘tail probability’ on one side of the distribution. Because the interval is two-sided, 0.025 gives the 95% confidence interval (2.5% in the left tail and 2.5% in the right tail).
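If you think in confidence levels rather than tail probabilities, the conversion can be sketched as below. The helper name tail_probability() is hypothetical and not part of utils.php. The rounding is there because (1 - 0.95) / 2 is generally not exactly 0.025 in floating point, and the lookup in ttable() compares the value against the column labels.

```php
<?php
// Hypothetical helper (not part of utils.php): convert a two-sided
// confidence level such as 0.95 into the one-sided tail probability
// that ttable() expects as its second argument.
function tail_probability($level) {
    // Round to 4 decimals so the result matches a column label
    // such as 0.025 instead of 0.025000000000000022.
    return round((1 - $level) / 2, 4);
}

$p95 = tail_probability(0.95); // column .025
$p99 = tail_probability(0.99); // column .005
```

Only levels whose tail probability is one of the tabulated columns will work with ttable(); anything else makes it return its error value.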

In the above example, the stddev is 2.0736441353328 and the stderr is 2.0736441353328 / sqrt(5) ≈ 0.9274. With 4 degrees of freedom the t-value is 2.776, so the 95% confidence interval is 0.9274 × 2.776 ≈ 2.5744. Needless to say, the mean is 5.4. So now when you want to plot this, you plot a point at 5.4, and the typical “errorbars” (or “yerrorbars” in gnuplot lingo) at approximately 2.8256 and 7.9744. These values are mean-CI and mean+CI, and the mean is right in the middle.
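To check the arithmetic by hand, here is a minimal standalone sketch of the same calculation that does not need utils.php. The t-value 2.776 is copied manually from the table row for 4 degrees of freedom, column .025, which is an assumption standing in for the ttable() call.

```php
<?php
// Worked example for the sample 3, 4, 5, 7, 8, without utils.php.
$vals = array(3, 4, 5, 7, 8);
$n    = count($vals);
$mean = array_sum($vals) / $n;

// Sample variance: sum of squared deviations divided by N-1.
$sumsq = 0.0;
foreach ($vals as $v) {
    $sumsq += pow($v - $mean, 2);
}
$std    = sqrt($sumsq / ($n - 1));
$stderr = $std / sqrt($n);

// Assumption: t-value copied by hand from the t-table
// (4 degrees of freedom, tail probability 0.025).
$t  = 2.776;
$CI = $stderr * $t;

// Lower and upper ends of the error bar.
$lower = $mean - $CI;
$upper = $mean + $CI;

printf("mean=%.4f std=%.4f stderr=%.4f CI=%.4f bars=[%.4f, %.4f]\n",
    $mean, $std, $stderr, $CI, $lower, $upper);
```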

The contents of utils.php:

// Function to calculate square of value - mean
function sd_square($x, $mean) { return pow($x - $mean,2); }

// Function to calculate standard deviation (uses sd_square) 
// input: an array over which you want to calculate the stddev

function stddev($array) {
    // square root of sum of squares divided by N-1
    return sqrt(array_sum(array_map("sd_square", $array, array_fill(0,count($array), (array_sum($array) / count($array)) ) ) ) / (count($array)-1) );
}
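The one-liner above divides by count($array)-1, so it breaks on an array with fewer than two values. A more defensive variant might look like the sketch below. The name sample_stddev() and the choice to return null for too-small input are my assumptions for illustration, not something utils.php does.

```php
<?php
// Hypothetical alternative to stddev(): same N-1 formula, written as
// an explicit loop with a guard against too-small input.
function sample_stddev(array $values) {
    $n = count($values);
    if ($n < 2) {
        // Assumption: with fewer than two values the sample standard
        // deviation is undefined, so signal that with null.
        return null;
    }
    $mean  = array_sum($values) / $n;
    $sumsq = 0.0;
    foreach ($values as $v) {
        $sumsq += ($v - $mean) * ($v - $mean);
    }
    // Square root of the sum of squares divided by N-1.
    return sqrt($sumsq / ($n - 1));
}

$std = sample_stddev(array(3, 4, 5, 7, 8));
$tooSmall = sample_stddev(array(42)); // null, not a division by zero
```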

// consult the student t-table, degrees of freedom vs. tail probability
// check here to see how to calculate confidence intervals using the ttable:
// http://www.ehow.com/how_5933144_calculate-confidence-interval-mean.html

function ttable($df, $p){
    $percs = array(0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005, 0.0025, 0.001, 0.0005);
    $pkey=array_search($p,$percs);
    if($pkey===FALSE){return 128;}

    //df .25 .20 .15 .10 .05 .025 .02 .01 .005 .0025 .001 .0005
$dfs = array(
"1" => array( 1.000,  1.376,  1.963,  3.078,  6.314,  12.71,  15.89,  31.82,  63.66,  127.3,  318.3,  636.6 ),
"2" => array( 0.816,  1.061,  1.386,  1.886,  2.920,  4.303,  4.849,  6.965,  9.925,  14.09,  22.33,  31.60 ),
"3" => array( 0.765,  0.978,  1.250,  1.638,  2.353,  3.182,  3.482,  4.541,  5.841,  7.453,  10.21,  12.92 ),
"4" => array( 0.741,  0.941,  1.190,  1.533,  2.132,  2.776,  2.999,  3.747,  4.604,  5.598,  7.173,  8.610 ),
"5" => array( 0.727,  0.920,  1.156,  1.476,  2.015,  2.571,  2.757,  3.365,  4.032,  4.773,  5.893,  6.869 ),
"6" => array( 0.718,  0.906,  1.134,  1.440,  1.943,  2.447,  2.612,  3.143,  3.707,  4.317,  5.208,  5.959 ),
"7" => array( 0.711,  0.896,  1.119,  1.415,  1.895,  2.365,  2.517,  2.998,  3.499,  4.029,  4.785,  5.408 ),
"8" => array( 0.706,  0.889,  1.108,  1.397,  1.860,  2.306,  2.449,  2.896,  3.355,  3.833,  4.501,  5.041 ),
"9" => array( 0.703,  0.883,  1.100,  1.383,  1.833,  2.262,  2.398,  2.821,  3.250,  3.690,  4.297,  4.781 ),
"10" => array( 0.700,  0.879,  1.093,  1.372,  1.812,  2.228,  2.359,  2.764,  3.169,  3.581,  4.144,  4.587 ),
"11" => array( 0.697,  0.876,  1.088,  1.363,  1.796,  2.201,  2.328,  2.718,  3.106,  3.497,  4.025,  4.437 ),
"12" => array( 0.695,  0.873,  1.083,  1.356,  1.782,  2.179,  2.303,  2.681,  3.055,  3.428,  3.930,  4.318 ),
"13" => array( 0.694,  0.870,  1.079,  1.350,  1.771,  2.160,  2.282,  2.650,  3.012,  3.372,  3.852,  4.221 ),
"14" => array( 0.692,  0.868,  1.076,  1.345,  1.761,  2.145,  2.264,  2.624,  2.977,  3.326,  3.787,  4.140 ),
"15" => array( 0.691,  0.866,  1.074,  1.341,  1.753,  2.131,  2.249,  2.602,  2.947,  3.286,  3.733,  4.073 ),
"16" => array( 0.690,  0.865,  1.071,  1.337,  1.746,  2.120,  2.235,  2.583,  2.921,  3.252,  3.686,  4.015 ),
"17" => array( 0.689,  0.863,  1.069,  1.333,  1.740,  2.110,  2.224,  2.567,  2.898,  3.222,  3.646,  3.965 ),
"18" => array( 0.688,  0.862,  1.067,  1.330,  1.734,  2.101,  2.214,  2.552,  2.878,  3.197,  3.611,  3.922 ),
"19" => array( 0.688,  0.861,  1.066,  1.328,  1.729,  2.093,  2.205,  2.539,  2.861,  3.174,  3.579,  3.883 ),
"20" => array( 0.687,  0.860,  1.064,  1.325,  1.725,  2.086,  2.197,  2.528,  2.845,  3.153,  3.552,  3.850 ),
"21" => array( 0.686,  0.859,  1.063,  1.323,  1.721,  2.080,  2.189,  2.518,  2.831,  3.135,  3.527,  3.819 ),
"22" => array( 0.686,  0.858,  1.061,  1.321,  1.717,  2.074,  2.183,  2.508,  2.819,  3.119,  3.505,  3.792 ),
"23" => array( 0.685,  0.858,  1.060,  1.319,  1.714,  2.069,  2.177,  2.500,  2.807,  3.104,  3.485,  3.768 ),
"24" => array( 0.685,  0.857,  1.059,  1.318,  1.711,  2.064,  2.172,  2.492,  2.797,  3.091,  3.467,  3.745 ),
"25" => array( 0.684,  0.856,  1.058,  1.316,  1.708,  2.060,  2.167,  2.485,  2.787,  3.078,  3.450,  3.725 ),
"26" => array( 0.684,  0.856,  1.058,  1.315,  1.706,  2.056,  2.162,  2.479,  2.779,  3.067,  3.435,  3.707 ),
"27" => array( 0.684,  0.855,  1.057,  1.314,  1.703,  2.052,  2.158,  2.473,  2.771,  3.057,  3.421,  3.690 ),
"28" => array( 0.683,  0.855,  1.056,  1.313,  1.701,  2.048,  2.154,  2.467,  2.763,  3.047,  3.408,  3.674 ),
"29" => array( 0.683,  0.854,  1.055,  1.311,  1.699,  2.045,  2.150,  2.462,  2.756,  3.038,  3.396,  3.659 ),
"30" => array( 0.683,  0.854,  1.055,  1.310,  1.697,  2.042,  2.147,  2.457,  2.750,  3.030,  3.385,  3.646 ),
"40" => array( 0.681,  0.851,  1.050,  1.303,  1.684,  2.021,  2.123,  2.423,  2.704,  2.971,  3.307,  3.551 ),
"50" => array( 0.679,  0.849,  1.047,  1.299,  1.676,  2.009,  2.109,  2.403,  2.678,  2.937,  3.261,  3.496 ),
"60" => array( 0.679,  0.848,  1.045,  1.296,  1.671,  2.000,  2.099,  2.390,  2.660,  2.915,  3.232,  3.460 ),
"80" => array( 0.678,  0.846,  1.043,  1.292,  1.664,  1.990,  2.088,  2.374,  2.639,  2.887,  3.195,  3.416 ),
"100" => array( 0.677,  0.845,  1.042,  1.290,  1.660,  1.984,  2.081,  2.364,  2.626,  2.871,  3.174,  3.390 ),
"1000" => array( 0.675,  0.842,  1.037,  1.282,  1.646,  1.962,  2.056,  2.330,  2.581,  2.813,  3.098,  3.300 ),
"100000000" => array( 0.674,  0.841,  1.036,  1.282,  1.645,  1.960,  2.054,  2.326,  2.576,  2.807,  3.091,  3.291 ),
);
    // look up the row for the degrees of freedom. Values without a row (e.g. 35)
    // are NOT rounded here: round $df to the nearest key before calling,
    // otherwise the error value 129 is returned.
    if(array_key_exists($df,$dfs)){
        return $dfs[$df][$pkey];
    }else{
        return 129;
    }
}
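Since ttable() only knows the degrees of freedom listed in the table, a value such as 35 has to be mapped to a tabulated key first. One way to do that is sketched below. The helper nearest_df_key() is hypothetical and not part of utils.php, and the key list is simply retyped from the table above.

```php
<?php
// Hypothetical helper (not part of utils.php): pick the tabulated
// degrees-of-freedom key closest to $df. On a tie the smaller key wins,
// because $keys is scanned in ascending order and only a strictly
// smaller distance replaces the current best.
function nearest_df_key($df, array $keys) {
    $best = null;
    foreach ($keys as $key) {
        if ($best === null || abs($key - $df) < abs($best - $df)) {
            $best = $key;
        }
    }
    return $best;
}

// The keys used by the table: 1..30, then a few larger steps.
$keys = array_merge(range(1, 30), array(40, 50, 60, 80, 100, 1000, 100000000));

$a = nearest_df_key(35, $keys); // 30 (tie between 30 and 40)
$b = nearest_df_key(72, $keys); // 80
```

With utils.php included, the call would then be ttable(nearest_df_key($df, $keys), 0.025). If you would rather err on the wide side, always picking the largest key not exceeding $df is the more conservative choice, since smaller degrees of freedom give larger t-values.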

Thanks to diewom for finding a bug in the PHP which was not present in my Shell script for calculating Confidence Intervals.
