Regression Coefficients Estimated Incorrectly When Centered Predictors Used
May 23, 2007
to getting accurate regression coefficients from a dataset with *small* (standard) numbers, which contains centered predictors. More specifically, I have a dataset with 18 observed data points containing a criterion (y), a centered predictor variable (x), another centered predictor variable (z), and the interaction of the two centered predictor variables (xz). This multiple regression equation is structured to test for interactions between the two continuous predictor variables (x and z) as prescribed by Aiken and West (1991) in their classic book.
When I run the regression in Excel with the centered predictors, some of the regression coefficients in the output are estimated to be 0, although they are clearly *not* 0 as estimated by SPSS 14.0.2. I have spent many hours troubleshooting this problem (and searched many forums on the internet) and still do not know why this is happening.
Initially, I thought the problem might have to do with the cross-product of the centered predictors, but even just doing a regression with one of the centered predictors (for certain centered predictors) yields a regression coefficient of 0 (although it should be non-zero as per SPSS 14.0.2). When doing these multiple regressions with non-centered predictors, all regression coefficients are estimated accurately.
I was wondering if anyone had any insights on why I am experiencing these problems.... If anyone wants a sample of some test data I have used to troubleshoot these problems, you can download a file from: [url] and/or email me at (email deleted by Mod) for more datasets or questions.
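As a cross-check outside Excel, here is a minimal Python sketch of what a correct fit should do with a centered predictor: the slope stays the same and only the intercept moves (to the mean of y). The data and the helper name `ols_simple` are made up for illustration; they are not the poster's 18 observations.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [3.1, 4.9, 6.2, 8.1, 8.8, 11.2]

# Center the predictor by subtracting its mean.
mean_x = sum(x) / len(x)
x_centered = [xi - mean_x for xi in x]

raw_intercept, raw_slope = ols_simple(x, y)
cen_intercept, cen_slope = ols_simple(x_centered, y)
```

If a tool reports a slope of 0 for the centered column while the uncentered column gives a non-zero slope, the two results cannot both be right, which supports the suspicion that the Excel output is at fault rather than the data.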
I'm trying to see how accurate people's work predictions are to actual work completed. So I have these formulas:
=IF(C15=0,"",(SUMIF(L$24:IO$24,"Est.",L15:IO15))) - total estimated days
=IF(C15=0,"",(IFERROR((SUMIF(L$13:IO$13,"Act.",L15:IO15)),""))) - total actual days
But the problem of course is that people estimate a load of work and only fill in the actual days as they go along, so the accuracy of comparing one to another is almost always misleading.
What I want to do is only count the values in the weeks Estimated if the Actual figure is also there (L24:IO24), which is always the cell directly on its right.
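The pairing logic can be sketched in Python, assuming the header row alternates "Est." and "Act." with each actual directly to the right of its estimate. The function name `paired_totals` and the sample row are hypothetical.

```python
def paired_totals(labels, values):
    """Sum estimates and actuals only for weeks where an actual is filled in.

    labels: header row, alternating "Est." and "Act."
    values: the data row; blanks are None or ""
    """
    est_total = 0.0
    act_total = 0.0
    for i in range(len(labels) - 1):
        if labels[i] == "Est." and labels[i + 1] == "Act.":
            actual = values[i + 1]
            # An actual of 0 still counts as "filled in"; only blanks are skipped.
            if actual not in (None, ""):
                est_total += values[i] or 0
                act_total += actual
    return est_total, act_total

# Hypothetical row: three weeks, the second week has no actual yet.
labels = ["Est.", "Act.", "Est.", "Act.", "Est.", "Act."]
values = [5, 4, 3, None, 2, 2.5]
est, act = paired_totals(labels, values)
```

Here the 3 days estimated for the second week are left out of the estimate total because the matching actual is blank, so the two totals cover the same weeks.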
I have a VBA function that calculates polynomial coefficients for a series of data pairs. One selects the range of cells that the coefficients are to be stored in, and enters the polynomial formula:
{POLFIT(Xa, Ya, N)}
Where Xa is the array of ordinate values, Ya is the array of data values, and N is the polynomial order to be fit.
It is obvious that one needs to select at least N+1 cells when the array function is typed in, but it is easy to select too few cells.
I am looking for a way to test whether enough cells were selected for the range formula: The function declaration is
Function POLFIT(Xa, Ya, N As Integer) As Variant
Various means I have tried to count POLFIT do not return the correct value.
I'm trying to do a binomial distribution summation as part of a VBA function, and have been using the Application.WorksheetFunction.BinomDist function, which works fine until the numbers get large - the binomial coefficients used in the calculation end up being larger than can be held in a double floating-point number in VBA, so excel can't handle it. The final result of the calculation is a probability, so it's not a huge number! I was wondering if anyone knows an alternate way of calculating binomial probabilities which avoid any huge number intermediates.
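One common way around the overflow is to work with logarithms: the log of the binomial coefficient comes from the log-gamma function, so the huge coefficient is never formed. The sketch below is in Python to show the arithmetic; the function names are my own, and a VBA port would need an equivalent log-gamma routine.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability P(X = k) for n trials with success prob p."""
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    # log C(n, k) via log-gamma, so the coefficient itself is never computed.
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)

def binom_cdf(k, n, p):
    """P(X <= k), summed relative to the largest term to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(0, k + 1)]
    peak = max(logs)
    if peak == -math.inf:
        return 0.0
    return math.exp(peak) * sum(math.exp(v - peak) for v in logs)
```

For example, `binom_cdf(5000, 10000, 0.5)` runs without any intermediate value leaving the range of a double, even though C(10000, 5000) itself is far too large to store.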
i have a function in a cell (that works) to extract coefficients from a range of cells in a workbook:
=INDEX(LINEST(CP25:CP27,CQ25:CQ27^{1,2}),1)
i have variables for cp25:cp27 and cq25:cq27 already defined in my vba code. the values for these in the case i am working on are as follows (returns 110.5):
1) How to do this function in VBA only - this is part of a UDF and cannot have any helper cells.
2) How to refer to 560,570,580 as a 'range'. Is there a way to put these six variables into my ranges for later processing?
All of the google searches i have deal only with linear regression, taking from existing graphs, or say to just use the function i have above.
I have tried
Var = Application.WorksheetFunction.LinEst(Sheets("references").Range("CP25:CP27"), Sheets("references").Range("CQ25:CQ27^{1,2}"), 1)
but it returns #VALUE! errors. When I remove the ^{1,2} portion I do get a value, but it is incorrect (it returns 160). What is the correct syntax for adding in the ^{1,2}? If you can answer that it would be fantastic, but it brings me back to issue #2, in that I need to refer to my variables in the VBA code and not this range (as they will eventually be going away).
I have several tabs that are each named with the abbreviation for an element (e.g. Al, Sb, etc.) and I am trying to write a formula to display the full element name based on the name of the tab and a table in another sheet. I have written the formula below, which works when I enter the formula and press Enter on each sheet, but when I click "Calculate Now" to run the calculations for the whole file, Excel returns the name of whichever element I last calculated manually (click in the formula and press Enter) on every sheet. Why does Excel calculate correctly when I press Enter but then change it when I calculate the whole file?
If I enter 1 in a cell, 1.1 below, select both and drag down, I should get a vector: 1, 1.1, 1.2 etc.
However, in the most recent instance of seeing this problem, at 6.5 I get 6.50000000000001! 6.6 onwards is then correct, at 7.2 the value is again incorrect in the 14th decimal place, and the errors continue intermittently through the sequence.
Is there a fix for this? It's extremely time consuming to have to check each auto-incremented array like this.
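The stray digits are typical of binary floating point, where 0.1 has no exact representation and repeated addition lets the error build up. A small Python sketch of the effect and one common workaround (compute each term from its index, then round to the intended number of decimals) follows; the function names are illustrative.

```python
def fill_series(start, step, count, decimals):
    """Build start, start+step, ... from the index, rounding each value."""
    return [round(start + i * step, decimals) for i in range(count)]

def fill_series_naive(start, step, count):
    """Repeatedly add the step, which lets rounding error accumulate."""
    values = []
    current = start
    for _ in range(count):
        values.append(current)
        current += step
    return values

clean = fill_series(1.0, 0.1, 70, 1)
naive = fill_series_naive(1.0, 0.1, 70)

# Positions where the accumulated value no longer equals the rounded one.
drifted = [(c, n) for c, n in zip(clean, naive) if c != n]
```

In a worksheet the analogous approach would be to wrap the increment in ROUND, for instance rounding each cell to one decimal place, rather than checking the filled values by eye.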
This is a two part question. I appreciate any help that can be given with my problem. I have attached a spreadsheet similar to what is used at work. We gather this information from a report we use.
Part one - In column F, we have it set up to show how long the customer has been delinquent. Column F is the difference between the date in column D and the date in cell E2. We are using the NETWORKDAYS formula, which does not count Saturdays and Sundays.
The problem is that sometimes, when we paste in the information from the report, the value in column F is off by a day, which forces us to adjust the formula so the information matches. Why do we have to adjust the formula?
Part two - In the NETWORKDAYS formula, we do not want to count certain holidays, which are listed in column J. Is there a way to have the holiday dates auto-advance if the dates in column J are earlier than the dates in column D?
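One possible source of an off-by-one is that NETWORKDAYS counts both the start date and the end date, whereas a report that subtracts dates does not. The Python sketch below mimics that inclusive count with an optional holiday list; it is an illustration of the counting rule, assuming start is not after end, and `network_days` is my own name.

```python
from datetime import date, timedelta

def network_days(start, end, holidays=()):
    """Count weekdays from start to end, inclusive of both, skipping holidays.

    Assumes start <= end; returns 0 otherwise.
    """
    skip = set(holidays)
    count = 0
    day = start
    while day <= end:
        # weekday() is 0..4 for Monday..Friday.
        if day.weekday() < 5 and day not in skip:
            count += 1
        day += timedelta(days=1)
    return count

monday = date(2007, 5, 21)
next_monday = date(2007, 5, 28)

same_day = network_days(monday, monday)
span = network_days(monday, next_monday)
span_with_holiday = network_days(monday, next_monday, [next_monday])
```

Note that a single working day counts as 1, not 0, so comparing this figure against a plain date difference will look "off by a day" whenever both endpoints are working days.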
I've recorded a macro in which I unhide certain columns, copy and paste some information then hide those columns again. The problem is that when the macro is finished, it incorrectly hides columns K to AN. I did not record that and it's not in the code so I'm lost as to why it's happening.
I can't work out why it's hiding everything from K to AN. I've tried recording the macro several times, but it's just not working, no matter the order in which I hide columns when recording it.
In the attached spreadsheet i have a budget amount, billed to date, %complete, %remaining and forecast figure. What i am trying to do is estimate the forecast spend vs the budget or billed to date and percent remaining. I am struggling with how to do this based on in some cases the budget is already overspent but the %complete is less than 100%. What i really want to do is create a forecast based on the billed to date or budget depending on which is greater and work out estimated spend based on whether the task is complete or there is still a % remaining.
I entered exactly 113,876.92 in cell L16.
I entered exactly 113,390.02 in cell L17.
I entered =L16-L17 in cell L18.
L18 incorrectly shows the result as 486.9000000000009000 (note the extra "9" after the 11 zeros). When I expand the viewable digits on L16 and L17, they have ALL zeros after the cents. (I went out at least 25 digits). I can't be the first one encountering this.
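The same behaviour can be reproduced outside Excel, since it comes from storing decimal fractions in binary floating point rather than from the worksheet itself. A short Python sketch with the poster's two numbers, showing the raw difference, the difference rounded to cents, and exact decimal arithmetic for comparison:

```python
from decimal import Decimal

a = 113876.92
b = 113390.02

# Neither value is stored exactly in binary, so the raw difference
# carries a tiny error in its trailing digits.
raw_diff = a - b

# Rounding to cents removes the artifact.
rounded_diff = round(raw_diff, 2)

# Exact decimal arithmetic on the same inputs, for comparison.
exact_diff = Decimal("113876.92") - Decimal("113390.02")
```

In the worksheet, the equivalent of `rounded_diff` would be something like =ROUND(L16-L17,2), which is the usual remedy when only cents matter.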
The above is a screenshot of the data analysis (regression) I want to automate with vba code. Like all macros, I tried to record first and only got the following
I was handed the attached file. I understand everything except how the values in row 6 were derived. No formula was present when I received the file, just the numbers. Row 7 holds the hard-entered scores the units achieved.
I have the following dataset and was wondering how I can run a constrained regression in Excel with the constraint being that the total allocation of assets is 100%:
Total return (y): 12 data points
Asset 1 (x1): 12 data points
Asset 2 (x2): 12 data points
Asset 3 (x3): 12 data points
Asset 4 (x4): 12 data points
Asset 5 (x5): 12 data points
Asset 6 (x6): 12 data points
[Attached is a spreadsheet with the actual dataset]
I know the regression equation I need is R = b1X1 + b2X2 + ... + (1 - b1 - b2 - ... - b5)X6 + e
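The equation above can be turned into an ordinary regression by moving X6 to the left-hand side: R - X6 = b1(X1 - X6) + ... + b5(X5 - X6) + e, fitted with no intercept, after which b6 = 1 - (b1 + ... + b5). Here is a minimal Python sketch of that substitution on made-up, noise-free data (not the attached dataset); the helper names are hypothetical.

```python
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def constrained_weights(returns, assets):
    """Least-squares weights that sum to 1, with no intercept."""
    k = len(assets[0])
    # Subtract the last asset from both sides of the equation.
    y = [r - row[-1] for r, row in zip(returns, assets)]
    z = [[row[j] - row[-1] for j in range(k - 1)] for row in assets]
    # Normal equations (Z'Z) b = Z'y for the k-1 free weights.
    ztz = [[sum(zr[i] * zr[j] for zr in z) for j in range(k - 1)]
           for i in range(k - 1)]
    zty = [sum(zr[i] * yi for zr, yi in zip(z, y)) for i in range(k - 1)]
    free = solve_linear(ztz, zty)
    return free + [1.0 - sum(free)]

# Hypothetical data: 12 periods, 6 assets, known weights, no noise.
rng = random.Random(0)
true_weights = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]
assets = [[rng.uniform(-0.05, 0.08) for _ in range(6)] for _ in range(12)]
returns = [sum(w * x for w, x in zip(true_weights, row)) for row in assets]
weights = constrained_weights(returns, assets)
```

In a worksheet the same idea would mean building the five difference columns and the R - X6 column, then running the regression on them with the constant forced to zero (LINEST's const argument set to FALSE). Note that this enforces only the sum-to-100% constraint; it does not stop individual weights from coming out negative.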
I am wanting to write a macro which uses the excel multiple regression function (a part of the data analysis add-in). I tried recording a macro while I selected the regression function (Tools> Data Analysis... etc.) which produced the following:
I would like to run a multiple linear regression in vba. I have one dependent and three explanatory variables. I will have to use a macro of some kind, since I need to run too many regressions to do it manually. To simplify things a little bit:
- There will always be exactly three independent variables
- There are no missing values
- The data is always numerical
I've already got four ranges defined: Yrange, X1range, X2range, X3range. I would like to take these ranges as input parameters for the regression model. The only two parameters I need are Sum Square for Regression (SSR) and the degrees of freedom. I understand that you can use excel's matrix formulas to calculate some of the input parameters, but one doesn't really get around vba. Any (simple) source code allowing me to conduct a regression with three input parameters?
I am trying to set up the formula y = ax^2 + bx + c. Is there a function for that in excel?
To get a little more into my overall goal. I will have a x constant that will remain the same, but I have 8 different sets of a,b, & c coefficients. So, I would like to set up something like, if a row is labeled A1, find the A1 set of coefficients and use them in the quadratic equation. I was thinking I would need to use a CSE, is that correct?
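The lookup-then-evaluate idea is sketched below in Python, with a dictionary standing in for the coefficient table. The labels and coefficient values are placeholders, not the poster's eight sets.

```python
# Hypothetical coefficient sets keyed by row label; the real sheet has 8 of them.
COEFFICIENTS = {
    "A1": (1.0, -2.0, 3.0),
    "A2": (0.5, 0.0, -1.0),
}

def quadratic(label, x, table=COEFFICIENTS):
    """Evaluate a*x^2 + b*x + c using the coefficient set stored under label."""
    a, b, c = table[label]
    return a * x ** 2 + b * x + c

y_a1 = quadratic("A1", 2.0)
y_a2 = quadratic("A2", 4.0)
```

As far as I know, the worksheet equivalent is an ordinary lookup (VLOOKUP or INDEX/MATCH on the row label) feeding a normal formula, so an array (CSE) entry should not be required.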
I have a question about regression. I know how to do linear, exponential, etc. regressions, but how can I do a custom regression? For example, if I have a function f(x)=a+b*x^2, or some other function, how can I enter my own function to draw the regression? I have Excel 2007.
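A model such as f(x) = a + b*x^2 is still linear in its coefficients, so it can be fitted as an ordinary straight-line regression of y on u = x^2. A minimal Python sketch with made-up data (the function name is illustrative):

```python
def fit_a_plus_b_x_squared(xs, ys):
    """Least-squares fit of y = a + b*x**2 by regressing y on u = x**2."""
    us = [x * x for x in xs]
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = suy / suu
    a = mean_y - b * mean_u
    return a, b

# Hypothetical data generated from y = 2 + 0.5*x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 0.5 * x * x for x in xs]
a, b = fit_a_plus_b_x_squared(xs, ys)
```

In Excel the same trick would be a helper column holding x^2, with the linear regression run against that column. Functions that are not linear in their coefficients need a different approach, such as an iterative solver.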
I have set up a linear regression array in Excel and now want to test the significance of my r2 value at a certain level of significance. I've only been able to find tables that give the critical r value, but I want to test it at 99.73% level of confidence and none of them contain that specific value. Is there a way to do this in Excel?
I have a set of data. I know how to do linear regression over the whole set of data. How do I have another linear regression over the first 5 points in the set of data on the same graph ?? I am using Excel 2007
I'm trying to write a macro that will analyze data from one spreadsheet and run a regression. I want the results to be output on the same sheet. I tried to use the record function, but I got an error. It said "Run-time error '1004': ATPVBAEN.XLA could not be found." The code read:
However I have a subject at uni that requires me to create a series of regression models, histograms, correlation matrices etc.
For part of the assignment, I have to run 4 regressions (one for men & one for women) with the dependent variable as average wages, and the independent variables as bfast 1,2,3 and dinner 1,2,3 (all of which are dummy variables) (0 for male and 1 for female).
The second two regressions are exactly the same, except average wages must be transformed into logs, which I have already done.
I don't know how to split the regression models into male and female (if possible). We must also include residuals and residual plots.
I keep getting an error that says non-numeric data, the other says input range must be a contiguous reference.
I have the macro below which opens csv files stored in a user selected folder and processes them changing the date format in column D from DD/MM/YYYY to text stored as YYYY-MM-DD.
For most of the dates the code works without issue, but for some (those with a month <12 possibly) it transposes the MM and DD incorrectly.
I understand that when opening the CSV's in excel it automatically converts the dates to DD/MM/YYYY, so I'm actually opening in wordpad which displays as YYYY-MM-DD, with only a portion being incorrect.
I've attached a couple of sample files (pre and post conversion).
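A common explanation for this pattern is that any date whose day is 12 or less can also be read as month/day, so an automatic conversion silently swaps the two. Treating the value as text and parsing it with an explicit day-first format avoids the ambiguity. A minimal Python sketch of that idea (the function name `to_iso` is my own, and it assumes the input really is DD/MM/YYYY text):

```python
from datetime import datetime

def to_iso(date_text):
    """Convert 'DD/MM/YYYY' text to 'YYYY-MM-DD' text using an explicit format."""
    parsed = datetime.strptime(date_text.strip(), "%d/%m/%Y")
    return parsed.strftime("%Y-%m-%d")

ambiguous = to_iso("03/04/2015")    # day 3, month 4
unambiguous = to_iso("25/12/2014")  # day 25 cannot be a month
```

The point of the design is that the format string, not the machine's regional settings, decides which field is the day.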