*.
*/=======================================================.
*.
*/ Cleaning of ERC Data from ISR Web site.
*/ Send questions to Kelly Shaver, ShaverK@cofc.edu.
*/ Copyright Kelly G. Shaver 2001-2006. Please cite. :-)
*/ Revised July 2006.
*/=======================================================.
*.
*/ NOTE: THE DATASET IS A MOVING TARGET.
*/ To be safe, download and unzip the ERCW14Q zip file. The filename stands
*/ for Entrepreneurship Research Consortium Wave 1-4 Questionnaire-based,
*/ so it is organized by questionnaire, not by sequence.
*/ Once the data file is downloaded and unzipped, save it "as" some other SPSS
*/ data file name that includes the date of the download, and then do analysis.
*/
*/ This procedure allows one to indicate the "version" that has been analyzed.
*/ Some of the present syntax is designed to correct errors in the web-based
*/ file. As errors have been identified, I have brought them to the attention
*/ of the ISR staff. With subsequent versions of the dataset, some of the
*/ corrections may not be needed. Consequently, this syntax file begins by
*/ identifying the presence of errors and patching each error in turn.
*/ The overall strategy is to use crosstabs or frequencies to identify the RESPID
*/ numbers of problem cases. If a particular crosstab does not produce the
*/ problem cases indicated, then the patch associated with that crosstab
*/ does not need to be done.
*.
*/=======================================================.
*.
*/ Check for correct distribution of respondent sex by counting NCGENDER.
*/ There are several items that represent the sex of the respondent, but
*/ NCGENDER is the only one that has been verified by a detailed
*/ examination of the data by Nancy Carter (hence, NC gender).
*/ A crosstab of NCGENDER by RTYPE on the entire 1261 should produce
*/ 275, 171; 52, 171; 100, 61; 104, 119; 88, 120.
*/ If this distribution does not obtain, then the NCGENDER variable has
*/ become corrupted in the file downloaded. Any classifications involving
*/ gender will be wrong.
*.
*/=======================================================.
*.
CROSSTABS
  /TABLES=ncgender BY rtype
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT .
*.
*/=======================================================.
*.
*/ Eliminate 6 infant businesses that should have been screened out.
*/ (Positive cash flow including owner salary for more than 3 months.)
*/ The cash flow variable is cfphlag (for Cash Flow PHone LAG).
*/ Begin by showing the distribution of lags (in days) across the 25
*/ respondents who have cash flow of some level. Eliminating these
*/ 6 infant businesses reduces the sample to 1255.
*/.
*/=======================================================.
*.
freq var=cfphlag.
FILTER OFF.
USE ALL.
SELECT IF(MISSING(cfphlag) or (cfphlag LT 90)).
EXECUTE .
*.
*/=======================================================.
*.
*/ Respondents report what percentage of their intended business
*/ will be owned by themselves, other persons on the team, and
*/ "nonpersons." An intended business that will be owned more than
*/ 50% by a nonperson is an enterprise that the respondent cannot
*/ control. Thus, many people will want to eliminate the intended
*/ businesses that cannot be controlled by the founding entrepreneur.
*.
*/=======================================================.
*.
*/ Determine how much of the enterprise is to be owned by participants
*/ who are not individual persons. NPOWNPC (line 161) was
*/ created on the basis of Q217 (who will own?) answered as
*/ "not a person" and percentage of ownership (Q207C). This
*/ variable identifies 18 people, out of the total of 830 nascents,
*/ who expect that non-persons will own some percentage of
*/ the business.
*.
FREQ npownpc.
*/ Of the 18, 7 show an expected non-person ownership greater than
*/ 50% (one at 66%, one at 82%, one at 85%, four at 100%).
*/ Delete these cases, thus eliminating RESPIDs 328100020, 328100183,
*/ 328100255, 328100267, 328100443, 328100572, and 337800154.
*/ This reduces the sample to 1248.
*.
FILTER OFF.
USE ALL.
SELECT IF(MISSING(npownpc) or (npownpc LE 50)).
EXECUTE .
*.
*=======================================================.
*.
*/ Remove from the Comparison Groups all respondents who are
*/ starting businesses of their own.
*.
*=======================================================.
*.
*/ Minority oversample Comparison Group participants, who are
*/ RTYPE 21, were asked the screening questions about start-up
*/ activities. Any who answered affirmatively to the question about
*/ start-up involvement, SUINVOL (line 75), should be deleted from
*/ the Comparison Group. This represents a total of 28 people, 14
*/ females and 14 males, leaving an overall total of 1220.
*/ NOTE: As written, the SELECT IF below keeps the RTYPE 21 cases for which
*/ SUINVOL = 1, so it presumes that 1 is the code for no start-up involvement.
*/ Confirm the coding with FREQ SUINVOL before running.
*.
DO IF (RTYPE = 21) .
SELECT IF (SUINVOL = 1).
END IF .
EXECUTE.
*.
*/ Respondents targeted for the ERC Mixed Gender and NSF Women
*/ comparison group were not asked about their start-up involvement. In the
*/ one-year follow-up, these respondents (all of whom are RTYPE 20)
*/ were asked about their start-up activities. (They should have had none.)
*/ The variable representing involvement is CGSUACT (line 155).
*/ This variable identifies four individuals who should be removed below.
*/ (The four RESPID numbers are 328200046, 328200059, 328200084,
*/ and 328200115.) This reduces the overall total to 1216.
*.
DO IF (RTYPE = 20) .
SELECT IF (CGSUACT NE 1).
END IF .
EXECUTE.
*.
*.
*=======================================================.
*.
*/ Correct problems in the variable, AUTONSU, that is the combination
*/ of financial independence and nature of business.
*.
*=======================================================.
*.
*/ NOTE: As of July 2006 in the questionnaire-ordered dataset the
*/ variable AUTONSU (line 158) has two separate problems.
*/ First, the normal code for MISSING (which is usually 99) has been
*/ labeled "COMPARISON GROUP". So a frequency count of the
*/ variable shows 399 values to be missing, although they are
*/ in fact members of the comparison group. To remove this source
*/ of confusion, the code for missing must be changed
*/ to 999, leaving the value of 99 to represent the CG.
*.
freq autonsu.
MISSING VALUES AUTONSU (999).
EXECUTE.
VALUE LABELS AUTONSU
  1 'no outside influence'
  2 'LT 50% non persons, indep su'
  3 'LT 50% non persons, frch mlm'
  4 'LT 50% non persons, bus spon'
  5 'GE 50% non person ownership'
  99 'Comparison group'
  999 'missing'.
execute.
freq autonsu.
*.
*/ The second problem is that the frequency count for AUTONSU
*/ shows 716 people engaged in autonomous start-up rather than the
*/ correct 715. (This error is also present in the descriptions of
*/ cases in Appendix C of the Handbook of Entrepreneurial Dynamics.)
*/ A crosstab of AUTONSU with AUTONSU4 (line 13) illustrates the problem.
*.
CROSSTABS
  /TABLES= AUTONSU BY AUTONSU4
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT .
*.
*/ In this crosstab the column totals, which represent AUTONSU4,
*/ are correct (715, 102, 399). (Remember that start-ups with more
*/ than 50% nonperson ownership -- the fourth category of AUTONSU4 --
*/ were deleted above.) The RESPID in error is 337800099.
*/ The correct value, determined by comparison to the frequencies
*/ in Q190 (line 283), is a score of 3. Correct the value for this person.
*.
IF (RESPID = 337800099) AUTONSU = 3.
EXECUTE.
*.
CROSSTABS
  /TABLES= AUTONSU BY AUTONSU4
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT .
*.
*.
*=======================================================.
*.
*/ Create a second, categorical variable, AUTONSU3, to separate fully
*/ autonomous from partially autonomous from comparison group.
*.
*=======================================================.
*.
*/ Elimination of respondents whose businesses would be more than 50%
*/ owned by a non-person has, as shown above, eliminated one of the four
*/ categories of AUTONSU4. To avoid confusion it is now better to create
*/ a THREE category variable, AUTONSU3, to represent major groupings
*/ of respondents. Fully autonomous start-ups will have a value of 100,
*/ partially autonomous start-ups will have a value of 200, and the
*/ comparison group will have a value of 300.
*.
COMPUTE AUTONSU3 = AUTONSU.
EXECUTE.
RECODE AUTONSU3 ( 1= 100)(2,3,4,5 = 200)(99=300).
VARIABLE LABEL AUTONSU3 'Three category classification - full, partial, CG'.
VALUE LABEL AUTONSU3
  100 'Fully autonomous'
  200 'Partially autonomous'
  300 'Comparison Group'.
EXECUTE.
*.
FREQ AUTONSU3.
*.
*.
*=======================================================.
*.
*/ Adjust initial Wave 1 weights by AUTONSU3 categories.
*.
*.
*=======================================================.
*.
*/ NOTE: Statistical procedures should be performed only on weighted data.
*/ Weights should be constructed so that within "cells" of a statistical design
*/ the weights sum to the number of individuals in the cell. If the only divisions
*/ in a research design are based on degree of autonomy, then weights in the
*/ Fully Autonomous cell should sum to 715, those in the Partially Autonomous
*/ cell should sum to 102, and those in the Comparison Group should sum
*/ to 399. This is what is accomplished below. Researchers who desire also
*/ to examine respondents separately by RTYPE, or by sex, or by some
*/ combination of the three will need to normalize the weights for their
*/ particular purposes, using the logic below as a model.
*/ In Wave 1 the weight for nascents is WTW1, but for CG it is WTCG.
*/ In order to use the same named variable as a weight for both, a new
*/ Wave 1 weight must be created, called WAVE1WT.
*.
compute wave1wt = 99.
execute.
variable label wave1wt 'Common weight for NE and CG in Wave 1'.
missing values wave1wt (99).
execute.
if (RTYPE le 12) wave1wt = wtw1.
if (RTYPE ge 20) wave1wt = wtcg.
execute.
*.
*/ Step 1. Determine the actual totals of weights for each of the three
*/ autonomy categories by successive filtering of the data to be used.
*.
USE ALL.
COMPUTE filter_$=(AUTONSU3 = 100).
VARIABLE LABEL filter_$ 'AUTONSU3 = 100 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*.
DESCRIPTIVES
  VARIABLES=wave1wt
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
*.
USE ALL.
COMPUTE filter_$=(AUTONSU3 = 200).
VARIABLE LABEL filter_$ 'AUTONSU3 = 200 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*.
DESCRIPTIVES
  VARIABLES=wave1wt
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
*.
USE ALL.
COMPUTE filter_$=(AUTONSU3 = 300).
VARIABLE LABEL filter_$ 'AUTONSU3 = 300 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*.
DESCRIPTIVES
  VARIABLES=wave1wt
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
USE ALL.
*.
*/ Step 2. Create corrected weight AUTWT by multiplying WAVE1WT
*/ weight by a fraction that is:
*/ (number of individuals in each AUTONSU3 category)
*/ divided by
*/ (actual sum of WAVE1WT for each AUTONSU3 category).
*/ The denominators in the IF statements are the sums of WAVE1WT
*/ reported in Step 1. Substitute the sums from your own Step 1
*/ output if they differ.
*.
COMPUTE AUTWT = 999.
VARIABLE LABEL AUTWT 'Wave 1 weight standardized by R category for 1216 total'.
EXECUTE.
IF (AUTONSU3 = 100) AUTWT = wave1wt*(715/710.53).
IF (AUTONSU3 = 200) AUTWT = wave1wt*(102/104.09).
IF (AUTONSU3 = 300) AUTWT = wave1wt*(399/407.20).
MISSING VALUES AUTWT (999).
EXECUTE.
*.
*/ Step 3. Compute the sums for AUTWT by AUTONSU3 category, to
*/ ensure that the sum of the weights per category now equals the
*/ number of respondents in that category.
*.
USE ALL.
COMPUTE filter_$=(AUTONSU3 = 100).
VARIABLE LABEL filter_$ 'AUTONSU3 = 100 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*.
DESCRIPTIVES
  VARIABLES=AUTWT
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
*.
USE ALL.
COMPUTE filter_$=(AUTONSU3 = 200).
VARIABLE LABEL filter_$ 'AUTONSU3 = 200 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*.
DESCRIPTIVES
  VARIABLES=AUTWT
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
*.
USE ALL.
COMPUTE filter_$=(AUTONSU3 = 300).
VARIABLE LABEL filter_$ 'AUTONSU3 = 300 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*.
DESCRIPTIVES
  VARIABLES=AUTWT
  /STATISTICS=MEAN SUM STDDEV MIN MAX .
USE ALL.
FILTER OFF.
*.
*/ This ends the cleaning of the data files. Analyze to your heart's content.
*/ But remember to recenter the weights for any cuts in the data (such as
*/ by sex -- NCGENDER -- as well as by nascent category).
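*.
*=======================================================.
*.
*/ SKETCH: one possible way to recenter the weights for a further cut.
*/ This block is an illustration of the recentering reminder above and is
*/ not part of the original cleaning procedure. It assumes the cut of
*/ interest is AUTONSU3 by NCGENDER and that WAVE1WT was built as above.
*/ It relies on AGGREGATE with MODE=ADDVARIABLES, which needs SPSS 13 or
*/ later. The names CELLSUM, CELLN, and SEXAUTWT are hypothetical.
*.
*=======================================================.
*.
*/ Turn weighting and filtering off so that the sums and counts are
*/ unweighted and cover every remaining case.
*.
WEIGHT OFF.
FILTER OFF.
USE ALL.
*.
*/ Add to each case the sum of WAVE1WT in its cell (CELLSUM) and the
*/ unweighted number of cases with a valid WAVE1WT in its cell (CELLN).
*.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=AUTONSU3 NCGENDER
  /CELLSUM = SUM(wave1wt)
  /CELLN = NU(wave1wt).
*.
*/ Same logic as Step 2: multiply the weight by
*/ (number of individuals in the cell) / (sum of WAVE1WT in the cell).
*.
COMPUTE SEXAUTWT = wave1wt*(CELLN/CELLSUM).
VARIABLE LABEL SEXAUTWT 'Wave 1 weight recentered within AUTONSU3 by NCGENDER'.
EXECUTE.
*.
*/ Check, as in Step 3: within each cell the sum of SEXAUTWT should
*/ equal the number of respondents in that cell.
*.
MEANS TABLES=SEXAUTWT BY AUTONSU3 BY NCGENDER
  /CELLS=COUNT SUM .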