SECTION 4: TASK 2 - DATA COLLECTION
2.1 Data Collection Strategy
The objective is to measure how much local economic activity has been increased by the event. That is, the aim is to compare the value of sales (or income) during the period of the event to the value of sales (or income) which would have occurred without the event. Since the latter magnitude is not observable, this usually leads to the use of surveys.
Two basic types of surveys have been employed. The most widely used is the consumer (event visitor) survey. Here, questionnaires are used to discover incremental spending in the region directly connected to the event. This spending comes from visitors to the region whose presence has been motivated by the event. By asking this group directly how much they have spent, we can obtain an estimate of the direct economic impact of the event.
The second basic type of survey is the business sector survey. Here the focus is on area firms and on how much their sales changed owing to the event. For example, sales revenues during the event are compared to revenues for a period preceding or following the event. Comparing sales for years prior to an event with sales for the same period in years when the event was in existence is another way to estimate sales resulting from the event.
While surveys of visitors and local firms have both been employed in impact analysis, each has deficiencies. Firstly, people attending events are not always willing or able to provide accurate spending estimates. Careful attention to survey design and sampling techniques can reduce this problem. Secondly, most merchants are simply not willing to provide sales data. Furthermore, how comparable are sales data from before or after the event? Are there seasonal variations in sales, or other events or attractions in the community or in surrounding communities, that could influence sales during these periods? These are just a few of the questions that arise in a "what would have happened" scenario. Previous efforts have shown the desirability of obtaining spending data via a survey of consumers (visitors) as opposed to sellers (firms). If data collection costs are not an issue, surveying both visitors and local firms is desirable, since each serves as a cross-check on the accuracy of the other (expenditures by visitors should equal incremental local business receipts).
2.2 Data Collection Instrument(s)
Survey Tools
There are a number of survey tools that can be used to gather data from visitors. Some of these tools are discussed briefly in what follows.
Diaries
A diary format for continual recording of spending is the most accurate method of obtaining expenditures. Diaries require minimum recall on the part of respondents since a running record is kept of spending, which is advantageous if highly detailed information is sought. However, there are two major shortcomings:
- Having to record expenses may change the tourist's spending habits, which biases total tourist spending estimates;
- The response rate (getting people to participate) tends to be low since few vacationers would welcome an additional chore (innovative incentives might offset this problem).
It is reasonable to suspect, however, that certain personality types are more likely to keep diaries, while others are more likely to refuse. These personality differences might also be reflected in spending behaviour, which would make diary keepers unrepresentative of visitors as a whole.
Exit Interviews
A second method of gathering expenditure information is to interview people as they leave the area. In an "exit interview" people are asked to estimate either their total expenditures for the entire period spent in an area or for their last day. Since fewer people will refuse to complete an exit interview, you can expect to find a more representative group of respondents for this type of interview than for diaries. However, people will tend to forget the details of their expenditures. Even when you ask them to consider the entire time period spent in your area, they will remember better the expenses from the last day than from earlier days.
If persons are interviewed at the end of their trips, accuracy can be reduced as people are forced to remember what they had spent on previous days (recall error). If persons are interviewed prior to the end of their visit, they are forced to project expenditures they will make during the rest of their visit.
Mail Surveys Completed at Home
A third alternative is to send questionnaires to visitors at their homes. Mailing addresses can be obtained via a random sample of registration data, or from random intercepts of visitors (where the interviewer obtains the respondent's mailing address and permission to send them a survey).
This type of questionnaire typically gets a higher response rate than the diary, but a lower response than exit interviews. Also, the time lapse increases the tendency to underestimate actual expenses.
This approach can be used in conjunction with the personal interview approach. In that case, the interview collects only basic information and is used to distribute surveys that are completed at home and returned by mail.
Telephone Surveys
A fourth alternative is to obtain telephone numbers (along with permission to call respondents upon completion of their trip) and complete telephone interviews soon after their visit. This type can show improved response rates over mail surveys but can be expensive. Recall error is a problem with this technique.
Important Note
Listed below are a few critical elements that should be contained in the data collection instrument. The details and rationale for this information are contained in the next section on data collection. The data collection instrument should:
- Identify the purpose of the trip.
- Break down expenditures by type: accommodation, restaurant meals and beverages, etc.
- Identify the geographical location of expenditures (inside/outside the study area).
- Break down visitors by type: participant/accompanying participant/spectator, use/not use accommodations, type of accommodations, length of stay, etc.
- Identify the party to whom the expenditures refer (single person respondent, those traveling or staying together, etc.).
2.3 Data Collection Strategy
The Necessity of Surveys
Spending figures for event visitors are not known and therefore require estimation. As mentioned previously (under Data Collection Strategy), this requires a survey of visitors. For events that have a small number of participants and that are held in a concentrated geographical area, surveying all event-related visitors may be feasible. If this is the case, summing visitor expenditures across all the surveys provides an estimate of direct spending by visitors.
In most cases a survey of all visitors is not feasible, cost effective or necessary. This is because obtaining expenditure for a sample of event visitors (carefully chosen and of sufficient size) can allow accurate estimation of spending by all visitors.
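The step from a sample to all visitors can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple random sample; the function name, the five survey responses and the visitor count of 2,000 are hypothetical and are not drawn from any actual study.

```python
from statistics import mean


def estimate_total_spending(sample_expenditures, total_visitors):
    """Scale the sample's average expenditure up to all event visitors."""
    if not sample_expenditures:
        raise ValueError("sample must contain at least one response")
    return mean(sample_expenditures) * total_visitors


# Hypothetical data: five survey responses (dollars per visitor) and an
# assumed count of 2,000 event visitors.
sample = [40.0, 55.0, 120.0, 35.0, 50.0]
total = estimate_total_spending(sample, total_visitors=2000)
print(f"Estimated direct spending: ${total:,.0f}")
```

The estimate is only as good as the sample: if the surveyed visitors spend more or less than visitors in general, the scaled total inherits that bias.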
Sampling Theory
The process of using the characteristics of a sample to estimate the characteristics of the group as a whole (population) is known as statistical inference. The validity of such inference is based on a critical assumption: that the characteristics of the sample of visitors (spending behaviour, in economic impact analysis) are representative of the population of visitors (including those who were not surveyed). If this assumption is not valid, a source of error has been introduced into the estimate of total spending by this group of visitors. For instance, if the sample produces an average expenditure figure that is significantly greater than the actual (unknown) average for the entire group, then applying the sample mean to the population will significantly overestimate total spending.
It is important to keep in mind that a sample is not likely to produce an estimate of average expenditure that is exactly equal to the average expenditure of the entire group. Even if the sample closely reflects the population, some difference (known as sampling error) would be expected to exist. Given this, the goal is to produce an estimate of average expenditures from the sample that is within an acceptable range (known as the confidence interval) of the population expenditure mean. The confidence interval is usually expressed as plus or minus so many dollars around the average expenditure estimate. For example, average expenditure is estimated to be $100, plus or minus $10.
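A "plus or minus" figure of this kind can be computed from survey responses as sketched below. This is an illustration only: it assumes a simple random sample and the normal approximation, with a multiplier of 1.96 (the standard normal value for a 95% confidence level). The function name and the six responses are hypothetical; for very small samples a t-distribution multiplier would usually be more appropriate.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, margin) so that the interval is mean +/- margin.

    z is the number of standard deviations for the chosen confidence
    level; 1.96 corresponds to 95% under the normal approximation.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two responses")
    margin = z * stdev(sample) / sqrt(n)
    return mean(sample), margin


# Hypothetical daily expenditures (dollars) from six respondents.
sample = [80.0, 95.0, 110.0, 120.0, 95.0, 100.0]
avg, margin = mean_confidence_interval(sample)
print(f"Average expenditure: ${avg:.2f}, plus or minus ${margin:.2f}")
```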
Appropriate sampling techniques are designed to increase the probability that a sample is an accurate reflection of the population (group as a whole) and that it produces statistics (mean expenditure, for example) that are within an acceptable range of the true (unknown) values. It is important to note, however, that there is always a chance that the sample selected will not accurately reflect the characteristics of the entire group from which it was selected. In other words, despite satisfying all underlying assumptions and employing proper sampling techniques, the sample of people selected may turn out to be significantly different from the population (i.e. you just happened to be unlucky and get a bad sample). The degree of confidence that the sample does accurately reflect the population is known as the confidence level.
When inferring the characteristics of a sample onto the entire population from which the sample was selected, there are two key issues, namely:
1. Sample size.
2. Obtaining a representative sample.
These issues will be discussed in what follows.
Sample Size
Collecting valid information through sampling requires careful planning, including determination of an appropriate sample size. Suppose a recreation practitioner desires an interval estimate of average daily expenditures for visitors to a slow-pitch tournament. An investigator reports that average daily expenditures are between $20 and $100 with 90 percent confidence (confidence levels are discussed below). The recreation practitioner replies "twenty to one hundred dollars - I could have guessed that!" The point is that the interval (range) must be small enough to be useful. The sample size must therefore reflect the informational needs of the study, expressed as the maximum (acceptable) error in estimation and the confidence level.
Confidence levels are expressed in percentage terms, the most common being 90%, 95% and 99%. The confidence level simply reflects the proportion of sample means that are included within Z standard deviations of the mean in a sampling distribution.
In other words, a 90% confidence level says that an interval of ± 1.64 standard deviations (of the sampling distribution) around the sample mean will contain the actual (but unknown) mean 18 times out of 20 (90%). Similarly, a 95% confidence level says that an interval of ± 1.96 standard deviations around the sample mean will contain the actual mean 19 times out of 20 (95%).
The key point here is that the wider the confidence interval, the greater the confidence level that the actual mean falls within this interval. However, the wider the confidence interval, the less precision is associated with our sample mean (the larger the maximum acceptable error in estimation).
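The link between a confidence level and its Z value can be checked with Python's standard library, as sketched below. The sketch assumes a two-sided interval under the standard normal distribution; the function name is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided standard normal multiplier for a confidence level in (0, 1)."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Half of the excluded probability lies in each tail, so the upper
    # cut-off sits at the (0.5 + level / 2) quantile.
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: Z = {z_for_confidence(level):.2f}")
```

Rounded to two decimal places, these are the multipliers 1.64, 1.96 and 2.58 used in this section.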
A formula for calculating the sample size (\(n\)) required to estimate \(\mu\), the population mean (average), with a given maximum error in estimation \(E\) (acceptable range) and confidence level (proportion of sample means that are included within a given range) in a population with a given standard deviation \(\sigma\) is shown below:

\[ n = \left( \frac{Z \sigma}{E} \right)^2 \]

where
\(Z\) = the number of standard deviations from the mean associated with the chosen confidence level:
99% confidence level Z = 2.58
95% confidence level Z = 1.96
90% confidence level Z = 1.64
\(\sigma\) = population standard deviation.
This figure will not be known so estimate it by using one of the following:
- Make use of previous, completed comparable study data that calculated the standard deviation.
- Conduct a pilot sample (10 to 20 responses) and use the resulting standard deviation.
- Make use of previously completed comparable study data by taking the range of values (difference between highest and lowest) and divide by four.
- Estimate the expected range of values and divide by four.
\(E\) = maximum error in estimation. This is the acceptable margin of error associated with the mean, that is, the largest acceptable difference between \(\bar{x}\) and \(\mu\) (\(\bar{x}\) is the sample mean, which is unknown until the sample is taken; \(\mu\) is the unknown population mean, that is, the actual mean including all those who were not sampled).
The maximum error in estimation is chosen by the researcher. For example, if I estimate that average daily expenditures are $75 per day, I may be willing to accept an error of ± $7.50 (between $67.50 and $82.50), which says that the acceptable error is plus or minus 10% of the mean value.
Example
You want to calculate the required sample size associated with daily visitor expenditures at an adult hockey tournament. You select a 95% confidence level. Previous studies indicated an average daily expenditure of $50 with a standard deviation of $75. You are willing to accept a maximum error in estimation of ± $10 (± 20% of the mean).

\[ n = \left( \frac{1.96 \times 75}{10} \right)^2 = (14.7)^2 \approx 216 \]

You would need a sample size of 216 responses. Note that if you decided that the maximum acceptable error in estimation was $5, you would need a sample size of over 800 (approximately 864).
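The hockey tournament example can be reproduced with a short Python function. This sketch assumes the sample size formula n = (Z × standard deviation / maximum error)², which matches the 216 figure in the example; the function and variable names are hypothetical.

```python
def required_sample_size(z, sigma, max_error):
    """Unrounded sample size n = (z * sigma / max_error) ** 2."""
    if max_error <= 0:
        raise ValueError("max_error must be positive")
    return (z * sigma / max_error) ** 2


# Figures from the example: 95% confidence (Z = 1.96), standard
# deviation of $75, maximum error of $10 and then $5.
n_10 = required_sample_size(z=1.96, sigma=75.0, max_error=10.0)
n_5 = required_sample_size(z=1.96, sigma=75.0, max_error=5.0)
print(f"Maximum error $10: n = {n_10:.2f}")
print(f"Maximum error $5:  n = {n_5:.2f}")
```

Because the maximum error is squared in the denominator, halving it multiplies the required sample size by four. The unrounded value for a $10 error is about 216.09, which the example reports as 216; rounding up to the next whole response would be slightly more conservative.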
The investigator has control over two of the three quantities that influence sample size. The variability of items in a population is inherent; therefore, the investigator cannot exercise control over the value of \(\sigma\). But the maximum error in estimation and the confidence level can to some degree be controlled by the researcher. Reducing the maximum error in estimation while holding the confidence level constant requires increasing the sample size. Likewise, increasing the confidence level while holding the maximum error in estimation constant requires increasing the sample size.
A serious statistical error is underestimating the sample size necessary to provide reliable estimates. Conversely, overestimating the sample size is also a problem, not in terms of the reliability of the estimate, but because it brings extra costs of producing, administering and analyzing questionnaires without increasing the reliability of the estimates to a significant degree. Sometimes the only resort is to obtain as many responses as possible within a reasonable budget and report the reliability of the estimate. The reliability is expressed in terms of the standard error of the estimate, i.e. ± so many dollars. This amounts to solving the sample size formula for the maximum error in estimation, since \(n\) is fixed by the number of surveys actually collected.
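Solving for the maximum error when the number of responses is already fixed can be sketched as follows. The sketch assumes the sample size formula n = (Z × standard deviation / maximum error)², rearranged to give maximum error = Z × standard deviation / √n. The function name and the figure of 150 completed surveys are hypothetical.

```python
from math import sqrt


def max_error_for_sample(z, sigma, n):
    """Maximum error in estimation achieved by n completed surveys."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * sigma / sqrt(n)


# Hypothetical case: budget allowed only 150 completed surveys, with a
# 95% confidence level and an assumed standard deviation of $75.
error = max_error_for_sample(z=1.96, sigma=75.0, n=150)
print(f"Reported reliability: plus or minus ${error:.2f}")
```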
A Representative Sample
As mentioned previously, statistical inference requires that the characteristics of the sample are representative of the population from which that sample was taken. For example, suppose we want to identify average expenditures by non-Wolfville residents at Acadia home football games. Since we don't need to sample all home games, the question arises as to which games are representative. Perhaps big spenders only come to watch the intense rivalries, or only when it's sunny and warm, or only to watch winning teams.
To get a sample that is representative of the population of all non-Wolfville residents attending the games (unbiased), one must take samples at games where there are teams of various quality in good weather and bad. Care must also be taken in assigning survey personnel to different areas of the stadium. If all the surveys were taken from people sitting in the most expensive seats, then certainly the results would be biased.
A representative sample takes careful consideration. It is inadvisable to have a sample which is too small but wasteful to have one that is too large. Just as important as the number of surveys completed are efforts aimed at ensuring that the information given in the sample is representative of the group as a whole. In surveying event visitors, relevant variables to consider in ensuring a representative sample include: weekday versus weekend; distance travelled; use of overnight accommodation; type of accommodation; weather; age; gender; with/without family, etc.
If the sample chosen is truly random (every visitor has an equal probability of being selected) and of sufficient size, then the proportion of visitors of different types (and related differences in expenditures) identified in the sample can be inferred onto the group of visitors as a whole. In other words, if our sample indicated that 20% of respondents require overnight accommodation, we can infer (with a certain degree of confidence) that 20% (plus or minus the confidence interval expressed in percent, e.g. ± 2%) of the population of visitors require accommodation.
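The plus or minus figure attached to a sample proportion can be approximated as sketched below. This is an illustration under stated assumptions: a simple random sample, the normal approximation to the binomial, and a multiplier of 1.96 for a 95% confidence level. The function name and the sample of 400 respondents are hypothetical.

```python
from math import sqrt


def proportion_margin(p, n, z=1.96):
    """Margin of error for a sample proportion p based on n responses."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * sqrt(p * (1 - p) / n)


# Hypothetical case: 20% of 400 respondents used overnight accommodation.
margin = proportion_margin(p=0.20, n=400)
print(f"Share using accommodation: 20% plus or minus {margin:.1%}")
```

With these hypothetical figures the margin is a little under four percentage points; a tighter margin such as ± 2% would call for a considerably larger sample.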
In surveying event visitors, it can often be useful to use past experience or registration data to identify visitor types prior to conducting the survey and then to adjust our survey efforts to ensure that a sufficient number of each type is sampled. This is known as stratified quota sampling or segmented analysis.
For a segmented analysis, visitor surveys are stratified by segment (type). The most efficient sampling design for estimating spending will apportion sample sizes across segments according to the size of the segment and the expected variation in spending within each segment. Those segments showing larger variation in expenditures will require larger sample sizes to ensure reliable estimates. There is, however, a tradeoff here if the same survey is used to estimate the proportion of visitors falling within each segment. Simple random samples are often needed to estimate segment shares, while disproportionate sampling across segments is often called for to estimate spending efficiently. A common situation is large numbers of visitors with low spending (day visitors) and small numbers of visitors with relatively large spending (those staying overnight). A simple random sample yields good estimates for day visitors, but doesn't contain enough overnight visitors to adequately estimate their spending. If one targets overnight visitors, for example by sampling in motels and campgrounds, there may be no way to estimate the proportion of visitors who stay overnight.
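One common way to apportion a sample by both segment size and spending variation is to make each segment's share of the sample proportional to its size multiplied by its expected standard deviation (often called Neyman allocation). The sketch below assumes that rule; it is one possible reading of the design described above, not necessarily the method used in any particular study. The function name, segment sizes and standard deviations are hypothetical.

```python
def allocate_sample(total_sample, segments):
    """Split total_sample across segments in proportion to size * std dev.

    segments maps a segment name to (segment_size, expected_std_dev).
    Results are rounded, so they may not sum exactly to total_sample.
    """
    weights = {name: size * sd for name, (size, sd) in segments.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("segments must have positive size and variation")
    return {
        name: round(total_sample * weight / total_weight)
        for name, weight in weights.items()
    }


# Hypothetical event: many day visitors with similar, low spending and
# fewer overnight visitors whose spending varies widely.
segments = {
    "day visitors": (8000, 20.0),
    "overnight visitors": (2000, 120.0),
}
allocation = allocate_sample(400, segments)
for name, count in allocation.items():
    print(f"{name}: {count} surveys")
```

In this hypothetical case overnight visitors make up 20% of attendance but receive 60% of the surveys, which is the kind of disproportionate sampling described above.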
The solution is usually to employ distinct sampling designs to estimate average spending vs. segment shares. If segment shares can be estimated from secondary sources, then the spending survey can be designed just to estimate segment spending profiles. If not, one can use a simple random sample to estimate segment shares and after screening for segment, gather spending information only from a sub-sample of each segment according to a quota system.
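Once segment shares and segment spending profiles have been estimated separately, they can be combined into a total as sketched below. The function name, the visitor count, the shares and the average spending figures are all hypothetical and serve only to show the arithmetic.

```python
def total_spending_by_segment(total_visitors, shares, mean_spending):
    """Combine segment shares and segment average spending into a total.

    shares and mean_spending are dictionaries keyed by segment name;
    shares must sum to 1.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(
        total_visitors * shares[name] * mean_spending[name]
        for name in shares
    )


# Hypothetical inputs: shares from a simple random sample (or secondary
# sources), average spending from a quota sample within each segment.
shares = {"day": 0.8, "overnight": 0.2}
means = {"day": 30.0, "overnight": 150.0}
total = total_spending_by_segment(10000, shares, means)
print(f"Estimated total visitor spending: ${total:,.0f}")
```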
It can be problematic to obtain adequate sample sizes for each segment to provide reliable spending estimates across all segments. This problem depends upon the degree of segmentation (number of visitor types) and the total number of surveys completed. Reducing the number of segments (visitor types) can effectively increase the number of survey responses for each segment and increase the reliability of estimates for each segment.