The Problem Of Sampling In Various Sciences

Alvin C. Roseyard

Oct 1, 1999

Conducting a survey is basically an act of investigating the behavior, opinions, characteristics, and other elements of a group of entities, usually by questioning, analyzing, or observing them. This definition is broader than what we usually mean by "survey," for it allows the surveying of a non-human entity. This could be a particular product, an animal, a plant, or even a remnant from an ancient civilization. Furthermore, in this context "survey" means any type of research that includes sampling. A "sample" is defined as a preselected group of items taken from a larger set of items (a "population"), and the results of a survey depend upon the research done on the sample. For instance, if a researcher wants to find the average IQ level of American teenagers, the population in question is all American teenagers, and the sample is a preselected number of American teenagers whose IQ levels will actually be measured.

As mentioned above, a survey seeks to make inferences about a population based on information obtained from a sample. Sampling is motivated by cost and time constraints: in our example of American teenagers, it would be very costly to measure all of their IQs. Instead, a representative subset (a "random sample") of teenagers is taken, and their IQs are measured. In some situations the entire population might be surveyed, as in a census or when dealing with small populations. Even if the population is small, however, the test needed to obtain the information from an element might require that element's destruction. For example, to determine one's blood cholesterol level, a sample of a few milliliters of blood, not all of it, is taken. If we want to determine the average lifespan of light bulbs manufactured on an assembly line, we should take a reasonably sized sample, such as a few hundred, and test them to see how long they last. Testing every bulb until it fails would leave nothing to sell, and the company would go bankrupt.

The basic problem of a survey is the validity and reliability of its results. The solution lies in the three segments of conducting a survey: planning, data collection, and analysis and reporting. Usually the public sees only the reporting segment. Since the public does not know how the survey was conducted, it is prone to be misled by the reported results. To determine a survey's reliability, one must know the sources of bias affecting its outcome. These biases can be traced to the interviewer or the researcher, the format of the questionnaire or the experiment, the availability of information, and other causes.

From a researcher's point of view, these sources of bias must be kept as small as possible. The most serious bias problems arise from the questionnaire type and the sampling methods used. For example, if the questionnaire refers to a socially desirable situation, respondents tend to give the socially desirable answer. Say people are asked whether they read the front page or the sports page of the newspaper first. Many will answer that they read the front page first, even if they really read the sports page first, because reading the front page first makes one look more sophisticated. Hence it is a socially desirable attribute.

A more interesting example is provided by a survey done by the American Society for Microbiology. Its researchers wanted to determine the percentage of people who wash their hands after using public restrooms. When they surveyed a randomly selected sample in the Washington, DC, area over the phone, 94 percent said that they washed their hands afterwards. However, researchers who observed 6,333 people using public restrooms in five major American cities found that only 68 percent did so. Here, the socially desirable situation is, of course, washing one's hands after using public restrooms.

A researcher also has to be very careful when interpreting the results of a question seeking potentially incriminating or embarrassing (i.e., socially undesirable) information. Suppose we distribute to the people of a particular town a questionnaire asking whether they have used marihuana during the past 12 months. The responses will seriously underestimate the true percentage of marihuana users, because drug usage is a serious crime in this country. Fortunately there is an interesting solution to this problem: the Randomized Response Technique (RRT).

This is how it works. Suppose that the interviewer presents a 6-sided die to the respondents and gives them a paper that contains the following instructions: Roll the die first, but do not show the outcome to the interviewer. Then:

i) If it shows 1 or 2, answer YES regardless;

ii) If it shows 3 or 4, answer NO regardless;

iii) If it shows 5 or 6, answer truthfully.

Give your answer in the following boxes:

[]Yes []No

This clever method lets one estimate the true percentage more accurately, because the interviewer cannot tell whether any individual answer was forced by the die or given truthfully, so respondents have no reason to lie. Here's how it can be done: Say we surveyed 1,500 people and received 700 "yes" answers. The probability of the die showing 1 or 2 is 2/6 (or 1/3), and the probabilities of it showing 3 or 4, or 5 or 6, are the same. Hence we expect the die to show 1 or 2 for about 500 respondents, 3 or 4 for about 500, and 5 or 6 for about 500. According to the directions, we expect 500 respondents to answer "yes" and another 500 to answer "no" regardless of the truth. If the total number of "yes" responses is 700, then the expected number of "truthful yes" responses is 700-500=200. In other words, about 200 of the 500 respondents who were asked for a truthful answer said "yes." This gives 40 percent (=200*100/500) as the estimated percentage of actual marihuana users.
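The arithmetic above can be written as a small function. This is only a sketch of the die-based design described here; the function name and parameter names are my own, and it assumes that the observed share of "yes" answers equals (chance of a forced yes) + (chance of a truthful answer) x (true proportion).

```python
from fractions import Fraction


def rrt_estimate(n_respondents, n_yes,
                 p_forced_yes=Fraction(1, 3),
                 p_truthful=Fraction(1, 3)):
    """Estimate the true "yes" proportion under the forced-response die design.

    p_forced_yes: chance the die forces a "yes" (faces 1 or 2).
    p_truthful:   chance the die asks for a truthful answer (faces 5 or 6).
    """
    observed_yes = Fraction(n_yes, n_respondents)
    # Solve observed_yes = p_forced_yes + p_truthful * true_proportion.
    return (observed_yes - p_forced_yes) / p_truthful


estimate = rrt_estimate(1500, 700)
print(float(estimate))  # 0.4, i.e. the 40 percent derived in the text
```

With small samples the result can fall below 0 or above 1, because the die will not split the respondents into exact thirds; the figure is an estimate, not an exact count.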

Another source of serious bias stems from the sampling plans and methods employed. In practice, various time, space, and cost constraints prevent us from dealing with actually random samples (samples that are representative of the population). Given this, let's analyze the most common sampling methods.

Haphazard Sampling: Many biological studies use this method to select specimens to be examined from a cage or a tank. This technique involves catching the animals by hand or by a net "at random" in that particular cage or tank. However, the animals caught in this manner are usually the ones that are friendlier, weaker, or less agile. This problem can be solved, but only with more effort and money. Therefore, when the results of a biological study are presented, one should check how the sample was drawn before reaching a conclusion, because results based on haphazard sampling will be quite biased.
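A small simulation can illustrate the effect. Everything in it is hypothetical: the "agility" score, the population size, and especially the rule that an animal's chance of being caught falls off as (1 - agility) squared. That rule is an assumption chosen only to mimic the idea that slower animals are easier to catch, not a measured fact.

```python
import random
from statistics import mean


def simulate_catch(n_animals=1000, sample_size=200, seed=0):
    """Compare a simple random sample with a hand-caught (haphazard) one."""
    rng = random.Random(seed)
    # Hypothetical population: one agility score in [0, 1) per animal.
    agility = [rng.random() for _ in range(n_animals)]

    # Simple random sample: every animal is equally likely to be chosen.
    random_sample = rng.sample(agility, sample_size)

    # Haphazard catch (assumed model): less agile animals get larger weights.
    # choices() draws with replacement, which is good enough for a sketch.
    weights = [(1.0 - a) ** 2 for a in agility]
    caught = rng.choices(agility, weights=weights, k=sample_size)

    return mean(agility), mean(random_sample), mean(caught)


pop_mean, srs_mean, caught_mean = simulate_catch()
print(f"population {pop_mean:.2f}, random {srs_mean:.2f}, caught {caught_mean:.2f}")
```

Under these assumptions the random sample's average agility stays close to the population's, while the hand-caught sample's average is noticeably lower, so any trait linked to agility would be misestimated.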

Judgment Sampling: In this method, "a couple of experts" determine the "typical units" that represent the population. This method is also extremely poor, because "experts" tend to disagree on which items are typical. Yates (1981) presents a good example. He had 12 experts collect a total of 1,200 stones, and then asked each of them to select three distinct samples of 20 stones as typical of the population according to their weight, giving 36 samples in all. Surprisingly, 30 of the 36 samples overestimated the true average weight.
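To get a feel for how lopsided 30 out of 36 is, one can ask how often an unbiased procedure would do at least as badly. The sketch below is my own rough check, not part of Yates's analysis. It assumes that each sample independently has a 50 percent chance of overestimating the average, which is a simplification, since each expert chose three samples and those are unlikely to be independent.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) when X counts successes in n independent trials with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that 30 or more of 36 unbiased samples would overestimate.
p_value = prob_at_least(30, 36)
print(f"{p_value:.2e}")
```

Under these assumptions the probability is far below one in a thousand, which supports the view that the experts' judgment systematically favored heavier stones rather than erring by chance.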

Volunteer Sampling: If respondents are chosen from volunteers (generally human beings), then the results have considerable bias. This method is widespread in medical studies, because usually it is the only way to get relevant results. Since the medical profession's ethical code does not allow one to obtain random samples in medical experiments, various animals, such as guinea pigs, are used in labs. However, because drugs that are effective on animals are usually not all that effective on human beings, the relevance of such results is not so clear. For example, we often hear that a particular "study shows that such and such an ingredient is harmful to your health, or cures such and such a disease." Such statements rest on weak evidence and hence are not very reliable. However, they are often the only results that can be obtained. This bias is somewhat removed by continual progress in medicine.

Restrictive Sampling: This method is particularly important, because it yields very strange results when applied to social science research. Here, one takes a sample that is easy to obtain for a couple of different reasons. For instance, in archeology and history, the possessions of a king or an aristocrat are more likely to survive than those of a serf or a vassal, for the belongings of the rich and powerful are more durable and of a much higher quality and therefore survive for a longer time. In the United States, for example, a great deal of furniture and many houses of slave owners have survived; very few of their slaves' possessions have. In Egypt, the artifacts discovered by archeologists belong mainly to the upper class (pharaohs, the nobility, etc.). As a result, historians and archeologists produce very biased results, for their conclusions are based mainly on evidence belonging to members of the rich and/or aristocratic classes.

Is there a solution? Obviously, researchers only accept tangible items as evidence. Even though these artifacts are "hard" evidence, they cause the poor, and those who led modest and humble lives, to be under-represented in history. For example, many Prophets left virtually no personal items behind, except for ones like Muhammad, Jesus, and Moses (peace be upon them all), who lived comparatively recently.

Such an absence of personal items might be due to the fact that they led modest lives and shunned luxury. Hence, historians and archeologists should reconsider their position on this issue. In order to correct the bias engendered by restricting their conclusions only to tangible evidence, they should add the Holy Books and written religious texts to their category of "acceptable" evidence.

Another example is the media, whose usage of restrictive sampling produces very bigoted and biased results. The image of Islam presented in the Western media is a good example of this, for its depiction of Islam contains many misconceptions. First, Islam is presented as an exclusively Arab religion, despite the fact that Arabs account for only 15 to 20 percent of all Muslims. So either obtaining a representative sample of the world's 1.2 billion Muslims seems very hard to journalists, or else they deliberately restrict their samples to Arabs.

The most serious misconception, however, is the media's equating of Islam with terrorism, although the word "Islam" comes from the same Arabic root as the word for "peace." Nevertheless, the Western media somehow manages to mention Muslims and terrorism together many times. This is also quite odd, for only 1 percent or less of the world's 1.2 billion Muslims favor so-called "militant" groups. When news about such people is broadcast, this small percentage is strangely magnified, and the stereotype is generalized to include all Muslims.

It is also incorrect to label these militant groups as "Islamic." This is definitely a sampling problem, and the sampling seems to be done either on purpose or in a cursory way. Are there no good Muslims among the world's 1.2 billion Muslims? Rationally, if we suppose the impossible (that all Muslims really are terrorists), they would have destroyed the entire world already. Similarly, but to a lesser extent, there are stereotypes for Jews and Christians (mostly Catholics).

There is an urgent need for collaboration and cooperation between different religious groups to get rid of such damaging misconceptions and stereotypes. Muslims, having the worst stereotype, should be in the forefront of this undertaking. According to my experience and knowledge, many religious beliefs and teachings, and their followers, bear no resemblance to these misconceptions.

Such restricted sampling results in many seriously flawed conclusions in social science disciplines. Consider the case of Sigmund Freud, who still has many advocates and fans, as well as opponents, in psychology. From a sampling perspective, his analysis of the ego (psychoanalysis) cannot be considered reliable, for he based his hypothesis on just two persons: himself (in 1897) and an 18-year-old female hysteric (Dora). Even from the statistical point of view, two people (the first one probably obsessed with sexuality, and the second an obviously abnormal person) can in no way be considered representative of the 2 billion people living at that particular point in time. Here, of course, I acknowledge the severe time and cost constraints, but the existence of such difficulties does not establish the validity of Freud's conclusions.

CONCLUSION

In this article, we approached different issues in the social and physical sciences from a sampling (statistical) point of view. Surprisingly, this approach gave very interesting results in various sciences. The validity of any survey or research does not depend on its publicity (whether it is published, broadcast, or widely accepted); what is essential is that the whole picture be covered. Newly emerging interdisciplinary areas can help keep track of the whole picture. Also, the validity and reliability of results obtained through research, surveys, hypotheses, and theories depend upon the researcher's morals and honesty and ability to see "the big picture." Scientists and journalists should be very careful and responsible in their research, for most people are inclined to accept, without further exploration, whatever they hear or read.