TRPs, Surveys, And Polls: The Sample Size Enigma
When studying any survey, ask not what the sample size is but how stratified it is, writes Praveen Chakravarty.
The U.S. Presidential election has brought surveys and polling back to the fore. The emerging consensus seems to be that polling is dead because polls have been spectacularly wrong. In India too, BARC’s measurement of television ratings through sample surveys is at the centre of the TRP controversy. At the core of all these issues lies the technique of sampling. This column is not about the U.S. polls or the television ratings controversy but about the commonly held misconceptions and myths about sampling.
Nearly every Indian household has a television set, and these households can choose from a bouquet of 912 television channels. The channels depend entirely on advertising revenue, which is determined by how many viewers watch each channel. It is neither possible nor desirable to measure what everyone is watching every day. The only practical way to do it is to select a sample from the 25 crore households and use it to determine the channel preferences of all Indians.
BARC has a sample of 44,000 households with special devices installed in their television sets to measure which channels are being watched. Using this data, BARC issues television ratings for each channel as a metric for advertisers. Rs 40,000 crore of television advertising is at stake in this whole process.
It is now alleged that a few television broadcasters manipulated these ratings, which is deemed a criminal offence. Experts and commentators have since weighed in on what is wrong with BARC’s current methodology. In particular, they have questioned whether a sample of 44,000 households can accurately reflect the viewing behaviour of 130 crore Indians.
The claim that a sample of 44,000 households is too small is mathematically incorrect. Sample size is a commonly misunderstood idea. Yet sampling is a vital technique that underpins some of the most important facets of our democracy, from verifying the sanctity of elections to shaping the media landscape.
Consider how India’s democracy and economy are dependent on sampling methods.
- There were more than 10 lakh polling stations for the 2019 general election. A sample of 20,000 of them was selected to tally the results from the electronic voting machines against the actual voter slips. The sanctity of the world’s largest democracy rests on verifying this sample of 20,000 polling stations.
- One of the most important variables driving India’s $3-trillion economy is the interest rate set by the Reserve Bank of India. A critical input for the RBI’s Monetary Policy Committee is the inflation expectations of Indian households. Out of India’s 25 crore households, the RBI surveys a sample of 6,000 every two months and uses the results as a vital input when setting interest rates for the entire economy.
In this context, the 44,000-household sample that shapes India’s television advertising industry does not seem so small.
Representative Sample > Size
A counter-intuitive but mathematical truth is that the sample size needed for a given level of accuracy does not depend, in any meaningful way, on how large the underlying population is, as long as the population is much larger than the sample. This is a difficult idea for most people to come to terms with. How can a sample of just a few thousand, out of many millions, tell us anything about the overall population? That is the question that baffles most.
One intuitive illustration of this idea is cooking. Suppose you are cooking rice in an open pot for a family of five. To check whether the rice is cooked, you randomly pick a few morsels and test them. If they are cooked, you presume the entire pot is cooked. A restaurant chef cooking rice for thousands of daily customers does exactly what the household cook does; the chef does not test a much larger portion. This is because a pot of rice is homogeneous, and a few morsels behave almost exactly like the millions of other morsels in the pot. So, regardless of how much rice one is cooking, a small sample is enough to check whether it is cooked. Similarly, the size of the sample needed does not depend on the size of the underlying population.
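The claim that the required sample size barely depends on the population can be checked with the textbook formulas for estimating a proportion. The sketch below is illustrative only and rests on assumptions not in the original: a simple random sample, a 95% confidence level using the normal approximation, and a worst-case proportion of 50%. BARC’s actual panel design is more elaborate, and ratings for a small channel in a single region rest on far fewer than 44,000 households, so these numbers are not BARC’s error margins. The function names and population sizes tried are hypothetical choices for illustration.

```python
import math

# Assumption: 95% confidence level, normal approximation.
Z_95 = 1.96

def margin_of_error(n, population, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion p estimated
    from a simple random sample of n drawn without replacement from a
    population of the given size (includes the finite population correction)."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

def required_sample_size(margin, population, p=0.5, z=Z_95):
    """Smallest simple random sample that estimates p within +/- margin."""
    n0 = z**2 * p * (1 - p) / margin**2                  # size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite population correction

# Population sizes (households) must exceed the 44,000 sample for the formula to apply.
for households in (100_000, 10_000_000, 250_000_000):
    need = required_sample_size(0.01, households)
    moe = 100 * margin_of_error(44_000, households)
    print(f"{households:>11,} households: need {need:,} for +/-1 pt; "
          f"44,000 gives +/-{moe:.2f} pts")
```

Running it shows the sample needed for ±1 percentage point rising only from roughly 8,800 households for a population of one lakh to roughly 9,600 for 25 crore, while 44,000 households give about ±0.47 points at both 1 crore and 25 crore. Population size enters only through the finite population correction, which is close to 1 whenever the sample is a tiny fraction of the population.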
Human society is not as homogeneous as a pot of rice. It is a melting pot, as the Americans call it. The real world is more like a pot of Kolkata mutton biryani, with meat, rice, potatoes, onions, cardamom, cloves, nutmeg, cinnamon, pepper, fennel seeds, ginger, cumin seeds, garlic, and yogurt. When Kolkata mutton biryani is cooked in an open pot, tasting a few morsels of rice is not enough to know whether the biryani is cooked. One needs to sample the rice, the vegetables, the spices, and the meat separately. Tasting large quantities of just the rice, or just the meat or the spices, will not tell you whether the biryani is cooked.
The quantity sampled matters less; what matters more is that the sample represents every ingredient in the pot.
In statistical terms, this is referred to as stratified sampling. Ensuring that the selected sample is well stratified is far more important than its size. But this is more complex in the real world than in a pot of mutton biryani. People differ by age, gender, religion, caste, income and more. Conservatives may argue that humans can be categorised in aggregates based on their religion, while libertarians may argue that every individual is different. It is not possible to sample every individual, so the next best alternative is to ensure the sample covers as many strata as possible.
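One common way to build a stratified sample, sketched here under assumptions of our own, is proportional allocation: split the total sample across strata in proportion to each stratum’s share of the population, then combine the per-stratum results weighted by those shares. The three strata, their household counts and the per-stratum viewing shares below are hypothetical round numbers chosen for illustration, not BARC data, and the article does not say how BARC allocates its panel.

```python
import math

def proportional_allocation(strata_sizes, total_sample):
    """Split total_sample across strata in proportion to their sizes.

    Uses the largest-remainder method so the allocations sum exactly
    to total_sample.
    """
    population = sum(strata_sizes.values())
    quotas = {h: total_sample * size / population for h, size in strata_sizes.items()}
    alloc = {h: math.floor(q) for h, q in quotas.items()}
    leftover = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_mean(strata_sizes, stratum_means):
    """Estimate the population mean as the size-weighted average of
    the mean observed in each stratum's sample."""
    population = sum(strata_sizes.values())
    return sum(size / population * stratum_means[h] for h, size in strata_sizes.items())

# Hypothetical strata (households) adding up to 25 crore.
households = {"metro": 30_000_000, "other urban": 60_000_000, "rural": 160_000_000}
print(proportional_allocation(households, 44_000))

# Hypothetical share of sampled households in each stratum watching one channel.
observed = {"metro": 0.20, "other urban": 0.12, "rural": 0.05}
print(f"estimated national share: {stratified_mean(households, observed):.4f}")
```

With exactly proportional allocation the weighted estimate equals the plain average of the whole sample; the weights matter when some strata are deliberately over- or under-sampled, for example to get enough households from a small region.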
Groupings By Location
Modern societies have evolved such that similar people tend to cluster together in the same places, so differences between people can be proxied by their geographical location. People living in a certain pin code can broadly be classified as belonging to a certain income class, religion, caste or occupation. The American economist Raj Chetty has published splendid research on the importance of location, which he calls ‘zipcode destiny’ in the American context. In other words, sampling people across many different locations makes it likely that the sample is representative in terms of class, caste, occupation and so on.
India can be geographically broken down into 30 states and union territories, 543 parliamentary constituencies, 650 districts, 4,061 assembly constituencies, 19,000 pin codes, 6.5 lakh villages or 11 lakh polling stations. These can be thought of as the ingredients of India’s biryani. Samples can then be designed across states, parliamentary constituencies, districts, pin codes or even polling stations, depending on the nature of the survey.
In the case of BARC, the 44,000-household sample may suffice if it represents every assembly constituency.
If the 44,000 households were chosen from nearly all pin codes, then it is a representative sample. But if they were all picked from a few hundred cities and towns, then it is not representative, however large the sample.
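The point that a badly spread sample stays wrong however large it gets can be seen in a small simulation. Everything in it is a hypothetical assumption for illustration: a country where 30% of households are urban and 20% of them watch some "channel X", while 70% are rural and only 5% of them watch it. A sample drawn only from cities is compared with a proportionally stratified sample. Each household is simulated as an independent draw at its stratum’s viewing rate, which approximates sampling from a very large population.

```python
import random

# Hypothetical population: stratum -> (share of all households, share watching channel X)
STRATA = {"urban": (0.30, 0.20), "rural": (0.70, 0.05)}
TRUE_SHARE = sum(weight * rate for weight, rate in STRATA.values())  # about 0.095

def survey(stratum, n, rng):
    """Simulate asking n random households in a stratum; return the fraction watching."""
    rate = STRATA[stratum][1]
    return sum(rng.random() < rate for _ in range(n)) / n

def estimate_city_only(n, rng):
    """All n households come from cities, however large n is."""
    return survey("urban", n, rng)

def estimate_stratified(n, rng):
    """Proportional allocation across strata, weighted by population share."""
    return sum(weight * survey(stratum, round(n * weight), rng)
               for stratum, (weight, _) in STRATA.items())

rng = random.Random(2020)  # fixed seed so the run is reproducible
for n in (1_000, 10_000, 100_000):
    city_err = abs(estimate_city_only(n, rng) - TRUE_SHARE)
    strat_err = abs(estimate_stratified(n, rng) - TRUE_SHARE)
    print(f"n={n:>7,}  city-only error={city_err:.3f}  stratified error={strat_err:.3f}")
```

The city-only error stays at roughly 0.1 no matter how large n becomes, because more households from the same cities only sharpen the estimate of the urban share, while the stratified error shrinks as n grows. Sampling error falls with sample size; selection bias does not.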
Social surveys that elicit responses from people should strive to build samples with as wide a stratification as possible, such as covering all districts or assembly constituencies. It is election season now and, predictably, there are surveys galore. Remember, ask not what the sample size is but how stratified it is!
Praveen Chakravarty is a political economist, and Chairperson - Data Analytics in the Indian National Congress.
The views expressed here are those of the author and do not necessarily represent the views of BloombergQuint or its editorial team.