Data testing: requirements and levels



My name is Alexey Chumagin, and I am a tester at Provectus. In this article I will explain how data quality requirements are formed and what levels of data testing there are.


Upd:
The article deals with data, large or not, whose analysis and aggregation serve as the basis for building various processes and deriving patterns for further analysis or decision-making. The data can be collected from scratch for a specific project, or databases previously collected for other projects or for commercial purposes can be reused. The sources of this data are diverse and include not only operator input but also automated and/or automatic measurements, stored in the database either systematically or unsystematically (in a heap, “we will figure out what to do with it later”).

end-of-upd.


Why data testing is important


Data plays an increasingly important role in decision making in everyday life and in business. Modern technologies and algorithms allow you to process and store huge amounts of data, transforming them into useful information.

What kind of data is this? For example, your browser history, the transactions on your card, or the points along a device's route. This data is impersonal, but it still belongs to a specific device. If you collect and process it, you can learn quite interesting things about the owner of the device: for example, where they like to go, or their gender and age. So, gradually, we “humanize” the device and assign it some characteristics.

This information can then be used for targeted advertising. If you are a woman, then with a high degree of probability you are not interested in ads for men's shavers; you should be shown ads related to your interests. Targeting quality improves with what is known about the devices on which the ads are shown. You are shown ads you want to see, so you are more likely to click on them. Those who show you the ad are paid for it, and the advertiser profits because you learn about their product.

All this relies on data owned by different companies and people. To use this data effectively, it must be reliable: for example, we need to be sure that these transactions really belong to this account.

When there is a lot of data, storing it demands considerable resources. Data cleansing is a separate task that needs to be addressed. We want to store only the data that we really need, and we do not want duplicates or records that fail our criteria in the database, for example, entries with empty fields. This is where data quality requirements come from, and with them the question of how to test data against those requirements.
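The two cleansing criteria mentioned above, no empty fields and no duplicates, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: records are flat dictionaries with hashable values, and the names clean_records, is_empty, and required_fields are hypothetical, not part of any real project.

```python
def is_empty(value):
    """Treat None and blank strings as empty; 0 and False are real values."""
    return value is None or str(value).strip() == ""


def clean_records(records, required_fields):
    """Return records without empty required fields and without exact duplicates.

    Assumes each record is a flat dict with hashable values (hypothetical layout).
    """
    seen = set()
    cleaned = []
    for record in records:
        # Criterion 1: every required field must be present and non-empty.
        if any(is_empty(record.get(field)) for field in required_fields):
            continue
        # Criterion 2: the record as a whole must not have been seen before.
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned
```

The first occurrence of a duplicated record is kept and later copies are dropped. Whether that is the right policy depends on the requirements agreed with the customer.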

What is quality


I like this definition: product quality is a measure of user satisfaction. Clearly, everything depends on the context in which the product is used. If you use a well-known product, for example Facebook or Skype, you have one set of quality requirements: you will put up with some errors and still continue to use the product. But if you commissioned a program and paid money for it, your quality requirements will be higher: you will find fault and pick at trifles. Different people have different ideas about quality, and different programs have their own quality requirements.

Therefore, before development and testing begin, people usually agree on what they will consider a quality product. All this can be described formally. For example, we will consider our product to be of good quality if it contains no critical errors, or if it runs for two weeks without a failure.

Determining these requirements is not an easy task. Typically, the business sets the software requirements, and if we ask the business what the data should be like, the answer may be simply that the data should be good and clean. The tester's task is to find out or clarify what the data is and by what criteria we judge its quality and cleanliness. These criteria need to be formalized, recorded, and made measurable.
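One possible way to make a criterion measurable is to express it as a check on a single record and then report the share of records that pass it, compared against an agreed threshold. The sketch below assumes exactly that; the function names pass_rate and meets_requirement and the threshold value are hypothetical and would come from the conversation with the customer.

```python
def pass_rate(rows, check):
    """Share of rows (0.0 to 1.0) for which check(row) is true."""
    if not rows:
        # Design choice for this sketch: an empty data set violates nothing.
        return 1.0
    return sum(1 for row in rows if check(row)) / len(rows)


def meets_requirement(rows, check, threshold):
    """True if at least `threshold` of the rows satisfy the criterion."""
    return pass_rate(rows, check) >= threshold


# Example criterion: the "email" field must not be empty.
rows = [{"email": "a@x"}, {"email": ""}, {"email": "b@y"}, {"email": "c@z"}]
has_email = lambda row: bool(row.get("email"))
print(pass_rate(rows, has_email))                # share of rows with an email
print(meets_requirement(rows, has_email, 0.9))   # is "at least 90%" satisfied?
```

A vague wish such as "the data should be clean" then turns into a statement that can be checked, for example "at least 90% of the records have a non-empty email".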

How data quality requirements are formed


The tester starts by working out what is unclear to them and what they would like to know about the object under test. The tester makes a list of questions and “interviews” the customer, who, in theory, should know what the data should look like. For example, I ask whether empty cells or duplicate rows are valid.

An example of a requirement: if we have a list of people, then a first name, last name, or middle name may each be repeated, but the entire row may not. Repetition may be allowed within a single cell, but not for a whole row or for a combination of several cells. There should be no full matches.
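The requirement above, where a single cell may repeat but a combination of cells may not, can be checked roughly like this. It is a minimal sketch: the function name find_duplicate_rows, the column names, and the sample people are all made up for illustration.

```python
from collections import Counter


def find_duplicate_rows(rows, key_columns):
    """Return {combination of values: count} for combinations seen more than once."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return {key: count for key, count in counts.items() if count > 1}


people = [
    {"first": "Ivan", "middle": "Petrovich", "last": "Sidorov"},
    {"first": "Ivan", "middle": "Ivanovich", "last": "Petrov"},
    {"first": "Ivan", "middle": "Petrovich", "last": "Sidorov"},
]

# The first name "Ivan" repeats three times, which is allowed.
# Only the full combination of all three cells is reported as a violation.
print(find_duplicate_rows(people, ["first", "middle", "last"]))
```

Passing a shorter list of columns checks uniqueness for a combination of several cells instead of the whole row.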

Next, we ask about the format of the data in a particular cell. For example, a telephone number should contain 12 digits and a bank card number 16. We may also have a criterion that not every sequence of 16 digits is a valid bank card number. Or we understand that a surname can contain only letters. There can be many questions about the data format. This is how we find out everything we need to know about the subject of testing.
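The format rules from this example can be written down as small checks. The sketch below is illustrative only. I am assuming that the extra criterion for card numbers is the Luhn checksum, which is commonly used for this purpose; the real criterion has to be confirmed with the customer. All function names are hypothetical.

```python
import re


def is_phone(value):
    """A telephone number must consist of exactly 12 digits."""
    return re.fullmatch(r"[0-9]{12}", value) is not None


def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_card_number(value):
    """Exactly 16 digits that also pass the (assumed) Luhn check."""
    return re.fullmatch(r"[0-9]{16}", value) is not None and luhn_valid(value)


def is_surname(value):
    """A surname may contain only letters (so hyphens and spaces are rejected)."""
    return value.isalpha()
```

Note that the "only letters" rule, taken literally, rejects double-barrelled surnames with a hyphen. That is exactly the kind of question worth bringing back to the customer.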

What is quality data?


Quality data must have several characteristics.


Data test levels


We can group the data by so-called layers; the testing pyramid is a good analogy here: it describes how the number of tests is distributed across the different levels of an application.


In conclusion, I will say that data testing is an area that provides many opportunities for creativity and development. There is no silver bullet here: different approaches can be used to test data. The truth, as always, is somewhere in the middle.

Source: https://habr.com/ru/post/416183/

