An algorithm for detecting bots and offer accounts on VKontakte

Below the cut there are no neural networks or complex formulas, only the signs I used to catch bots in a quick homemade way, a comparison of filtering by these signs against filtering by one advertised service, and a link to a JS page where anyone can test the algorithm on their own list or check the latest members to join their community.

Background

Recently I needed to collect statistics on the weekly number of new subscribers in communities devoted to commercial wedding topics. For this task I wrote a script that gathered the new members of the relevant communities and produced numbers that, at first glance, looked rosy. They stopped looking rosy once I checked random accounts from the compiled list: some had been banned by the social network on the very day of collection, others turned out to be outright bots or offer accounts (I will use both terms below).

To get real numbers, I needed to find out the approximate share of bots among the collected subscribers. I tried cleaning the audience of bots with TargetHunter (the first service I came across that filters bots for free), but the quality of the cleaning was so-so: quite a few fake accounts remained. I decided not to use services of the “pay first, then we will show you what we can do” kind: it is a waste of money, and you still end up with a black box and a dubious result. So I decided to study the bots' pages and write my own filter.

Who we are filtering out

To begin with, I should clarify that my goal was to filter out the accounts I regard as junk from the point of view of inviting them to commercial wedding communities. This definition covers both bots that join automatically and offer accounts that someone creates in huge numbers and then sells as supposedly "live subscribers". Obviously, offer accounts that a schoolkid operates by hand will buy nothing, just like bots driven by a script. What they do well is ruin ad statistics when you pay per 1000 impressions. The filter may also catch some perfectly real people, but what is the use of them in a community if they never see its posts (and there is equally no point in showing them the community's ads)?

How we filter

The simplest idea, it seemed to me, was to rate each account on a scale from 0 to 100, where obvious bots score 100 points and ordinary people stay around 0 (ideally; in practice some real people can score 50). The technique is not perfect (like everything in the war of shield against sword), but experience has shown that bot makers do not put much effort into their fakes (a perfect bot would cost more than a client attracted by advertising), so for now it works. To fill the scale I selected several signs, each of which adds or subtracts a certain number of points; accounts that reach a certain score (70-100 in my case) are considered substandard and filtered out. I will not list how many points each sign is worth: you can see them in the example at the end of the article, where you can also change them, along with the threshold above which an account is assigned to the bot camp. Now let's go through the signs that are checked:
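
The scale can be sketched in a few lines of Python. The sign names, the weights and the threshold of 70 are illustrative assumptions, not the values from the author's page (the article deliberately does not list them); negative weights model signs that speak in favour of a real person.

```python
# Hypothetical weights: positive values push an account towards "bot",
# negative values towards "real person". None of these numbers come from the article.
WEIGHTS = {
    "banned": 100,
    "default_link": 10,
    "no_avatar": 10,
    "other_social_links": -20,
    "long_offline": 30,
    "too_many_groups": 50,
    "tagged_in_photos": -20,
}
THRESHOLD = 70  # assumed cut-off; the article mentions 70-100


def bot_score(signs, weights=WEIGHTS):
    """Sum the weights of the signs found on an account, clamped to 0..100."""
    raw = sum(weights[name] for name in signs)
    return max(0, min(100, raw))


def is_bot(signs, threshold=THRESHOLD, weights=WEIGHTS):
    """An account is filtered out once its score reaches the threshold."""
    return bot_score(signs, weights) >= threshold


print(bot_score({"default_link", "long_offline", "too_many_groups"}))  # 90
print(is_bot({"no_avatar", "other_social_links"}))  # False
```

Setting a weight to 0 switches the corresponding sign off, which matches the idea of letting each user tune the filter.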

Account banned

The first thing I filter out is banned users. I do not know why the services keep such accounts (and TargetHunter, mentioned above, did keep them). A real person who uses the social network will restore access. For a spammer or a bot operator with a thousand accounts it is easier to create a new account after a ban. Besides, you cannot show ads to banned real users anyway.
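
As a rough sketch of this check: the VK API `users.get` method marks blocked or deleted pages with a `deactivated` field in the user object. The helper name and the sample dictionaries below are made up for illustration.

```python
def is_deactivated(user):
    """True if the user object carries `deactivated` ("banned" or "deleted")."""
    return user.get("deactivated") in ("banned", "deleted")


# Invented sample data shaped like user objects from users.get
users = [
    {"id": 1, "first_name": "Ann"},
    {"id": 2, "first_name": "DELETED", "deactivated": "banned"},
]
active = [u for u in users if not is_deactivated(u)]
print([u["id"] for u in active])  # [1]
```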

Page link not changed

VKontakte lets users set a unique link to their page instead of the nameless id12345678. This is not a very significant sign, since not all real people change the link, and hijacked accounts may well have a custom one, but for newly registered bots the link often remains unchanged.
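
A minimal sketch of this check, assuming the user object was requested with the `domain` field and that, when no custom address is set, this field holds `id` followed by the numeric id. The helper name is hypothetical.

```python
def has_default_link(user):
    """True if the page address is still the automatic id<number> form."""
    default = f"id{user['id']}"
    # A missing or empty field is treated as "not changed" (an assumption).
    return (user.get("domain") or default) == default


print(has_default_link({"id": 12345678, "domain": "id12345678"}))  # True
print(has_default_link({"id": 12345678, "domain": "wedding_ann"}))  # False
```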

No avatar

In 2018 this is hardly relevant for bots. Rather, a missing avatar is typical of fakes made by very lazy people, and I do not consider such an audience to be of high quality either. In any case, this is also not a very significant sign.

There are links to other social networks

This is a good sign of a real person. I did not find a way to set a link to Facebook or Instagram through the API. Maybe I did not look hard enough, or maybe there is none. Either way, adding such a link is harder for a bot: you need at least to have an account on the other social network and to link it through the VKontakte interface. So the presence of such links in a profile takes a few points off the bot detector's counter.

Has not been online for more than 1-3 months

In an age when everyone has a social network client installed on their phone, such low activity looks suspicious. Even if the account is not a bot forgotten by its owner, such a person is much harder to reach through advertising. You need a hot audience, for whom an offer made a month later will come too late (they will already have found another supplier), yet this person is not online and you cannot reach them. To repeat what I said at the beginning of the article: I studied a wedding-themed audience, for which hot contact matters. If you promote an entertainment community or a store built around people's hobbies, this sign may matter less to you.
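
One way to sketch this sign, assuming the user object was requested with the `last_seen` field, whose `time` value is a Unix timestamp. The function names and the 30-day default are assumptions; the article only gives the 1-3 month range.

```python
import time

SECONDS_PER_DAY = 86400


def days_offline(user, now=None):
    """Days since the last visit, or None if the time is hidden or unknown."""
    last_seen = user.get("last_seen")
    if not last_seen or "time" not in last_seen:
        return None
    now = time.time() if now is None else now
    return (now - last_seen["time"]) / SECONDS_PER_DAY


def long_offline(user, max_days=30, now=None):
    """True if the account has been offline longer than max_days."""
    days = days_offline(user, now)
    return days is not None and days > max_days


NOW = 1_530_000_000  # fixed "current time" so the example is reproducible
user = {"id": 5, "last_seen": {"time": NOW - 45 * SECONDS_PER_DAY}}
print(long_offline(user, now=NOW))  # True
```

For an audience where hot contact matters less, `max_days` can simply be raised towards 90.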

Subscribed to 500-1500 or more communities

An excellent and significant sign of junk accounts. The main source of income from bots is joining various groups (well, plus likes and reposts), and bot farm owners are unlikely to be able to hide it. For the same reason, by the way, you can try filtering out those who hide their groups from prying eyes (paranoid users will be filtered out too, but there are very few of them among the VKontakte audience). Even if this sign filters out a real person, nothing terrible happens: subscribed to 1000 other communities, they are unlikely to see your community's news in their feed.
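
A small sketch of this sign. It assumes the number of communities has already been obtained (for example from the `count` value returned by `groups.get`) and is passed as `None` when the list is hidden; the function name, the limit of 500 and the flag for hidden lists are illustrative choices.

```python
def too_many_groups(group_count, limit=500, hidden_is_suspicious=False):
    """Flag accounts subscribed to `limit` or more communities.

    group_count is None when the list of communities is hidden.
    """
    if group_count is None:
        return hidden_is_suspicious
    return group_count >= limit


print(too_many_groups(1200))  # True
print(too_many_groups(40))  # False
print(too_many_groups(None, hidden_is_suspicious=True))  # True
```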

Member of mutual promotion communities

Keep these accounts only if your target audience is students with a lack of attention, a lack of interests and a lot of free time. Personally, I consider membership in such communities not just a mark of a junk audience but a clear signal that the account is not real.

Member of many communities about different cities

Frankly, I have not found a single reason why an ordinary person would want to follow, all at once, the news of groups such as “washing machine repair Kazan”, “outdoor advertising Omsk”, “interior design Kaluga” and a dozen other commercial communities in different cities, especially given the quality of content in 95% of such communities. For a bot that earns money by joining communities, however, it is very profitable.
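
The article does not say how this sign is computed, so the following is only one possible heuristic: count how many distinct city names appear in the titles of a user's communities. The tiny city list, the function names and the cut-off of three cities are all assumptions.

```python
# Tiny illustrative list; a real check would need a full list of city names.
CITIES = {"kazan", "omsk", "kaluga", "moscow", "samara"}


def cities_mentioned(group_names, cities=CITIES):
    """Return the set of known city names found in the community titles."""
    found = set()
    for name in group_names:
        words = name.lower().replace(",", " ").split()
        found.update(word for word in words if word in cities)
    return found


def many_city_groups(group_names, min_cities=3):
    """True if the titles mention at least min_cities different cities."""
    return len(cities_mentioned(group_names)) >= min_cities


groups = [
    "Washing machine repair Kazan",
    "Outdoor advertising Omsk",
    "Interior design Kaluga",
]
print(many_city_groups(groups))  # True
```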

Member of a group without an avatar

I do not consider this sign significant, but during testing I came across an article about identifying bots by it. Such communities may be used as a technical sandbox (by programmers, to obtain a community access key), or they may simply be very young. But when I discussed this with friends, they told me they would not join such communities. On the whole, this sign remained the most ambiguous for me, full of secrets and mysteries (as is the very existence of communities without avatars).

No one views the user's posts

This sign is much simpler. Usually, if a user has a lot of friends but the posts on their wall get almost no views, the friends are an imitation. And what is an imitation of friends for, if not to make a fake account look real?
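
A hedged sketch of this comparison: it takes the friend count and a list of view counts for recent wall posts (for instance collected from the posts returned by `wall.get`) and flags accounts with many friends but almost no views. The function name and both cut-offs are invented for illustration.

```python
def nobody_watches(friend_count, post_views, min_friends=100, max_avg_views=5):
    """True if the user has many friends but wall posts get almost no views."""
    if friend_count < min_friends or not post_views:
        return False  # too few friends or no posts: nothing to judge by
    average = sum(post_views) / len(post_views)
    return average <= max_avg_views


print(nobody_watches(800, [1, 0, 3, 2]))  # True
print(nobody_watches(800, [40, 55, 31]))  # False
```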

Tagged in other users' photos

At the moment bots are not in the habit of tagging each other in photos, whereas real people tag a lot, especially since the social network suggests it very intrusively (so much so that it offers me to tag myself on my own avatar). The presence of such a tag usually indicates either a hijacked account or a real user.

Checking the filter

To check how well these parameters find bots, I wrote a small service that checks an uploaded list of contacts. To give the study practical value, I also added the ability to check your own community: if you moderate a community, you can automatically load its latest members and check them. This is useful if you hired someone to run advertising and they report subscriber growth, but you see no real increase in orders, comments or likes.

The algorithm uses the wall.get method to check wall posts; it is limited to 1000 calls per day, so this script cannot check more than 1000 people. However, that is enough to assess the quality of an audience. In addition, the script lets you set your own weight for each sign and your own bot detection threshold, so if you disagree that a particular parameter identifies bots, you can set it to 0 or, on the contrary, increase its value.

Testing and comparison of results

According to the test results, out of a test audience of 2,935 people TargetHunter filtered out 877 bots. Filtering by the described algorithm eliminated 1,984 people. If you tweak the filter to detect only the most blatant bots (subscribed to 500-1000 communities, a significant part of which are communities about different cities; banned; or members of promotion groups), the number detected drops to 1,215, which still exceeds the result of the service mentioned above. I also looked at about two dozen pages of users whom TargetHunter considered normal and my algorithm considered bots, and all of them seemed dubious to me: many pages carried reposts of dubious services (casinos, adult dating, contests, sports forecasts) or had very few post views. I also came across accounts that looked commercial and promoted some services, but personally I am ready to write them off, especially given that in a short time they subscribed to dozens of other communities besides the ones I need, so it is doubtful that they are interested in my subject. A softer filter can keep such accounts, though. And of course I understand that 20 pages are not enough to judge the quality of all 1,984 accounts.
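
For comparison, the three results can be expressed as shares of the 2,935-person test audience. The snippet only does the arithmetic on the numbers quoted above; the labels and the helper name are mine.

```python
TOTAL = 2935  # size of the test audience

results = {
    "TargetHunter": 877,
    "described algorithm": 1984,
    "described algorithm, strict signs only": 1215,
}


def share(filtered, total=TOTAL):
    """Percentage of the audience that was filtered out, to one decimal."""
    return round(filtered / total * 100, 1)


for name, filtered in results.items():
    print(f"{name}: {filtered} of {TOTAL} = {share(filtered)}%")
# TargetHunter: 877 of 2935 = 29.9%
# described algorithm: 1984 of 2935 = 67.6%
# described algorithm, strict signs only: 1215 of 2935 = 41.4%
```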

In any case, I got results that satisfied me, although with more free time the set of signs for finding bots could be expanded significantly. But the ones described above are quite enough (for now) to get a quality result. And once again, a link to the implementation of the algorithm, so that you do not have to scroll back through the article.

Source: https://habr.com/ru/post/413855/
