The occasion for this publication is the RStudio blog post "Shiny 1.1.0: Scaling Shiny with async", which is easy to overlook but which adds a very significant building block to the task of applying R to business problems. Asynchrony actually appeared in the dev version of shiny about a year ago, but it felt tentative and provisional: after all, it was just the dev version. The merge into the main branch and publication on CRAN is an important confirmation that many fundamental issues have been thought through, solved and tested, and that the feature can safely move into production use.
And what else is there in R, besides this "gem", that can turn it into a universal analytical tool for practical tasks?
This is a continuation of previous publications.
Why Shiny
If we talk about the practical application of R to data processing in the business processes of a real company, then the main consumers of analytical results will be managers at various levels. We leave the layer of data-science analysts aside: they need a broad range of tools, including direct database access, and they can manage on their own. A graphical web workstation would be a convenient aid for them, but not a key differentiator.
Unlike a DS specialist, an ordinary manager needs a convenient interface that provides all the information (historical, analytical, forecast, etc.) necessary to make a decision or report to management. In fact, the user interface is the alpha and omega of any enterprise system. No one will ever look under the hood (except, perhaps, during the long and painful RFI-RFP stages). No one will ever think to experiment beyond the limits of the user story spelled out in their official duties. No one will ever reflect on protocols, algorithms, validation or accuracy.
With Shiny, you can build a highly elaborate interface that includes text, graphics, tables and almost all structural HTML elements (the Bootstrap framework). JS allows complex fine-tuning of the web interface, and CSS allows arbitrary visual styling. It is also quite simple to do a few important things in R that qualitatively change how the interface works, namely dynamic content generation (see the sketch after this list). Here we are talking about:
- tabular and graphical data, refreshed on a timer or at the user's request and modified on display in accordance with dynamic constraints (for example, masking parts of personal data with asterisks);
- the composition of interface elements (depending on the logic of the business process, buttons, tabs, etc. can be added or removed at runtime);
- the content of these elements (for example, populating lists of available values based on loaded data);
- intelligent linkage between controls (for example, the value chosen in one list determines the available content of other elements);
- implementation of a role model at the data level (for example, depending on the role, only certain subsets of an element may be available).
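To make this concrete, here is a minimal sketch of dynamic, data-driven UI in Shiny: the list of cities is generated on the server and depends on the selected region. The sales data frame and its region/city columns are illustrative assumptions.

library(shiny)

# toy data; in a real application this would come from a database
sales <- data.frame(
  region = rep(c("North", "South"), each = 3),
  city   = c("Oslo", "Bergen", "Tromso", "Rome", "Naples", "Bari")
)

ui <- fluidPage(
  selectInput("region", "Region", choices = unique(sales$region)),
  uiOutput("city_ui"),          # this control is generated dynamically
  tableOutput("filtered")
)

server <- function(input, output, session) {
  # the set of available cities depends on the selected region
  output$city_ui <- renderUI({
    cities <- sales$city[sales$region == input$region]
    selectInput("city", "City", choices = cities)
  })
  output$filtered <- renderTable({
    req(input$city)
    sales[sales$region == input$region & sales$city == input$city, ]
  })
}

shinyApp(ui, server)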
No interface, no system. And at exactly this point it becomes almost obvious why R and not Python: R has Shiny (packages + runtime), with which you can build user interfaces for data-processing systems of almost any algorithmic complexity directly in R, while Python, alas, has nothing comparable, and nothing has been announced for the near future.
Shiny asynchrony and why it is so important
By itself, a shiny application executes sequentially: for each URL (shiny app) in the open-source shiny server there is one backend R process that performs the calculations driven by user activity. Up until the latest open-source release, shiny was completely synchronous. This meant that any lengthy computation in the code froze the application's responsiveness for all users working with it at the same time. Naturally, in the enterprise version, Shiny Server Pro, the issue of managing user sessions was already resolved; the consumer could choose whether to get, in five seconds, everything they love in an enterprise application, or to build it up themselves.
In principle, this limitation of shiny applications could be worked around by:
- distributing the application across different URLs for different users, for example by including the user name (one codebase, with the links created on the shiny server);
- performing heavy calculations in advance, in a separate background process;
- an optimal split between data processing in the backend and post-processing in R.
Now, however, everything has become more convenient. Asynchrony through the promises mechanism allows you, in a couple of lines, to spawn additional R processes in which resource-intensive calculations are performed without affecting the responsiveness of the main shiny application. So, formally, the issue of many users working in parallel can be considered resolved in the open-source version too. "Time to drink coffee while waiting for the result" is not about Shiny.
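Here is a minimal sketch of the pattern, using the promises and future packages; the ten-second Sys.sleep stands in for a real resource-intensive calculation.

library(shiny)
library(promises)
library(future)
plan(multisession)  # heavy work runs in separate R processes

ui <- fluidPage(
  actionButton("go", "Run heavy computation"),
  textOutput("result")
)

server <- function(input, output, session) {
  heavy <- eventReactive(input$go, {
    future({
      Sys.sleep(10)        # simulated long calculation
      sum(rnorm(1e7))
    })
  })
  output$result <- renderText({
    # the promise pipe post-processes the value once it resolves
    heavy() %...>% round(2) %...>% format()
  })
}

shinyApp(ui, server)

One subtlety worth knowing: within a single session the reactive flush still waits for the promise; the gain is that other users' sessions served by the same R process are no longer blocked.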
Typical scenarios for the practical application of R
People love to talk about models and ML in the enterprise context, but in reality you can approach those problems only after the task has been digitized and the data prepared. And all of this can be done within R.
Naturally, R alone is not always enough: depending on the scale of the task and the volume of data, both an open-source OLAP backend and an open-source data acquisition subsystem may be required. But this changes nothing, since the user still works only with the user application (see above).
Many of these scenarios used to be served by specialized products from "big vendors", rolled out over years on billion-dollar budgets. Now everything is solved much more simply and cheaply. Practice shows that 99% of business problems fall into one of the three cases described below.
Case №1. Operational analytics
A typical task here is to create an operational feedback loop. The main steps:
- multi-protocol, multi-format data collection in near-real-time mode (given the specifics of business processes, the optimal delta is a few tens of minutes) from various systems by different manufacturers and from directories in various formats: for example, data from pumping equipment, data from various scanners, system operation logs;
- cleaning, normalization and enrichment with data from other sources and reference books;
- analysis of the resulting time series: calculating forecasts, analyzing deviations from predicted values, detecting anomalies, various kinds of anti-fraud, and predicting emerging problems (for example, the temperature in the refrigerators has begun to rise slowly; the readings are still within limits, but the trend is obvious and the product may soon spoil; see the sketch after this list);
- calculation of any instantaneous KPI values (limited only by the imagination of the business analysts);
- closing the feedback loop across multiple channels: alerting, updating dashboards, automatic reporting to external systems (e.g., monitoring), automatic execution of commands in underlying systems.
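As an illustration of the refrigerator example, here is a minimal sketch of trend-based anomaly detection on sensor readings; the simulated data, column names and thresholds are all illustrative assumptions.

library(dplyr)
library(zoo)

# simulated 10-minute temperature readings with a slow upward drift
readings <- tibble(
  ts   = seq(Sys.time() - 24 * 3600, Sys.time(), by = "10 min"),
  temp = 4 + cumsum(rnorm(145, mean = 0.01, sd = 0.05))
)

alerts <- readings %>%
  mutate(trend = rollapply(
    temp, width = 18,                                   # rolling 3-hour window
    FUN = function(x) coef(lm(x ~ seq_along(x)))[2],    # local slope per step
    fill = NA, align = "right"
  )) %>%
  filter(temp < 8, trend > 0.02)  # still within limits, but clearly drifting up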
Classic examples:
- control of various kinds of equipment;
- monitoring of long-running business processes;
- "online" sales analysis;
- call-center analysis;
- cross-analysis of access control systems (for example, was there an SAP request granting a certain employee access to a certain place at a certain time, or is what the access control system sees an anomaly?).
There are a great many such tasks, and all of them can be solved by means of the R ecosystem.
Case №2. Excel consolidation
Practice shows that in the vast majority of companies Excel is the main tool of business analysts. For simple tasks this is still acceptable; for complex tasks with large volumes of data, the approach turns into a black hole that sucks in any amount of resources and yields nothing at the output.
Typical task:
WHILE (!fired) DO {
- collect dirty data from a mass of different sources, mostly hand-maintained Excel files;
- validate all of it, repeatedly (technical and logical validation of each source + logical cross-validation between sources);
- perform calculations, consolidations, allocations;
- produce a pile of different exports for handing off to other departments;
- deftly report on the work done.
}
And all of this runs in a mode of constant churn and reprocessing.
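Most of this loop automates well in R. Here is a minimal sketch of the collection-validation-consolidation part with readxl/dplyr/openxlsx; the folder layout and the column names ("unit", "amount") are illustrative assumptions.

library(readxl)
library(dplyr)
library(purrr)
library(openxlsx)

files <- list.files("incoming", pattern = "\\.xlsx$", full.names = TRUE)

consolidated <- map_dfr(files, function(f) {
  df <- read_excel(f)
  # technical validation: required columns must be present
  stopifnot(all(c("unit", "amount") %in% names(df)))
  df %>%
    filter(!is.na(amount)) %>%          # drop obviously dirty rows
    mutate(source_file = basename(f))   # keep provenance for cross-validation
})

summary_tbl <- consolidated %>%
  group_by(unit) %>%
  summarise(total = sum(amount), .groups = "drop")

write.xlsx(summary_tbl, "consolidated_report.xlsx")  # export for other departments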
Classic examples:
- Analytics for integrated project management systems (CPMS), when a single MS Project will not cut it. A crowd of contractors report as best they can, but a consolidated picture must be assembled and risks managed.
- Ordering and distribution systems (trade and logistics). What to buy, how to distribute it, how to assemble orders, how to break them down. And, ideally, to forecast purchases as well.
Case №3. Decision support systems
This one is even simpler and closest to pure ML:
- collect information from wherever possible (various ODBC and not-quite-ODBC-compliant sources, xml/json, txt/csv/log, xlsx, REST API);
- correlate data from the different sources with each other and bring it into a form digestible for ML algorithms;
- come up with a mathematical model of the business entities described, and compute it;
- visualize everything in all kinds of slices and views, and generate reports in a manager-friendly form (docx, xlsx, pptx, pdf) with a description of the current situation and recommendations.
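For the last step, here is a minimal sketch of automated reporting with rmarkdown, rendering one parameterized template into the manager-friendly formats listed above; the template file "report.Rmd" and its params (which the template would have to declare in its YAML header) are illustrative assumptions.

library(rmarkdown)

formats <- c(word_document           = "docx",
             powerpoint_presentation = "pptx",
             pdf_document            = "pdf")

for (fmt in names(formats)) {
  render(
    input         = "report.Rmd",
    output_format = fmt,
    output_file   = paste0("status_report_", Sys.Date(), ".", formats[[fmt]]),
    params        = list(as_of = Sys.Date())  # report cutoff date
  )
}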
This classification into cases is not contrived; it emerged from real business needs (science and pure ML/AI/DL are a separate story). In the near future it will probably be possible to share screenshots of solutions to two or three such problems.
Practice shows that R + Shiny lets you crack such tasks very, very effectively. If you have tasks like these, it makes sense to take a closer look at these tools.
Previous publication: The structural elements of a reliable enterprise R application.