How to grow a healthy product (Juno example)

Juno

Much has been said about the advantages of working at a product company, and it is hard to say anything original here. Fewer people know how to maintain the "health" of a product and what there is to do in a product company besides developing features. We will explain how we operate the product at Juno and how the operations department and technical specialists are involved.

We do not claim that our path is the only correct one. We keep trying things, making mistakes, and learning from them. We hope that our experience will be useful to you.

About us: Juno is a ride-hailing service in the US that is part of the Gett group of companies.

At Juno, we write code in Go, Swift, Kotlin, Python, and React.js across the mobile application, Backend, Frontend, Data Science, and Technical Operation Support teams, creating a service that has become part of the daily life of tens of thousands of drivers and hundreds of thousands of New Yorkers.

What does product management consist of?


Let's look at how operations work at Juno and break the process down into its component parts.
We have identified three key components:

  1. Operational office
  2. Metrics and monitoring
  3. Incident Investigation

The purpose of operating a product is to respond to problems and changes in a timely manner, regardless of their nature.
For this you need:


With this approach, business decisions are based on data. Our operations team works in New York, since the Juno service is currently available only to residents of that city.
The team’s daily to-do list looks like this:


Metrics and monitoring


At Juno, every team has metrics, which we agreed to divide into:


Business metrics are a set of indicators that let you evaluate the “health” of a product. They can be roughly divided into two groups:


To build analytical reports from the collected metrics, we use Tableau. A Business Intelligence (BI) team is responsible for these reports. They work in the Tel Aviv office alongside the product team. Both teams work closely with their colleagues in New York, which makes it possible, drawing on the BI analysis, to evaluate the success of the actions taken, formulate hypotheses for testing in the field, and adjust the product development plan.

On the other hand, there are a number of technical metrics that in one way or another affect the system as a whole.

Technical metrics are a set of indicators that show whether individual components are running without errors, and from them we draw conclusions about the operation of the system as a whole. They show how long calls between services take, how much memory the services consume, and whether critical errors occur when messages are passed between them. There are a lot of such metrics at Juno. They are somewhat redundant, but in critical situations this redundancy helps to find the cause of a problem quickly. Tracking and using technical metrics helps us:



To monitor performance, we use Grafana and Prometheus. When developing a new service or adding a new feature, the developers add the necessary metrics to the service, and then each team sets up alerts for itself.
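As a rough illustration of the idea (a service records a metric, and a team defines an alert on top of it), here is a minimal sketch in plain Python. It does not use the Prometheus client or Grafana; the in-memory store, the function names `observe` and `alert_fires`, and the "mean latency above a threshold" rule are all assumptions made for the example, not a description of Juno's actual setup.

```python
import time
from collections import defaultdict
from statistics import mean

# Hypothetical in-memory store: metric name -> list of observed durations (seconds).
# A real setup would export these values to a monitoring system instead.
LATENCIES = defaultdict(list)


def observe(metric_name, func, *args, **kwargs):
    """Call func, record how long it took under metric_name, return its result."""
    started = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        # Recorded even if func raises, so failed calls are timed too.
        LATENCIES[metric_name].append(time.perf_counter() - started)


def alert_fires(metric_name, threshold_seconds):
    """Return True when the mean recorded latency exceeds the threshold."""
    samples = LATENCIES[metric_name]
    return bool(samples) and mean(samples) > threshold_seconds
```

A team would wrap a call between services in `observe` and periodically check `alert_fires` with a threshold that suits its own service.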

Thanks to the configured alerts, the technical support team performs an initial analysis and escalates the problem to the development or business teams for resolution.
If the problem is technical in nature and threatens the normal operation of the service, the technical support team creates a production issue. Thanks to an automated process, interested parties are notified immediately, including the customer support team (Customer service aka Helpdesk aka L1 support), which prepares for a possible surge of calls.
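The automated notification step can be pictured as a simple fan-out: once a production issue is created, every subscribed team receives the same message. The sketch below is only an illustration under that assumption; the class names, the team names, and the sample issue are hypothetical and do not come from Juno's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ProductionIssue:
    title: str
    description: str


@dataclass
class Notifier:
    # Hypothetical subscriber registry: team name -> callback receiving a message.
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, team, callback):
        self.subscribers[team] = callback

    def announce(self, issue):
        """Send the same message to every subscribed team; return who was notified."""
        message = f"[PRODUCTION ISSUE] {issue.title}: {issue.description}"
        for callback in self.subscribers.values():
            callback(message)
        return sorted(self.subscribers)


# Illustrative usage with made-up teams and a made-up issue.
inbox = []
notifier = Notifier()
notifier.subscribe("l1-support", inbox.append)
notifier.subscribe("backend", inbox.append)
notified = notifier.announce(ProductionIssue("Payments degraded", "card charges time out"))
```

In practice the callbacks would post to a chat channel or a paging system rather than append to a list.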

Incident Investigation


Over time, we came to the conclusion that a kind of “debriefing” should take place after each serious incident. We then make changes to processes that help us avoid similar events in the future or cope with them better.

The elements mentioned above (metrics, dashboards, alerts, and logs) help us understand what happened. The teams sit down together, analyze changes in technical and business indicators, take the mistakes into account, and draw lessons for themselves.

We have to deal both with production incidents and with any other situation where the question “what happened?” cannot be answered quickly. This is where the tech support team (TechSupport aka L2 support) helps.

What issues does technical support solve? It is commonly believed to be a boring job, as in The IT Crowd, where three nerds in a basement do little more than say: "try turning the computer off and on again." In fact, the questions that come up are complex and ambiguous.

The first level of customer service is organized according to the “follow the sun” principle. This approach makes round-the-clock user support possible without night shifts. During European hours the Tel Aviv office is on duty, and during American hours the Portland office. The task of this team is to listen to and understand the "pain" of the driver or passenger, to reassure them, and to help if possible. The people who work there handle questions about how the service works.

This team is not “technical”, however, and as soon as it becomes necessary to dive deeper into technical nuances, the request is passed on to the technical support team. That team works in Minsk and is part of the development center. Its members solve only technical issues and do not communicate with drivers and passengers directly. The team has two tasks: incident investigation and process automation.
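The “follow the sun” routing described above can be sketched as a lookup from the current time to the office on duty. The shift boundaries below are assumptions chosen only so that the two offices cover a full day; the article does not state the real schedule, and `office_on_duty` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assumed shift boundaries in UTC hours; the real schedule is not given in the article.
# Each office covers [start, end), and together they cover the full day.
SHIFTS = [
    ("Tel Aviv", 5, 17),
    ("Portland", 17, 5),  # wraps past midnight UTC
]


def office_on_duty(moment):
    """Return the office covering the given timezone-aware datetime."""
    hour = moment.astimezone(timezone.utc).hour
    for office, start, end in SHIFTS:
        if start <= end:
            if start <= hour < end:
                return office
        elif hour >= start or hour < end:
            # Shift that crosses midnight: late evening or early morning UTC.
            return office
    raise ValueError(f"no office covers hour {hour}")
```

Keeping the schedule in a small table makes it easy to adjust the hours or add a third office without touching the routing logic.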

In the case of a production incident, the task for the technical support team looks like this: a bug was found or a deployment failed, we noticed the problem and fixed it, but we still need to figure out how it affected the system and what needs to be restored from the point of view of product management.


The questions are simple, but to answer them you need to understand very well how the system works and how its behavior changed during the incident. When answering, it is worth keeping in mind that deployment is continuous, so something may change at any minute.

As an example of when technical support is needed for the product to work correctly, consider the “I did not take this trip” case. The driver picked up a different passenger and completed a trip that our passenger does not want to pay for. In this case, it is necessary to distinguish a legitimate request from attempted fraud, where the user tries to avoid paying for the services rendered.

If a request of the same kind arrives more than once, the technical support team automates its handling and provides the result to the user support team as a web application. This approach reduces the time needed to process a user's request and avoids “inflating” the technical support team. Nevertheless, the technical support engineer vacancy is always open, as people grow and move on to other development teams.

All roads lead to Rome


The detailed description of the technical support team's work in this article is no accident. The team has turned out to be the place where information from all sources flows together. A single point of contact reduces the number of intermediaries and therefore the number of distortions.

This does not mean that the technical support team is the main link in operating the product. A product company is a living organism: all the organs are important and necessary. It is impossible to say which matters more to a person: the brain or the heart, the lungs or the circulatory system. Only the harmonious development and interaction of all the organs guarantees the healthy functioning of an organism, or of an IT company.

Health to you and your products!

Source: https://habr.com/ru/post/415905/

