Apache Ignite 2.5 Release - Memory-Centric Distributed Database and Caching Platform

In May, Apache Ignite 2.5 was released. It brings many changes, a complete list of which can be found in the Release Notes. In this article we will look at the key new features worth paying attention to.

Apache Ignite is a horizontally scalable platform for transactional data storage and for distributed computing over that data in near real time.

Ignite is used where horizontal scalability and very high processing speed are required. The latter is achieved in part by optimizing the platform to store data directly in RAM as the primary storage rather than as a cache (In-Memory Computing). Distinctive features of the product include a full ANSI SQL:1999 query engine, disk storage that extends RAM, a large number of built-in integration tools, and Zero-ETL machine learning.

Among the companies that use Apache Ignite are Veon/Beeline, Sberbank, Huawei, Barclays, Citi, Microsoft, and many others.

New topology option: star around ZooKeeper


One of the major changes in version 2.5 is the new topology option. Previously, Ignite had only a ring topology, which was used to exchange events within the cluster and provided efficient and fast scalability up to roughly 300 nodes.

The new topology is designed for installations of many hundreds and thousands of nodes.

Now you can choose: keep the old topology, where only Ignite itself is needed, or, for large clusters, pair a large Ignite cluster with a small ZooKeeper cluster through which event exchange takes place.

Why is this needed, and how does it help?

The large “ring” has a flaw: in terms of network exchange and processing, notifying all nodes about a new event can take seconds. This is bad in itself, and since the probability of a node failing because of a temporary network issue, hardware fault, or other problem grows with cluster size, topology reorganization becomes quite a common task, especially on clusters built from cheap (commodity) hardware. In a large ring, each reorganization introduces additional churn: it interrupts processing of the event flow and forces it to start over, while information about the new topology propagates in parallel.

The new “star” topology, with a ZooKeeper cluster at its center, not only preserves scalability (ZooKeeper can also grow horizontally) but significantly improves it at large scale. By separating responsibility for event handling from responsibility for working with data, a much more compact ZooKeeper cluster can be dedicated to event handling, thereby reducing the dependence of event propagation time on cluster size.

This does not affect data exchange between the Ignite nodes, for example during rebalancing, nor data storage and retrieval, because all of these operations have always bypassed the event topology and used direct connections between cluster nodes.

Adding the new topology also does not affect the old one, which remains the recommended option: it is much easier to maintain and more lightweight, since it requires no additional components.

But if you have reached the scaling limit of the Ignite “ring”, you no longer need to resort to micro-optimizations, tuning hacks, and workarounds that require specialized knowledge. Instead, you have an officially available and supported solution that significantly simplifies the deployment and support of such a large cluster.
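As a sketch of what switching to the star topology might look like, the following configures ZooKeeper-based discovery instead of the default ring discovery (the ZooKeeper connection string and root path here are illustrative assumptions; the ignite-zookeeper module must be on the classpath):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi;

public class ZkDiscoveryExample {
    public static void main(String[] args) {
        ZookeeperDiscoverySpi zkDiscoverySpi = new ZookeeperDiscoverySpi();

        // Address of the compact ZooKeeper cluster at the center of the "star"
        zkDiscoverySpi.setZkConnectionString("zk-host1:2181,zk-host2:2181,zk-host3:2181");
        zkDiscoverySpi.setZkRootPath("/ignite-cluster");
        zkDiscoverySpi.setSessionTimeout(30_000);
        zkDiscoverySpi.setJoinTimeout(10_000);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(zkDiscoverySpi);

        Ignite ignite = Ignition.start(cfg);
    }
}
```

Data exchange between nodes still happens over direct connections; only discovery and event propagation go through ZooKeeper.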

Detailed information on the new topology can be found in the documentation.

Thin clients: thin Java client, authentication in thin clients


Version 2.4 introduced support for thin clients beyond JDBC/ODBC: clients that are not full participants in the topology and are much less demanding of resources, but offer reduced functionality. The first thin client was a client for .NET.

Starting with version 2.5, a thin client for Java is also available.

This allows you to embed Ignite into resource-sensitive applications, such as applications on end-user devices, without unnecessary layers. Previously, such a task required an intermediate tier that, for example, accepted requests over REST and then used an internal “fat” client to exchange data with the Ignite cluster.

The thin client can be used by adding the standard zero-dependency ignite-core JAR file, for example from the Maven repository.

An example of using the thin client:

 ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

 try (IgniteClient igniteClient = Ignition.startClient(cfg)) {
     System.out.println(">>> Thin client put-get example.");

     final String CACHE_NAME = "put-get-example";
     ClientCache<Integer, Address> cache = igniteClient.getOrCreateCache(CACHE_NAME);
     System.out.format(">>> Created cache [%s].\n", CACHE_NAME);

     Integer key = 1;
     Address val = new Address("Moskovsky Ave., 20", 101000);

     cache.put(key, val);
     System.out.format(">>> Put value [%s] into the cache.\n", val);

     Address cachedVal = cache.get(key);
     System.out.format(">>> Got value [%s] from the cache.\n", cachedVal);
 } catch (ClientException e) {
     // Handle connection or cache operation failure
 }

Version 2.4 also had no authentication for thin clients. Starting with version 2.5, authentication, along with support for encrypting thin client connections, debuts in Ignite.

 ClientConfiguration clientCfg = new ClientConfiguration()
     .setAddresses("127.0.0.1:10800");

 // Encryption settings
 clientCfg
     .setSslMode(SslMode.REQUIRED)
     .setSslClientCertificateKeyStorePath("client.jks")
     .setSslClientCertificateKeyStoreType("JKS")
     .setSslClientCertificateKeyStorePassword("123456")
     .setSslTrustCertificateKeyStorePath("trust.jks")
     .setSslTrustCertificateKeyStoreType("JKS")
     .setSslTrustCertificateKeyStorePassword("123456")
     .setSslKeyAlgorithm("SunX509")
     .setSslTrustAll(false)
     .setSslProtocol(SslProtocol.TLS);

 // Authentication credentials
 clientCfg
     .setUserName("joe")
     .setUserPassword("passw0rd!");

 try (IgniteClient client = Ignition.startClient(clientCfg)) {
     ...
 } catch (ClientAuthenticationException e) {
     // Handle authentication failure
 }

Detailed information can be found in the documentation.

SQL: authentication and user management, fast bulk loading


The distributed ANSI SQL:1999 engine is historically one of Apache Ignite’s strengths and important distinguishing features. It provides a “single window”: through a Java/.NET/C++ client or a JDBC/ODBC driver, you can execute SQL queries over the entire cluster, covering both data in RAM and data in Ignite Native Persistence.

In versions 2.0–2.4, the developers focused not only on improving performance (of which there is never enough), but also on accelerating initial and bulk data loading, and on more complete DDL support: operations such as CREATE TABLE, ALTER TABLE, and CREATE INDEX can also be executed consistently across the cluster through the same “single window”.

Version 2.5 takes DDL a step further: authentication was added for SQL, along with the corresponding CREATE USER, ALTER USER, and DROP USER commands.

If you previously had to place Ignite SQL in a sterile DMZ with access control handled at the level of the services above it, you can now increase security with authentication managed via SQL.
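As a sketch of what user management might look like over the JDBC thin driver (assuming a cluster started with authentication enabled and the default ignite/ignite superuser account; the user names and passwords are illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SqlUserManagement {
    public static void main(String[] args) throws Exception {
        // Connect as the default superuser; authentication must be enabled on the cluster
        try (Connection conn = DriverManager.getConnection(
                "jdbc:ignite:thin://127.0.0.1:10800?user=ignite&password=ignite");
             Statement stmt = conn.createStatement()) {

            // Create a new user; double-quoted names are case-sensitive in Ignite SQL
            stmt.executeUpdate("CREATE USER \"joe\" WITH PASSWORD 'passw0rd!'");

            // Change the password later...
            stmt.executeUpdate("ALTER USER \"joe\" WITH PASSWORD 'n3w-passw0rd'");

            // ...or remove the account entirely
            stmt.executeUpdate("DROP USER \"joe\"");
        }
    }
}
```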

As for the speed of loading large amounts of data, version 2.5 brings two innovations.

The first is the new COPY SQL command in JDBC, which quickly transfers the contents of a CSV file into a table in bulk mode.

 COPY FROM "/path/to/local/file.csv"
 INTO city (ID, Name, CountryCode, District, Population)
 FORMAT CSV

The second is a streaming mode for JDBC, enabled via the new SET command. When data is loaded through standard INSERT operations, it transparently groups them and optimizes the insertion of new records into the Ignite cluster. Accumulated operations are sent to the cluster when a batch fills up or when the JDBC connection is closed.

 SET STREAMING ON 
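A minimal sketch of using streaming mode from JDBC against a running cluster (the table name and data are illustrative assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class JdbcStreamingExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1:10800")) {
            try (Statement stmt = conn.createStatement()) {
                // Switch the connection into streaming mode:
                // INSERTs are batched and flushed to the cluster transparently
                stmt.executeUpdate("SET STREAMING ON");
            }

            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO city (ID, Name) VALUES (?, ?)")) {
                for (int i = 0; i < 100_000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "City-" + i);
                    ps.executeUpdate();
                }
            }

            // Any remaining buffered rows are flushed when streaming is turned off
            // or when the connection is closed
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("SET STREAMING OFF");
            }
        }
    }
}
```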


Machine learning: genetic algorithms, binding to partitions


Machine learning is one of the strategic directions of Ignite’s development. Ignite ML implements the Zero-ETL paradigm, which allows models to be trained directly on the data stored in the transactional Ignite storage. This yields significant savings in the time and resources otherwise spent converting data and transferring it over the network to a training system.

In version 2.5, genetic algorithms have been added to the set of available tools.

Also, to optimize the training process and minimize network exchange, support appeared for ML computations that know which partitions they run on and can bind additional information to those partitions. Since partitions are atomic units of distribution, and a single partition cannot be split across several nodes, this makes it possible to optimize the training process by providing stronger guarantees to the algorithm.

Detailed information can be found in the documentation.

And much more


Version 2.5 also includes many other improvements; the full list can be found in the Release Notes.

Source: https://habr.com/ru/post/416281/
