ClusterJ, API for Java and Scala#
ClusterJ is the name of our direct-access API to the RonDB data nodes for Java programs. ClusterJ is implemented on top of the C++ NDB API. The first layer on top of the NDB API is a JNI (Java Native Interface) layer that maps the C++ NDB API onto a Java NDB API.
Thus it is possible to program directly against most of the NDB API even from Java if desirable (this layer is called ndbjtie internally). ClusterJ adds another layer that makes it considerably easier to program against RonDB.
I am personally quite fond of the ClusterJ interface even though I rarely program in Java myself. Interestingly, ClusterJ can also be used from the Scala programming language, since all Java libraries can be called from Scala.
The ClusterJ layer provides an object-relational mapping that is similar to Hibernate.
To explain how to use ClusterJ we will start with the initialisation methods followed by the classes created to map tables or parts of tables into Java objects. The next step is to go through the use of the Session object that controls accesses to RonDB. We will go through some details on how to control searches in RonDB from the Session object. We will also provide a number of examples of how to use the ClusterJ API.
There are plenty of examples provided in the RonDB source code. The test suite for ClusterJ contains a large set of examples of how to use the ClusterJ API.
A fairly sizable application that uses ClusterJ is HopsFS, which we covered a bit earlier. It implements a Hadoop file system on top of RonDB, replacing the name nodes of HDFS (the file system in Hadoop) with a set of name nodes that use ClusterJ and store their data in RonDB data nodes. This code can also be used as an example of how to write a ClusterJ application.
This code is available on github.com and can be downloaded from there using a command like the following:
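# Sketch: the repository URL is an assumption, verify against the HopsFS project
git clone https://github.com/hopshadoop/hops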
Compiling ClusterJ applications#
When compiling ClusterJ applications we need to have the jar file clusterj-api.jar in the class path. We provide the class path using the option -classpath.
RonDB places the ClusterJ API jar file in share/java under the RonDB installation directory.
The ClusterJ API jar file is called clusterj-api-21.04.9.jar in RonDB 21.04.9.
Java code is compiled with the javac program, which requires that a proper Java JDK is installed. ClusterJ requires Java version 8, but can also be used with newer Java versions.
Here is the command used to compile my small test program.
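As a sketch, reusing the $CLUSTERJ_INSTALL_DIR and $CLUSTER_VER environment variables from the run command shown below:
javac -classpath $CLUSTERJ_INSTALL_DIR/clusterj-api-$CLUSTER_VER.jar:. myClusterJApp.java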
The extra . in the classpath ensures that the compiler looks for a Customer.java or Customer.class in the same directory as my main program.
Executing a ClusterJ application#
Executing a Java application using ClusterJ requires access to at least two things. The first is the clusterj.jar file, which is located in the same directory as the clusterj-api.jar described in the previous section.
The second thing we need access to is the NDB API library. We provide the path to this library in java.library.path. This library is found in the lib directory in the RonDB installation directory.
The following command will run your new ClusterJ application. The extra . directory ensures that classes already compiled in your local directory are included in the execution of your ClusterJ application. The command to start your ClusterJ application becomes more complex for bigger projects. Java code is executed by the java binary.
java \
-classpath $CLUSTERJ_INSTALL_DIR/clusterj-$CLUSTER_VER.jar:. \
-Djava.library.path=$MYSQL_INSTALL_DIR/lib \
myClusterJApp
Properties file#
All the properties that can be changed in the Properties object are documented in the ClusterJ API documentation.
We will go through the most interesting ones in this section. When creating a session factory (one such per cluster connection) we need a Properties object as input. The recommended manner to fill in this object is by using a properties file. Normally this file would be called clusterj.properties and be placed under the same directory where your application resides. But given that this is under control of the application developer it could be named differently and be placed elsewhere as well.
After reading the file and filling in the properties it is also possible to change those settings programmatically in the Properties object.
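As a sketch, a minimal clusterj.properties file could look as follows (hostname, port and database name are placeholders):
com.mysql.clusterj.connectstring=mgmd_host:1186
com.mysql.clusterj.database=testdb
com.mysql.clusterj.connection.pool.size=1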
com.mysql.clusterj.connectstring#
The connect string to the RonDB management server(s) is a necessary setting that most installations are required to set. It sets the hostname(s) of the management server(s) (defaults to localhost) and the port number (defaults to 1186), specified as hostname:port.
com.mysql.clusterj.database#
This parameter defines the default database to use when working against RonDB. Defaults to test. It is possible to use another database when creating a Session object.
com.mysql.clusterj.connection.pool.size#
If more than one cluster connection is wanted, the number can be set here. One cluster connection scales to some level, but if the program needs to use a full server, a few more cluster connections might be needed to scale. Defaults to 1.
com.mysql.clusterj.connection.pool.nodeids#
To improve management of the cluster it is a good idea to have stable node ids on NDB API programs that execute for long times. For short lived programs it can be ok to get any node id.
This parameter can list a number of node ids to use for the cluster connections. The number of node ids should be the same as the size of the pool of cluster connections. It is also possible to only set one node id, in this case the node ids used are starting at this node id, e.g. if set to 55 and the pool size is 4 we will use the node ids 55, 56, 57 and 58.
This parameter is not set by default, which means that ClusterJ will allocate any node ids that are free to use.
com.mysql.clusterj.connect.autoincrement.increment#
When using tables with autoincrement columns this parameter specifies how much we will step the autoincrement between each insert. Defaults to 1.
com.mysql.clusterj.connect.autoincrement.offset#
This represents the starting value for the autoincrements. Defaults to 1. The intended use case for non-default settings of autoincrement increment and offset is Global Replication, where both clusters can be used to insert new rows at the same time. By ensuring that the clusters use the same increment but different start values (= offsets), we can ensure that they won't generate the same autoincrement value.
com.mysql.clusterj.connect.autoincrement.batchsize#
An API node allocates a batch of autoincrement values at a time. This parameter specifies how many values we retrieve per batch. Set by default to 10.
com.mysql.clusterj.connect.retries#
Specifies the number of retries before we give up on connecting to a management server. A negative value means that we will never give up. We will continue until we succeed with the connection setup. Set by default to 4.
com.mysql.clusterj.connect.delay#
Specifies the delay after an unsuccessful attempt until we make another attempt to connect to the management server(s). Defaults to 5 seconds.
com.mysql.clusterj.connect.verbose#
This will write more verbose printouts during the connect process if set. Defaults to not set (= 0).
com.mysql.clusterj.connection.reconnect.timeout#
The NDB API is designed for automatic reconnect at failures. This parameter specifies how many seconds to wait until we reconnect after a network failure. The default value is 0, thus we won’t attempt to reconnect. It is worth considering changing this value. The timeout is used to wait for all Session objects to be closed before a reconnect is performed.
There is a method reconnect on the SessionFactory object that can be used to change this value programmatically.
com.mysql.clusterj.max.transactions#
The maximum number of transactions that can be handled by one Session object. This parameter is mainly important to increase when using the asynchronous NDB API, which is not accessible from ClusterJ. Thus in ClusterJ the only manner to use many transaction objects is through multiple open scans; by default we can have at most 3 open scans in parallel. If more are needed this parameter must be incremented. There is no reason to decrease it. It is set to 4 by default.
com.mysql.clusterj.connect.timeout.mgm#
The time we will block waiting for a connection to a management server. When we time out here, we check whether the before and after timeouts have expired; if not, we retry the blocking connect attempt. Defaults to 30000 milliseconds; only multiples of 1000 are allowed.
com.mysql.clusterj.connect.timeout.before#
This represents the time to wait for the first data node to connect before giving up. Defaults to 30 seconds.
com.mysql.clusterj.connect.timeout.after#
After the first data node has connected, this is how long we will wait for all data nodes to connect. Defaults to 20 seconds.
com.mysql.clusterj.max.cached.instances#
We can save Dynamic Objects representing rows in RonDB in the Session object. It is very costly to create Dynamic Objects using the Java Reflection APIs. Thus, by caching the already created ones in the Session object, one can increase throughput by at least 3x.
Normally these objects are released using a call to release(...), but this call could be replaced by releaseCache(...) that will store them in the cache if there is space for it.
The default size of the cache is 128 objects.
This property was added in RonDB 21.04.0.
com.mysql.clusterj.max.cached.sessions#
Session objects are also costly to create. Thus it is a very good idea to save those in a cache. This parameter specifies the maximum number of Session objects stored in the cache.
The default size of this cache is 0 objects, thus no caching is performed; this variable must be actively set to get the advantage of using this feature.
This property was added in RonDB 21.04.0.
com.mysql.clusterj.warmup.cached.sessions#
It is possible to warm up the Session cache as part of startup. This will create Session objects using the default database. The default number of warmed up Session objects is 0.
This property was added in RonDB 21.04.0.
In RonDB 21.04.9 we added the option to set the database on a Session object. When using this feature it does not make much sense to set this constant to anything other than 0.
com.mysql.clusterj.byte.buffer.pool.sizes#
By default, memory blocks used to store BLOB parts are pooled within ClusterJ. To save memory at the expense of more processing, one can set this value to 1. This means that buffers for BLOB objects will use normal malloc and free calls during their usage.
The default value is "256 10240 102400 1048576", thus providing four lists of BLOB buffers of various sizes.
com.mysql.clusterj.connection.recv.thread.cpuids#
Each cluster connection (SessionFactory) has one receive thread. This receive thread executes all signals arriving from the RonDB data nodes. The receive thread is thus a potential scalability bottleneck for the NDB APIs, since only one CPU can be used to execute these signals per cluster connection. The way to scale this is to use multiple API nodes, which gives access to multiple receive threads.
Given that this receive thread executes so much work, it can be beneficial to execute it on the same CPU all the time. This parameter specifies a list of CPU ids; the number of CPUs listed here must equal the pool size.
The default setting is that no CPUs are provided, thus no CPU locking is used.
It is possible to set/unset this programmatically on the SessionFactory object using the call setRecvThreadCPUids. The input to this call is an array of CPU ids of short type. The size of the array must be the same as the number of cluster connections used (pool size above). The CPU locking is unset by using the CPU id -1. It is set by setting a CPU id recognized by the OS in the array.
com.mysql.clusterj.connection.recv.thread.activation.threshold#
This parameter specifies the threshold for how many Session objects (NDB objects) are active in a cluster connection before the receive thread is activated to take over execution of signals. Below this threshold the user thread takes care of the receive thread work (still, only one thread at a time can handle the receive thread work). By default this is set to 8. If we lock CPUs it might be useful to set this to 1 instead.
It is possible to set this programmatically on the SessionFactory object using the call setRecvThreadActivationThreshold.
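A sketch combining the two programmatic calls described above, with a pool size of 2 and CPU ids 8 and 9 chosen purely for illustration:
short[] cpuIds = {8, 9}; // one entry per cluster connection in the pool
sessionFactory.setRecvThreadCPUids(cpuIds);
sessionFactory.setRecvThreadActivationThreshold(1);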
ClusterJ annotations#
One of the first steps in defining your ClusterJ application is to map the table objects in your database to a set of Java classes. These classes are mapped to the table definition using annotations. Normally such classes map all columns of the table, but it is also possible to map only a subset of the columns and indexes to the class.
It is possible to define several classes that map to the same table. One can see these classes as different views on the table.
Each column of the table has a get-method that returns the value of the field of the Java type that maps to the SQL type stored in RonDB. Similarly there is a set-method for setting the value of this column.
The following annotations are always preceded by an at-sign, as shown in the examples below.
PersistenceCapable#
Before the class is defined we define that the class is a persistent class. The only attribute of interest to this annotation is the name of the table in RonDB.
PrimaryKey#
One can define the entire primary key with all its columns before the class, but it is easier to simply add a PrimaryKey annotation before defining the get and set methods of the column. No attributes are needed on the annotation if it is provided inside the class.
Column#
In many cases it is not necessary to provide any Column annotation. The default name of the column is the name derived from the get and set methods.
Now we have enough to give a very simple example. As shown in this example it is necessary to import all annotation types used in the Java file.
The example table can be created using the following MySQL command:
CREATE TABLE customer (
  customer_id INT PRIMARY KEY NOT NULL,
  name VARCHAR(255) CHARACTER SET utf8,
  KEY name_index(name)
) ENGINE=ndb;
In the example below we have a table customer with two columns. The first one customer_id is an integer that is also the primary key. The second field name contains the name of the customer. There is an index on this column.
Since we want to use the method names getId and setId for getting/setting the customer id, we describe the column using the Column annotation. This is required when the name of the column in NDB is different from the names of the methods.
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;
import com.mysql.clusterj.annotation.Column;

@PersistenceCapable(table="customer")
public interface Customer {
  @PrimaryKey
  @Column(name="customer_id")
  int getId();
  void setId(int id);

  String getName();
  void setName(String name);
}
We add the PrimaryKey annotation before the customer_id column to indicate that this is the primary key of the table.
The column name requires no mapping since it uses the same name on the methods as the name of the column in the database. There is an index on the column, this is discovered automatically by ClusterJ.
ClusterJ will automatically discover the partition key of a table to ensure that scans can avoid scanning all partitions when the partition key is provided.
Lob#
BLOB columns are mapped using the Lob annotation type. This is used both for text BLOBs that can use the String type in Java and binary BLOBs that can use the byte[] type. The Lob annotation is provided as part of the class definition, right before the BLOB column.
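As a sketch, a text BLOB and a binary BLOB column (hypothetical column names; com.mysql.clusterj.annotation.Lob must be imported):
@Lob
String getDescription();
void setDescription(String description);

@Lob
byte[] getPicture();
void setPicture(byte[] picture);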
Projection#
One table can be mapped to more than one class. If a class does not contain all columns, one should provide the Projection annotation before the persistence definition. At a minimum a class must contain the primary key or possibly a unique key. Without this information in the object it isn't possible to persist the object in the database.
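A sketch of such a view class, mapping only the primary key column of our customer table (the interface name CustomerId is hypothetical; com.mysql.clusterj.annotation.Projection must be imported):
@Projection
@PersistenceCapable(table="customer")
public interface CustomerId {
  @PrimaryKey
  @Column(name="customer_id")
  int getId();
  void setId(int id);
}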
NotPersistent#
It is possible to add variables in the Java class that are not persistent and thus not mapped to any database column. In this case one adds the annotation NotPersistent before the attribute is declared with its access methods.
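A sketch, with a hypothetical non-persistent attribute cachedScore (com.mysql.clusterj.annotation.NotPersistent must be imported):
@NotPersistent
int getCachedScore();
void setCachedScore(int score);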
Null values#
Columns that can contain NULL values can handle NULL values in different ways. One method is to signal with an exception if the column isn't set before the object is flushed to the database. Another method is to set the column to its default value if no value has been provided. Here are two examples of this.
@Persistent(nullValue=NullValue.DEFAULT)
public Integer getX();
public void setX(Integer int_val);
@Persistent(nullValue=NullValue.EXCEPTION)
public Integer getY();
public void setY(Integer int_val);
Startup example code#
import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.SessionFactory;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.Query;
import com.mysql.clusterj.LockMode;
import com.mysql.clusterj.query.QueryBuilder;
import com.mysql.clusterj.query.QueryDomainType;
import com.mysql.clusterj.query.Predicate;
import com.mysql.clusterj.query.PredicateOperand;
import java.io.File;
import java.io.InputStream;
import java.io.FileInputStream;
import java.util.List;
import java.util.ArrayList;
import java.util.Properties;

public class TestClusterJ {
  // ... Implementation of test methods
  public static void main(String[] args) throws Exception {
    File propertiesFile = new File("clusterj.properties");
    InputStream inputStream = new FileInputStream(propertiesFile);
    Properties properties = new Properties();
    properties.load(inputStream);
    SessionFactory sessionFactory =
      ClusterJHelper.getSessionFactory(properties);
    Session session = sessionFactory.getSession();
    // ... Call test methods
  }
}
To start up a ClusterJ application it is necessary to create one SessionFactory object. From this object we create Session objects that are used to interact with RonDB. The Session object maps directly to the Ndb object in the C++ NDB API as explained in the previous chapter. Ndb objects are mapped to cluster connections in a round-robin fashion by the SessionFactory object.
To create a SessionFactory requires that we prepare a Properties object. We assume this object comes from reading the file clusterj.properties.
Equipped with a Session object we are ready to work with data in RonDB.
Session Object#
We get a session object from the session factory object. The session object is mapped to the Ndb object in the NDB API. It can handle one transaction at a time and multiple rows can be fetched and updated in parallel when interacting with RonDB data nodes. The call below retrieves a Session object that is connected to the default database defined in the Properties object.
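Session session = sessionFactory.getSession();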
The below call shows how to retrieve a Session object connected to another database. Once a Session object has been connected to a database, it stays connected to that database. If one needs to connect to a new database, a new Session object is required.
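Session session = sessionFactory.getSession("another_db"); // placeholder database name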
This new method was added in RonDB 21.04.9.
Inserting a row#
We use a simple example based on the Customer class that we defined in the previous section. To insert a new object we create a new object of type Customer and assign values to all columns. Finally we call the persist function; the function makePersistent is equivalent to persist.
Customer newCust = session.newInstance(Customer.class);
newCust.setId(100);
newCust.setName("Mikael");
session.persist(newCust);
If the session has no transaction started yet, the persist call will perform the actual transaction in the RonDB data nodes that inserts the new customer row into the database.
Thus if no transaction is started in the Session object we execute in autocommit mode where each operation is executed as a separate transaction.
It is also possible to insert multiple rows in a batch operation using a single call to makePersistentAll.
List<Customer> insertInstances = new ArrayList<Customer>();
Customer newCust1 = session.newInstance(Customer.class);
Customer newCust2 = session.newInstance(Customer.class);
newCust1.setId(100);
newCust1.setName("Mikael");
newCust2.setId(101);
newCust2.setName("Michael");
insertInstances.add(newCust1);
insertInstances.add(newCust2);
session.makePersistentAll(insertInstances);
The persist, makePersistent and makePersistentAll calls also work on tables without primary keys. Such tables generate a hidden primary key when an insert is performed.
Updating a row#
There are two variants of how to update a row. The first method is a so-called blind update. This means that we don't read the row before we update it. We simply overwrite all values in the updating class.
This example shows that the only difference to an insert is that we call updatePersistent instead of makePersistent.
Customer newCust = session.newInstance(Customer.class);
newCust.setId(100);
newCust.setName("Mikael");
session.updatePersistent(newCust);
Similar to makePersistentAll there is also a method updatePersistentAll that can be used without performing any previous read operations.
The other variant is to read before the update. In this case we perform the read by using the find call, and we only need to update the columns that will change. All the other ones will be persisted with the same column values as were read. Given that we are planning to update the row after reading it, we use exclusive lock mode already at the read to avoid deadlocks. In addition we have to explicitly use a transaction to ensure that the lock is held also after returning from find.
The default mode is autocommit, thus find will by default commit before returning (so no locks are held when it returns). It is therefore necessary to start a transaction and use EXCLUSIVE lock mode to perform a read followed by an update in a transaction.
Note that calling setLockMode changes the lock mode of the session object and this persists as long as the session object exists or until a new call to setLockMode sets a new lock mode on the session object.
The lock modes that are available are:

- SHARED: shared lock on rows
- EXCLUSIVE: exclusive lock on rows
- READ_COMMITTED: no lock, read the latest committed row
Here is an example of code that updates a row using the find call and updatePersistent.
session.setLockMode(LockMode.EXCLUSIVE);
session.currentTransaction().begin();
Customer newCust = session.find(Customer.class, 100);
newCust.setName("Mikael");
session.updatePersistent(newCust);
session.currentTransaction().commit();
The find call will always go immediately to the RonDB data node and fetch the row. The second parameter of the find call is the primary key value; find always uses a primary key read to retrieve the row.
One more variant is to issue multiple reads before we go to the data node to fetch the rows. In the example below we use the load method (which only prepares a read) several times, followed by a flush call that performs the prepared operations. The load method can only be used when an active transaction is ongoing.
Since we have a transaction started, the calls to updatePersistent will not take effect until we call commit; at commit both updates are sent in one batch to the data nodes.
When a transaction is started, only find, flush and commit will start communication with the data nodes.
session.setLockMode(LockMode.EXCLUSIVE);
session.currentTransaction().begin();
Customer newCust1 = session.newInstance(Customer.class);
Customer newCust2 = session.newInstance(Customer.class);
newCust1.setId(100);
newCust2.setId(101);
session.load(newCust1);
session.load(newCust2);
session.flush();
newCust1.setName("Mikael");
newCust2.setName("Mikael");
session.updatePersistent(newCust1);
session.updatePersistent(newCust2);
session.currentTransaction().commit();
As can be seen above we need to prepare the object used in the load call by setting the primary key value(s) before we call load.
We will show an alternative to using the transaction interface to batch updates. This uses the updatePersistentAll interface that takes a set of rows to persist through updating.
List<Customer> updatedInstances = new ArrayList<Customer>();
Customer newCust1 = session.newInstance(Customer.class);
Customer newCust2 = session.newInstance(Customer.class);
newCust1.setId(100);
newCust1.setName("Mikael");
newCust2.setId(101);
newCust2.setName("Michael");
updatedInstances.add(newCust1);
updatedInstances.add(newCust2);
session.updatePersistentAll(updatedInstances);
updatePersistentAll takes a list of objects to update instead of a single object. In this case we don't use explicit transactions; instead we use autocommit mode. Thus updatePersistentAll will perform the updates and commit in one interaction.
It is not possible to use updatePersistent and updatePersistentAll on tables without a primary key, since those calls require the primary key to know which row to update.
Writing a row#
savePersistent and savePersistentAll are calls that are very similar to makePersistent and makePersistentAll. They will perform an insert if the row doesn't already exist. However they differ in that if the row does exist, they will instead perform an update of the row. Thus the row will exist in the table after a successful operation whether it existed before or not.
These methods cannot be applied to tables without primary keys. For such tables it cannot be known whether the row previously existed, since no primary key lookup is possible, which is what savePersistent and savePersistentAll use.
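As a sketch, a save of our example customer row looks just like an insert, with savePersistent replacing makePersistent:
Customer cust = session.newInstance(Customer.class);
cust.setId(100);
cust.setName("Mikael");
session.savePersistent(cust); // inserts if id 100 does not exist, otherwise updates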
Deleting a row#
Deleting a single row using autocommit is very similar to an update, except that it is only necessary to set the primary key of the object and use the call deletePersistent.
Customer newCust1 = session.newInstance(Customer.class);
newCust1.setId(100);
session.deletePersistent(newCust1);
We can use read before delete in the same fashion as for updates, and similarly we can use the batch interface to perform multiple deletes in one batched operation as shown in the example below.
List<Customer> deletedInstances = new ArrayList<Customer>();
Customer newCust1 = session.newInstance(Customer.class);
Customer newCust2 = session.newInstance(Customer.class);
newCust1.setId(100);
newCust2.setId(101);
deletedInstances.add(newCust1);
deletedInstances.add(newCust2);
session.deletePersistentAll(deletedInstances);
Reading a row#
To read a row we can either use the find call or the load call. Both read an object based on the primary key and load it into a Java object.
find will execute immediately in the RonDB data node, whereas load will execute asynchronously at the next interaction with a RonDB data node (find, flush, query or commit calls). As mentioned above, load can only be used when an active transaction is ongoing; it cannot be used in autocommit mode.
Thus the load call is easy to use for batching a set of reads whereas find might be a bit easier to program against given that the interaction is immediate.
load uses an object where at least the primary key has been filled in by a newInstance call, or an object that has been returned by a previous find call or from a query.
Transactions#
Each Session object can have at most one current transaction. This transaction can be found by calling currentTransaction() on the Session object. The transaction object can be used to start a transaction using the begin() call, to commit a transaction using the commit() call, and to abort a transaction using the rollback() call.
It is also possible to set the transaction into a mode where it can only be rolled back by calling the method setRollbackOnly() on the transaction object. The status of this flag can be checked with a call to getRollbackOnly(). If this rollback only flag is set when commit() is called, the transaction will be rolled back.
One can check if a transaction is currently active by calling isActive() on the transaction object.
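A small sketch of these calls, using the rollback-only path for illustration (Transaction is com.mysql.clusterj.Transaction):
Transaction tx = session.currentTransaction();
tx.begin();
// ... perform reads and writes on the session ...
tx.setRollbackOnly(); // mark the transaction, e.g. after an application error
tx.commit();          // rolls back instead of committing, since the flag is set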
Hints where to place the transaction#
Normally the hint to place the transaction coordinator is automatically derived from the first operation in the transaction. If for some reason we want something different we can set the hint through a call to the method setPartitionKey on the Session object.
This call needs two parameters: the first is a class that is mapped to a RonDB table, and the second is an object that maps to the primary key of this table. In our examples the primary key is an integer, so a number works fine as the object.
For multi-column primary keys we need to create an Object[] type with one entry for each primary key column and the type for each column should correspond to the primary key column used. The order of the columns is the order defined when annotating the mapped object.
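A sketch of these calls for our Customer table, plus a hypothetical OrderLine table with a two-column primary key:
// Hint: place the transaction coordinator at the node holding the row with key 100
session.setPartitionKey(Customer.class, 100);
// Multi-column primary key: one array entry per key column, in annotation order
session.setPartitionKey(OrderLine.class, new Object[] {100, 1});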
Creating a query#
Queries scan a table, either using an ordered index or through a full table scan. The interface in ClusterJ doesn't explicitly specify using a table scan or an index scan. It will use an index scan if possible and otherwise it will use a table scan. ClusterJ doesn't support queries against multiple tables in the same query.
ClusterJ contains a simple query optimiser that decides which index to use by analysing the conditions.
If it is necessary to execute complex queries against RonDB from a Java application, one should simply use SQL through the MySQL JDBC connector. It is perfectly possible to mix using ClusterJ and using JDBC. Some of the ClusterJ test programs do exactly that. Using the MySQL JDBC connector for RonDB tables works perfectly fine.
The execution of a query in ClusterJ goes through a number of steps.
- Create a QueryBuilder object
- Create a QueryDomainType object for the persistent class
- Create parameters needed by the query (optional)
- Specify the query condition (optional)
- Create a Query object
- Set parameter values (optional)
- Set skip and limit (optional)
- Set scan ordering (optional)
- Execute the query
- Handle the query result
We start with a simple example based on our usual Customer table.
// Create a Query Builder object
QueryBuilder qb = session.getQueryBuilder();
// Create a QueryDomainType object for the persistent class
QueryDomainType<Customer> qdc =
qb.createQueryDefinition(Customer.class);
// Create parameters needed by query (optional)
PredicateOperand id_low = qdc.param("id_low");
PredicateOperand id_high = qdc.param("id_high");
PredicateOperand searched_name = qdc.param("searched_name");
// Specify query condition (optional)
Predicate left = qdc.get("id").between(id_low, id_high);
Predicate right = qdc.get("name").equal(searched_name);
qdc.where(left.and(right));
// Create Query object
Query<Customer> qc = session.createQuery(qdc);
// Set parameter values (optional)
qc.setParameter("id_low", 100);
qc.setParameter("id_high", 102);
qc.setParameter("searched_name", "Mikael");
// Set skipped initial rows and limit on number of objects returned
qc.setLimits(1,2);
// Execute the query
List<Customer> results = qc.getResultList();
// Handle query result
In this example we created a QueryBuilder object and based on this we created a QueryDomainType for our Customer class that is a persistence class for the customer table.
We created one condition to return all Customers with id between 100 and 102. We use this as the left predicate. We add another predicate that we only want to return Customers with the name Mikael. We use AND between those predicates and create the query object.
100, 102 and Mikael were all parameters that could have been instantiated using input variables to a function, so this could be extended into a function very easily.
Next we set the result to skip the first row and at most return 2 rows.
After setting the parameters we are ready to execute the query. getResultList will execute the entire query at once. Thus when using this interface it is important to ensure that a sufficient amount of memory is available to handle the full result of the query.
The name used for columns is the name of the attribute in the class, thus not necessarily the column name used in the table. Here we use id and not customer_id, which is the column name in the table.
QueryDomainType class#
get#
get uses the name of the column in the class to represent this column in a query. The actual value is in the database and thus different for each row. It returns a PredicateOperand object.
param#
We can introduce parameters in our query. Before executing the query these parameters must get a value set through the Query object. It returns a PredicateOperand object.
where#
A Predicate constitutes the search condition. This predicate is a boolean expression of predicates, where each predicate represents a search condition. where returns the QueryDomainType object itself.
PredicateOperand class#
Many different search conditions can be applied here. Normally the PredicateOperand returned from the get call on the QueryDomainType object is used as base object and a number of methods are available to compare it with one or more other PredicateOperands. Mostly these will be PredicateOperands returned from the param call, but could also be another PredicateOperand returned from the get call.
The conditions that can be applied are equal, greaterThan, greaterEqual, lessThan, lessEqual, between (which takes low and high PredicateOperands), in and like.
in is used to represent a list of values; this is the same as a number of equality conditions ORed together. In this context in is similar to calling equal, but calling in provides more context and makes it easier to choose the right execution plan for the query.
like has a parameter that is a string that can contain % and ? to form a LIKE expression.
We also have methods to check for NULL values, isNull and isNotNull.
Predicate class#
A predicate can be combined with other predicates to form a new predicate and also a predicate can be negated. We support and, or and not. Given these we can form any boolean expression since all other boolean expressions can be converted to a set of those.
Query class#
Before moving on to executing the query we need to set the parameter values defined in the query.
setParameter#
The parameter name is provided as a string and the value can be of any type, but it must be conformant with something that can be compared to its other operands.
setLimits#
The setLimits call sets the skip parameter, which specifies how many rows to skip before considering the result rows, and the limit parameter, which specifies the maximum number of rows that will be returned to the application.
Must be called before executing the query.
setOrdering#
This call corresponds to an ORDER BY clause in SQL. The first parameter defines whether we should use ascending order or descending order (Ordering.DESCENDING). ASCENDING is the default.
After that, the columns are listed in the order we sort on: we sort first on the first column and so forth.
The columns listed must be columns of an ordered index. It is not necessary to specify all of the columns, but the order of the columns must be the same as the order specified in the index. Thus we don't support general sorting of rows, but we do support ordering of results from many index partitions that are scanned in parallel.
In our example above, both the ordered index on the primary key and the index on the name column could have been used. With the below call we enforce that the index on the name column is used.
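qc.setOrdering(Query.Ordering.ASCENDING, "name"); // Ordering is the enum defined in the Query interface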
getResultList#
This call is a quick way of executing the entire query in one call without having to iterate the results. The returned result is a list of the Customer objects. This call will respect the setLimits call.
deletePersistentAll#
This is another method of executing the entire query in one call. In this case all the rows that we find in the query will be deleted. In this case we should ensure that we call setLockMode to set it to use the EXCLUSIVE mode before executing this query, otherwise deadlocks can easily occur.
Remember to switch back the lock mode after completing the transaction when switching to EXCLUSIVE lock mode.
This call cannot currently be used together with the setLimits call. That makes it very hard to use if the number of rows is potentially unlimited. Most of the time it is better to use the method proposed below to delete and update rows.
explain#
Given that ClusterJ has a simple query optimiser, it is also necessary to give the user insight into how the query is going to be executed. Thus, just before executing the query, we can call explain to discover how the query will be executed. The result is of the type Map<String, Object>; it is possible to call toString on this object to get a textual representation of how the query is executed: which index is used, what type of index it is and what scan type is used.
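As a sketch, on the Query object from our example (java.util.Map must be imported):
Map<String, Object> plan = qc.explain();
System.out.println(plan.toString()); // e.g. which index and scan type are used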
Creating a delete query#
We will now use the above example to show how to perform a delete that uses a filter to delete a subset of the rows in a table. We will duplicate the previous example, but add transaction handling and the call to delete rows.
// Need to use EXCLUSIVE mode for DELETE transactions
session.setLockMode(LockMode.EXCLUSIVE);
// Start a new transaction
session.currentTransaction().begin();
// Create a Query Builder object
QueryBuilder qb = session.getQueryBuilder();
// Create a QueryDomainType object for the persistent class
QueryDomainType<Customer> qdc =
qb.createQueryDefinition(Customer.class);
// Create parameters needed by query (optional)
PredicateOperand id_low = qdc.param("id_low");
PredicateOperand id_high = qdc.param("id_high");
PredicateOperand searched_name = qdc.param("searched_name");
// Specify query condition (optional)
Predicate left = qdc.get("id").between(id_low, id_high);
Predicate right = qdc.get("name").equal(searched_name);
qdc.where(left.and(right));
// Create Query object
Query<Customer> qc = session.createQuery(qdc);
// Set parameter values (optional)
qc.setParameter("id_low", 1);
qc.setParameter("id_high", 1000);
qc.setParameter("searched_name", "Mikael");
// Set skipped initial rows and limit on number of objects returned
qc.setLimits(0,2);
// Execute the query
List<Customer> results = qc.getResultList();
// Handle query result
session.deletePersistentAll(results);
session.currentTransaction().commit();
// Need to set back mode to READ_COMMITTED which is default
session.setLockMode(LockMode.READ_COMMITTED);
Obviously real code requires a bit more error handling than the above code. In the above code the setLimits call ensures that we will delete at most 2 rows in this transaction. The method used here means that we retrieve the entire Customer row in order to delete it. Actually it would be sufficient to use an object that defines only the primary key of the table. This is important when deleting rows from tables with large rows.
For best performance one should not delete or update more than a few hundred rows per transaction. RonDB can handle transaction sizes of up to millions of rows, but best performance and shortest hold on row locks happens with at most hundreds of rows per transaction.
Creating an update query#
We can use exactly the same method as above for delete query. We replace the deletePersistentAll call with a call to updatePersistentAll. Before calling this method we should iterate over the Customer objects setting the new updated values in the Customer rows.
Column type mappings#
Mappings between MySQL data types and Java data types are described at https://dev.mysql.com/doc/ndbapi/en/mccj-using-clusterj-mappings.html. For most data types there is a natural mapping to a Java data type. However, Java has no unsigned data types, thus mapping MySQL unsigned data types to Java isn't straightforward.
Releasing Dynamic Objects#
The objects created by calls to newInstance, and objects returned from find and queries, are released through calls to release on the Session object. Below is an example releasing a customer object.
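session.release(customer); // customer was returned from newInstance, find or a query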
These Dynamic Objects are quite costly to create and release. Thus in RonDB 21.04.0 we added a new call, releaseCache. This call keeps a cache of Dynamic Objects on the Session object. There is also a call to drop all objects from this cache on the Session object, called dropCacheInstance. Below are examples of these calls. The releaseCache call requires knowing the class of the object to cache. When dropping the cache, one can either drop all objects of a certain class or drop all objects of all classes.
session.releaseCache(customer, Customer.class);
session.dropCacheInstance(Customer.class);
session.dropCacheInstance();
A new parameter on the Properties object was added for this purpose as well. The new parameter is com.mysql.clusterj.max.cached.instances.
Using releaseCache instead of release in a simple benchmark, where millions of key lookups were made through ClusterJ, improved throughput in ClusterJ by 3x.
Release a Session Object#
Releasing a Session object is very easy: one simply calls close on the Session object. Below is an example.
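session.close();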
Some design patterns create a new Session object for each interaction with RonDB. The creation and destruction of a Session object is again quite costly. To avoid this cost, RonDB 21.04.0 added a new call that closes a session but saves it as a cached Session object that can be returned from the SessionFactory if required. Below is an example of this use case.
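A sketch; the method name is our assumption of the RonDB extension, so verify it against the RonDB ClusterJ API:
session.closeCache(); // hypothetical name: close the Session and cache it in the SessionFactory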
There is also a call to clear the Session object cache.
A new parameter was also added on the Properties object for this feature: com.mysql.clusterj.max.cached.sessions. It sets the maximum number of cached Session objects and defaults to 0; thus this property must be explicitly set for caching of Session objects to be possible.
Another parameter is com.mysql.clusterj.warmup.cached.sessions. This parameter also defaults to 0; setting it means that a number of Session objects will be created as part of creating a SessionFactory object, and these objects will be placed in the cache.
In RonDB 21.04.9 we added the capability to handle different databases in different Session objects. The cache of Session objects can contain Session objects for many different databases, but the Session object returned will always be one connected to the requested database. The cache maintains all Session objects in an LRU (Least Recently Used) list to ensure that we keep the hottest Session objects in the cache independent of which database they are connected to.
ClusterJ reconnect#
When a cluster connection loses the connection to the cluster, we have to close all open sessions before we can reconnect. The reason is that the cluster could have restarted in the meantime, so there is no guarantee that we reconnect to a cluster in the same state as before. We discover this condition by calling getState() on the SessionFactory object. If the state is RECONNECTING or CLOSED we have lost the connection to the cluster, and we should start closing Sessions.
Losing connection to the cluster means losing connection to all data nodes. As long as we are connected to at least one data node the cluster connection is still ok.
If we discover that we lost connection to the cluster we can call reconnect on the SessionFactory object. This initiates a reconnect independent of the settings used in the ClusterJ properties.
It is important to close Sessions before reconnecting since a cluster restart could potentially return the cluster with changes in meta data and data. Thus we need to start from a clean slate by closing all Session objects first. Otherwise we might use incorrect cached data in the Session object.
It is even possible that an initial cluster start was performed while we lost connection, no cached data in ClusterJ can be trusted if we lose all connections to the cluster.
The error code 4009 means that we lost the cluster connection. Error code 4010 indicates a node failure; in that case the cluster connection is most likely still ok.
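A minimal sketch of this reconnect pattern, based on the getState and reconnect calls described above (the enum constant names follow the text and should be checked against the SessionFactory API):
SessionFactory.State state = sessionFactory.getState();
if (state == SessionFactory.State.RECONNECTING ||
    state == SessionFactory.State.CLOSED) {
  session.close();            // close all Session objects first
  sessionFactory.reconnect(); // then initiate the reconnect
}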
Dynamic mapping of rows#
The description of ClusterJ so far has only covered the case where the programmer knows the tables in the database. In this case one can use static mappings. It is also possible to use dynamic mappings in ClusterJ to access tables that weren't known at the time the program was developed.
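A sketch of a dynamic mapping, assuming the com.mysql.clusterj.DynamicObject base class; the table() override and the positional get/set calls should be verified against your ClusterJ version:
import com.mysql.clusterj.DynamicObject;

public class DynamicCustomer extends DynamicObject {
  @Override
  public String table() {
    return "customer"; // resolve the table name at runtime
  }
}

// Usage: columns are addressed by position instead of typed methods
DynamicCustomer cust = session.newInstance(DynamicCustomer.class);
cust.set(0, 100);       // column 0: customer_id
cust.set(1, "Mikael");  // column 1: name
session.persist(cust);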
Default values#
If a column has a default value this default value is stored in the RonDB data nodes. Thus these columns will get the default values if they are not set explicitly by the ClusterJ application.
CURRENT_TIMESTAMP#
CURRENT_TIMESTAMP as a default value is a bit special. This is not a value that can be stored, thus these columns will get the value 0000-00-00 00:00:00 if not set from ClusterJ. The MySQL Server sets such columns using metadata it has access to; ClusterJ will not do this. In this particular case it is the responsibility of the ClusterJ application to set the default timestamp value, as shown below for a column LastUpdated.
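A sketch, assuming the mapped class exposes the hypothetical LastUpdated column through a setter (timestamp columns map to java.util.Date):
newCust.setLastUpdated(new java.util.Date());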
Foreign Keys#
Foreign keys are managed entirely by the RonDB data nodes. Thus all foreign key checks and actions will happen even if the table is managed from ClusterJ.
MySQL Triggers#
MySQL Triggers are only defined in the individual MySQL Server. Thus, if you have triggers defined on a table, they will not affect operation in other MySQL Servers, nor will they affect ClusterJ applications.
MySQL Replication can replicate MySQL Triggers to another MySQL Server.
MySQL Stored Procedures and Views#
Stored procedures are very useful and there are very large NDB applications using thousands of stored procedures. However, stored procedures are only defined in the MySQL Server where they are created. Thus they have to be created in each MySQL Server where they are used. One can use MySQL Replication to replicate stored procedures to another MySQL Server.
One example of using stored procedures in RonDB is the DBT2 benchmark.
The same that applies to MySQL Triggers and MySQL stored procedures applies also to MySQL Views.
Obviously MySQL Stored Procedures and MySQL Views and MySQL Triggers have no impact on ClusterJ applications.
RonDB extensions and fixes#
RonDB 21.04.0 added support for handling Date columns as primary keys in ClusterJ.
RonDB 21.04.0 added new extensions for caching Dynamic Objects and Session objects as described above.
RonDB 21.04.0 fixed a bug in padding CHAR strings that were larger than 255 bytes.
RonDB 21.04.0 made some changes to internal ClusterJ objects that made garbage collection of ClusterJ objects more effective.
In RonDB 21.04.1, support for primary keys using Longvarchar was added. This is required to handle primary keys of VARCHAR type that are larger than 255 bytes.
RonDB 21.04.1 fixed a bug in the new object caches when unloading a schema.
RonDB 21.04.9 added support for getSession(String database) calls on the SessionFactory object to ensure that one cluster connection can operate on many databases concurrently.