In this article we continue our research on how to implement field level text search by creating a companion translations table for each class. We'll use GIN / GiST indexes which ware described in the opening article of this series and we'll try to implement the described concept using Spring Data JPA, Hibernate and Postgres.
Table of contents
This blog post is part of a series of articles about internationalizing data with support for text searches:
- Indexes available in Postgres and anti-patterns
- Performant internationalization in PostgreSQL + Hibernate / JPA:
- Companion translation table - your currently reading this
- HStore column with translations - article will be published soon
From the SQL perspective for each table that represents some entity in the system that has translatable fields we will create a companion table with translations. One row of the companion table will contain translations for all fields of a given row from the source table and a given language. Let's visualize this with an example of a registry of products sold in a shop.
A product in the registry, besides an identifier which is its primary key, will have fields:
- which do not require internationalization: Code, Price,
- which are translatable: Name, Description.
The companion table with translations for products fields will have a composite primary key of:
- a reference to the products identifier,
- the language code which indicates the language of the translated field values.
All other columns in it will hold translations for corresponding columns in the source table. We assume that in the Product table we are using the language that is native for the personnel that operates the system. In the ProductTranslation table we store translations for all other languages.
Object model with JPA
Let's now try to develop an object model in Java and JPA that should generate a relational model similar to the one we introduced in the previous chapter.
Spring service for text search
Now we will test our model in practice by implementing a Spring service with a REST API that will allow us to:
- add and retrieve translations of a given product;
- search for products by their name in a given language.
We will use Spring Data JPA to handle our SQL queries.
Indexes in Postgres and performence testing
Hibernate will generate our SQL tables based on our object model with JPA annotations however it will not provide any optimizations. Those GIN indexes, that we mentioned in the first article of this series, will have to be added to Postgres manually.
We'll write a Scala script for Gatling to populate our database using our REST API.
Let's make a request to our service to check how our search endpoint operates.
The results are correct however searching through 100'000 products took quite a lot of time: ~350 ms. Let's check if those indexes we added ware actually used by the database. Using Spring Boots property
spring.jpa.show-sql=true we can see what SQL queries did Hibernate send to Postgres and then pass them to the EXPLAIN statement.
As we can see both tables had their entire contents fully scanned (Seq Scan) which indicates that indexes ware not used. On StackOverflow and Postgres Mailing List we can read the Postgreses query optimizer cannot handle two conditions with an OR statement on two joined tables using GIN indexes. Using an union was suggested as separating those two conditions into two queries results in a query execution plan that uses our indexes.
From Prostgreses point of view doing an union of those two partial queries would solve the problem. However, doing that using JPA / Hibernate won't be easy as those technologies do not support unions in queries. Unfortunately, to finish our implementation we would have to:
- either build queries in pure SQL - a solution that will become unwieldy when more complex dynamically build queries will be needed in the project as we will not be able to use JPA Criteria API or Spring Data Specifications,
- or do one query for each criterium and combine the results in Java.
Both solutions are rather dissatisfying and therefore we will finish here our experiments with the companion translation table concept.
The companion translation table pattern in an efficient solution to the problem discussed in this series of articles however can be problematic to implement because of the limitations imposed by both our database engine and ORM library. As our experiment shows, deficiencies in Posrgreses query planner combined with missing features in Hibernate / JPA would force us to develop a suboptimal workaround. Because of this we should move on to our next concept which despite the usage of a Postgres specific data type can prove easier to integrate with
Hibernate / JPA.
Source code used in this article can be found in our bitbucket repository.
Next article in the series: HStore column with translations - article will be published soon
This article is a result of our cooperation with Nextbuy - a SaaS company which develops a procurement and online auction platform to connect buyers and suppliers. We provide various consulting and software development services to them and they have kindly allowed us to publish part of the resulting research / design documents. You can checkout their amazing platform at www.nextbuy24.com
Do you need help in your company with some topic we mentioned in our blog articles? If so, then please feel free to contact us. We can help you by providing consulting and audit services or by organizing training workshops for your employees. We can also aid you in software development by outsourcing our developers.