9 Ocak 2023 Pazartesi

SpringBoot spring.jpa Hibernate'e Özel Ayarlar - Batch Ayarları

Giriş
Bulk ve Batch farklı şeyler. Açıklaması şöyle
BULK INSERTS: Is the process of inserting a huge number of rows in a database table at once (one or many transactions).
Batching: allows us to send a group of SQL statements to the database in a single transaction; it aims to optimize network and memory usage; so instead of sending each statement by itself we send a group of statements.
1. Kısaca
Şunlar kullanılır
1. hibernate.generate_statistics
2. hibernate.order_inserts : Hibernate SQL cümlelerinden değişiklikler yapar 
3. hibernate.order_updates : Hibernate SQL cümlelerinden değişiklikler yapar 
4. hibernate.flushMode

5. jdbc.batch_size :bir transaction içinde belirtilen sayıdaki statement'ı gönderebilir.
6. jdbc.fetch_size
7. JDBC URL ile de ?rewriteBatchedStatements=true

Örnek
Şöyle yaparız
spring:
  jpa:
    properties:
      hibernate:
        order_inserts: true
        order_updates: true
        jdbc:
          batch_size: 100
          batch_versioned_data: true
Örnek
Şöyle yaparız
spring.datasource.hikari.data-source-properties.rewriteBatchedStatements=true
veya şöyle yaparız
spring.datasource.url=jdbc:mysql://localhost:32803/db?rewriteBatchedStatements=true

2. Detaylı Açıklamalar

spring.jpa.properties.hibernate.generate_statistics Alanı
Tüm ayar değişikliklerinden sonra ölçüm yapmak lazım.

Örnek
Şöyle yaparız. Burada 1 tane JDBC bağlantısı alınıyor ve 1233 tane statement çalıştırılıyor
StatisticalLoggingSessionEventListener : Session Metrics {
    78588493 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    208607581 nanoseconds spent preparing 1233 JDBC statements;
    6474843328 nanoseconds spent executing 1233 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    6966643471 nanoseconds spent executing 2 flushes (flushing a total of 2466 entities and 1000 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

Örnek
Ölçüm yapmak için şöyle yaparız
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true
Önce çıktı şöyle
Session Metrics {
    1272500 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    92831400 nanoseconds spent preparing 49207 JDBC statements;
    18557329900 nanoseconds spent executing 49207 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    21229826900 nanoseconds spent executing 1 flushes (flushing a total of 49043 entities and 223 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
Sonra çıktı şöyle. Burada artık JDBC batch kullanıldığı görülüyor
Session Metrics {
    872300 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    6031200 nanoseconds spent preparing 168 JDBC statements;
    103321900 nanoseconds spent executing 165 JDBC statements;
    734107200 nanoseconds spent executing 14 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    1737581300 nanoseconds spent executing 1 flushes (flushing a total of 49043 entities and 223 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
hibernate.flushMode
Açıklaması şöyle. Tam ne işe yarıyor bilmiyorum
... flushing is the synchronization of the state of your database with state of your session
... the flushing time is the time spent synchronizing the state of entities in memory with the state of this entities in the database;
Örnek
Şöyle yaparız
spring:
  jpa:
    properties:
      org:
        hibernate:
          flushMode: COMMIT
JDBC Bağlantı Parametreleri

reWriteBatchedInserts Parametresi
Genel açıklama şöyle. Yani ayrı ayrı olan bir çok INSERT cümlesini tek bir INSERT + çoklu VALUES haline getirir
... there are bulk (multi-row) insert query options available in most of the mainstream database solutions (Postgres, MySQL, Oracle). With syntax like:
insert into myschema.my_table (col1, col2, col3) 
values
(val11, val12, val13),
(val21, val22, val23),
....
(valn1, valn2, valn3);
While, Postgres and MySQL do support this features with the help of JDBC flag: reWriteBatchedInserts=true
But unfortunately, according to this resource, ms-sql JDBC driver does not support the multi-row rewrite of the queries.
Açıklaması şöyle
Asking PostgreSQL to Rewrite batched inserts
Hibernate will send a multiple insert statements to RDBMS at once, in order to insert data, and this will be done in the same transaction which is great; however if we are using PostgreSQL we could go a little further in our optimizations by asking him to rewrite those inserts to a single multi-value insert statement; this way if we have a 100 insert statements, it will rewrites them to a single multi-value statement.
Yani şöyle olur
// before Rewrite batched inserts
INSERT INTO container( ...) VALUES (...);
INSERT INTO container( ...) VALUES (...);
....
INSERT INTO container( ...) VALUES (...);
// After PostgreSQL to Rewrite batched inserts INSERT INTO container( ...) VALUES
(...),(...) ..., (...);
MySQL veya PostgreSQL kullanırken reWriteBatchedInserts=true yapmak lazım. Şöyle yaparız
jdbc:postgresql://localhost:5432/mastership?reWriteBatchedInserts=true
MysqlDataSource Sınıfını Bean Olarak Kullanıyorsak
com.mysql.cj.jdbc.MysqlDataSource yazına bakabilirsiniz

MySQL JDBC URL Kullanıyorsak
JDBC MySQL Connection String yazına bakabilirsiniz

hibernate.jdbc.batch_size Alanı
Açıklaması şöyle. Bir transaction içinde belirtilen sayıdaki statement'ı gönderebilir.
Hibernate uses a batch size where it stores statements, before running them in transactions, by increasing it, we will allow hibernate to increase the number of statements that it will send to the database in a single transaction.

Running more statements in a single transaction will result in less transactions and less time, it will also optimize the usage of the network.
- 30 yapılabilir. 4096 yapılabilir. Hangisi iyi denemek lazım
- Bulk Insert iyileştirmeleri yazısı burada
Böylece repository.saveAll() çağrısı bulk şeklinde çalışır. 
Örnek
Şöyle yaparız
spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 4096
hibernate.jdbc.fetch_size Alanı
jdbc.batch_size gibidir ancak SELECT cümleleri için kullanılır. Bir transaction içinde belirtilen sayıdaki statement'ı gönderebilir.
Örnek
Şöyle yaparız
spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          fetch_size: 4096

jdbc.batch_size Alanı ve @GeneratedValue İlişkisi
1. @GeneratedValue(strategy = GenerationType.IDENTITY) kullanmamak lazım. Çünkü bunu kullanınca Hibernate bulk insert yapamıyor. Tercih edilen şey SEQUENCE.
2. Postgre açısından da Primary key için kullanılan alan da SERIAL veya BIGSERIAL olmamalı. Çünkü o zaman PostgreSQL kendisi sayı üretmek isteyecektir. Açıklaması şöyle
... hibernate cannot “batch” operations for neither entities having IDs generated by GenerationType.IDENTITY strategy, 
nor entities having SERIAL/BIGSERIAL IDs (in the case of PostgreSQL database).
Şöyle olabilir
public class Book {
  @Id
  @GeneratedValue(strategy = SEQUENCE, generator = "seqGen")
  @SequenceGenerator(name = "seqGen", sequenceName = "seq", initialValue = 1)
  private Long id;
  ...
}
Şöyle olabilir
@Entity
@Table(name = "container")
public class Container implements Serializable {
  @Id
  @GeneratedValue(
    strategy = GenerationType.SEQUENCE,
    generator = "container_sequence"
  )
  @SequenceGenerator(
    name = "container_sequence",
    sequenceName = "container_sequence",
    allocationSize = 300
  )
  @Column(name = "id")
  private Long id;
  ...
  @OneToMany(mappedBy = "container", fetch = FetchType.EAGER)
  @Cascade(CascadeType.ALL)
  private List<Pallet> pallets = new ArrayList<>();
  ...
}
Örnek
Şöyle yaparız.
hibernate.jdbc.batch_size=50 // commit every 50 lines
hibernate.order_inserts=true // order statements to regroup inserts
hibernate.order_updates=true // order statements to regroup updates
Örnek
Şöyle yaparız. Burada hibernate 10 tane insert işlemini tek bir SQL ile yapıyor.
spring.jpa.properties.hibernate.jdbc.batch_size=10
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

//Java
for (long i = 1; i <= 10; i++) {
  entityManager.persist(
    new Post()
      .setId(i)
      .setTitle(String.format("Post no. %d", i))
  );
}

//Hibernate output
Type:Prepared, Batch:True, QuerySize:1, BatchSize:10,
Query:["
    insert into post (title, id) values (?, ?)
"],
Params:[
    (Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3),
    (Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6),
    (Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9),
    (Post no. 10, 10)
]
hibernate.order_inserts Alanı
Açıklaması şöyle. Yani Hibernate önce bir sınıfın içindeki üye alan olan nesneleri veri tabanına gönderir. Böylece bağımlı nesnenin id'sini daha önce alır
For hibernate to be more efficient in batching (do more batching) especially in a concurrent environment, we will enable order_updates and order_inserts

Ordering by entity will allow hibernate to save entities that are fields in other entities first, and ordering ids will allow it to use the right sequence values in the statements.
Örnek
Şöyle yaparız
spring:
  jpa:
    properties:
      hibernate:
        order_updates: true
        order_inserts: true
hibernate.order_updates Alanı
hibernate.order_inserts ile benzerdir. Hibernate önce bir sınıfın içindeki üye alan olan nesneleri veri tabanına gönderir. Böylece bağımlı nesnenin id'sini daha önce alır
Örnek
Şöyle yaparız
## Hibernate properties
spring.jpa.hibernate.ddl-auto=none
spring.jpa.show-sql=false
spring.jpa.open-in-view=false
spring.jpa.properties.hibernate.jdbc.time_zone=UTC
spring.jpa.properties.hibernate.jdbc.batch_size=15
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.connection.provider_disables_autocommit=true
spring.jpa.properties.hibernate.query.in_clause_parameter_padding=true
spring.jpa.properties.hibernate.query.fail_on_pagination_over_collection_fetch=true
spring.jpa.properties.hibernate.query.plan_cache_max_size=4096
 
logging.level.net.ttddyy.dsproxy.listener=debug
Açıklaması şöyle
The spring.jpa.hibernate.ddl-auto setting is set to none to disable the hbm2ddl schema generation tool since we are using Flyway to manage the database schema automatically.

The spring.jpa.show-sql is set to false to avoid Hibernate printing the SQL statements to the console. As I explained in this article, it’s better to use datasource-proxy for this task. And that’s why we set the logging.level.net.ttddyy.dsproxy.listener property to debug in development mode. Of course, in the production profile, this property is set to info.

The spring.jpa.open-in-view property is set because we want to disable the dreadful Open-Session in View (OSIV) that’s enabled by default in Spring Boot. The OSIV anti-pattern can cause serious performance and scaling issues, so it’s better to disable it right from the very beginning of your project development.

The spring.jpa.properties.hibernate.jdbc.time_zone property sets the default timezone to UTC to make it easier to handle timestamps across multiple timezones. For more details about handling timezones with Spring Boot, check out this article.

To enable automatic JDBC batching, we are setting the three properties:

The first property sets the default batch size to 15 so that up to 15 sets of bind parameter values could be grouped and sent in a single database roundtrip. The next two settings are meant to increase the likelihood of batching when using cascading. Check out this article for more details about this topic.

The spring.jpa.properties.hibernate.connection.provider_disables_autocommit property is the one that instructs Hibernate that the connection pool disables the auto-commit flag when opening database connections. Check out this article for more details about this performance tuning setting.

The spring.jpa.properties.hibernate.query.in_clause_parameter_padding setting increases the likelihood of statement caching for IN queries as it reduces the number of possible SQL statements that could get generated while varying the IN clause parameter list. Check out this article for more details about this optimization.

The spring.jpa.properties.hibernate.query.fail_on_pagination_over_collection_fetch property is set because we want Hibernate to throw an exception in case a pagination query uses a JOIN FETCH directive. Check out this article for more details about this safety option.

The spring.jpa.properties.hibernate.query.plan_cache_max_size property is set to increase the size of the Hibernate query plan cache. By using a larger cache size, we can reduce the number of JPQL and Criteria API query compilations, therefore increasing application performance. Check out this article for more details about this performance-tuning option.

Hiç yorum yok:

Yorum Gönder