29 Eylül 2021 Çarşamba

SpringData JPA JpaRepository ile @Query ve Stream - Batch İşler İçin Uygundur

Giriş
Veri tabanından veri çekmek için genellikle şu 3 yoldan birisi kullanılıyor
1. Loading everything at once : Bellek yetmeyebilir
2. Using Paging/Slicing : Veriyi dolaşmak için kod yazmak gerekir
3. Using Streams : En kolay yöntem bu

Ne Zaman Stream Kullanılabilir
Açıklaması şöyle. Yani normalde Optional veya List döndürülür ancak Sonuç listesi çok büyükse Stream döndürülebilir
Spring Data JPA repositories in their default setup expects to return instances of either Optional or List, meaning that the result set will be lifted into memory and mapped to your model object. This works well for many use cases, but when the result set becomes very large (> 100,000 records) or the result set size isn’t known in advance, memory will become a problem. Enter Stream.

Kullanım
Stream yöntemin gerçekleşmesi için JPA 2.2'deki Query arayüzünün getResultStream() metodu kullanılıyor. Eğer sadece JPA kullanıyorsak şöyle yaparız
Stream<Author> authors = em.createQuery("SELECT a FROM Author a", Author.class)
  .getResultStream();
Spring ile şöyle yaparız
@Repository
public interface EmployeeRepository extends JpaRepository<Employee, Integer> {

  Stream<Employee> findByHireDateBetween(LocalDate from, LocalDate to);
}

@Autowired
private EmployeeRepository employeeRepo;

@Transactional(readOnly = true)
public void exportEmployees(LocalDate hireDateFrom, LocalDate hireDateTo) {
  try (Stream<Employee> employees = employeeRepo.findByHireDateBetween(hireDateFrom,
                                                                       hireDateTo)) {
    employees.forEach(this::mapAndWrite);
  }
}

private void mapAndWrite(Employee from) {
  ...
}
Eğer N +1 Select problemi varsa şöyle yaparız
@Entity
public class Employee {
  ...

  @OneToMany(mappedBy = "employeeId", cascade = CascadeType.ALL, orphanRemoval = true)
  private Set<Salary> salaries;

  @OneToMany(mappedBy = "employeeId", cascade = CascadeType.ALL, orphanRemoval = true)
  private Set<Title> titles;
}

@Repository
public interface EmployeeRepository extends JpaRepository<Employee, Integer> {

  @Query("SELECT DISTINCT e from Employee e "
         + "LEFT JOIN FETCH e.salaries "
         + "LEFT JOIN FETCH e.titles "
         + "WHERE e.hireDate BETWEEN ?1 AND ?2 "
         + "ORDER BY e.employeeId")
  @QueryHints(value = {
    @QueryHint(name = HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE),
    @QueryHint(name = HINT_CACHEABLE, value = "false"),
    @QueryHint(name = HINT_READONLY, value = "true"),
    @QueryHint(name = HINT_PASS_DISTINCT_THROUGH, value = "false")
  })    
  Stream<Employee> findByHireDateBetween(LocalDate from, LocalDate to);
}
- Burada join yapıldığı için Employee nesnesi veri tabanından birden fazla gelir. Bunu engellemek için DISTINCT kullanılır
- LEFT JOIN FETCH ile child nesneler eager yüklenir
- ORDER BY niçin lazım tam anlamadım

Genel kullanım için açıklama şöyle
- Any stream operation must be wrapped in a transaction. Spring will throw an exception otherwise
- A stream is created by Spring Data JPA but must be closed by you. A stream keeps a cursor to the result set open since it can’t know when the result set has been consumed. If not closed you will run out of cursors in the database. Use try-with-resources or call close on the stream.
- Forward operations only. A stream can only be consumed once
Yani 
1. Stream mutlaka kapatılmalı
2. Stream'i işleyen kod @Transactional olmalı
3. İşlenen nesne EntityManager'dan detach() çağrısı ile çıkarılmalı
4. N+1 Select problemine dikkat edilmeli


Diğer Parametreler
@QueryHint parametresi de önemli. Kullanılan bazı şeyler şöyle

- HINT_FETCH_SIZE ile bir seferde getirilecek kayıt sayısı belirtilir.

- HINT_PASS_DISTINCT_THROUGH
Açıklaması şöyle. DISTINCT kelimesi veri tabanına gönderilmez, ancak Hibernate bizim için distinct işlemini yapar.
This instruction informs Spring Data JPA/Hibernate not to pass a DISTINCT statement to the database via SQL. Instead it will interpret the DISTINCT statement in our JPQL as an instruction to Hibernate not to return the same entity one time for each row returned, i.e. it is used in conjunction with the instruction regarding DISTINCT explained above.

Örnek
Şöyle yaparız
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.data.repository.Repository;
import javax.persistence.QueryHint;
import java.util.stream.Stream;

public interface BookRepository extends Repository<Book, Long> {
    
  @QueryHints(value = {
    @QueryHint(name = HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE),
    @QueryHint(name = HINT_CACHEABLE, value = "false"),
    @QueryHint(name = READ_ONLY, value = "true")
  })
  @Query("select b from Book")
  Stream<Book> getAll();
}
Bu repository kullanılırken, EntityManager ara sıra boşaltılmalı. Şöyle yaparız
@Component
public class BookProcessor {
  
  private final BookRepository bookRepository;
  private final EntityManager entityManager;
  
  public BookProcessor(BookRepository bookRepository, EntityManager entityManager) {
    this.bookRepository = bookRepository;
    this.entityManager = entityManager;
  }
  
  @Transactional(readOnly = true)
  public void processBooks() {
    Stream<Book> bookStream = bookRepository.getAll();
  
    bookStream.forEach(book -> {
      // do some processing
      entityManager.detach(book);
    });
    stream.close();
  }
}
Eğer .detach() metodunu çağırmazsak bir yerden sonra OutOfMemoryError hatası alırız. Açıklaması şöyle
The reason for this is that even though we are streaming entities from the database and have marked the query and transaction as read only, Hibernate keeps track of all entities in it’s persistence context. After going through a few thousand records, the heap will be full of these entities.

To resolve this we have to handle each entity in the stream in a different manner and most importantly tell Hibernate that it shouldn’t keep track of the entity in the persistence context once we are done with it.
Örnek
Elimizde şöyle bir kod olsun.
@QueryHints(value = {
  @QueryHint(name = HINT_FETCH_SIZE, value = “600”),
  @QueryHint(name = HINT_CACHEABLE, value = “false”),
  @QueryHint(name = READ_ONLY, value = “true”)
})
@Query(value = “SELECT * FROM journal_entries where accounting_entity_id = :id”, 
       nativeQuery = true)
Stream<JournalEntry> getJournalEntriesForAccountingEntity(Integer id)
Kullanmak için şöyle yaparız.
@Transactional(readOnly = true)
public String generateReportFileUrl(Integer id) throws IOException {
  Stream<JournalEntry> stream = 
    journalEntryManager.getJournalEntryStreamForAccountingEntity(id);
  ...
  stream.close();
}




Hiç yorum yok:

Yorum Gönder