Optimizing Data Fetching: Understanding the N+1 Problem in JPA
Welcome to the second part of our series on Hibernate/JPA within Spring! If you haven’t already, I recommend reading the first article, which discusses various database query techniques in Spring Data JPA.
Background
What is Lazy Loading?
Before discussing the N+1 loading issue, it's essential to understand lazy loading. This JPA feature postpones the initialization of related objects until they are actually needed. For example, consider a Users table and an Articles table.
When retrieving a list of users, you might not require all the articles written by each user. Therefore, you can configure the relationship for User.articles as a lazy relationship, instructing JPA to avoid loading article data by default.
@Entity
public class User {
    @OneToMany(mappedBy = "author", fetch = FetchType.LAZY) // articles are loaded only when accessed
    List<Article> articles = new ArrayList<>();
    ...
}
When you fetch a list of users, such as those whose names start with 'C', JPA will only load the user data, without fetching any related article information. Articles for each user will be retrieved only when accessed, hence the term 'Lazy Loading.'
This approach improves performance when only user data is required, but it can backfire when you do need the related data, as the next section shows.
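The deferral can be illustrated with a small stdlib-only sketch. Hibernate uses its own collection wrappers (such as PersistentBag) for lazy collections; the `LazyCollection` class below is a hypothetical, simplified stand-in, not Hibernate's API. It shows only the core idea: the loader, standing in for the article query, runs on first access and never again.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for an ORM's lazy collection wrapper (not a Hibernate class).
class LazyCollection<T> {
    private final Supplier<List<T>> loader; // stands in for "SELECT ... FROM articles WHERE author_id = ?"
    private List<T> elements;               // stays null until first access
    int loadCount = 0;                      // how many times the loader ("query") ran

    LazyCollection(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    boolean isInitialized() {
        return elements != null;
    }

    // First access runs the deferred load; later accesses reuse the loaded elements.
    List<T> get() {
        if (elements == null) {
            elements = loader.get();
            loadCount++;
        }
        return elements;
    }

    public static void main(String[] args) {
        LazyCollection<String> articles = new LazyCollection<>(() -> List.of("a1", "a2"));
        System.out.println(articles.isInitialized()); // false: nothing loaded yet
        System.out.println(articles.get().size());    // 2: the load happens here
        System.out.println(articles.loadCount);       // 1
    }
}
```

In a real application, that first access must happen while the persistence context is still open; otherwise Hibernate throws a LazyInitializationException.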
What is N+1 Loading?
The N+1 loading issue arises when an application executes a single query to fetch initial data (e.g., a list of users) and then performs an additional query for each row to fetch related data (e.g., the user's articles). Consequently, if there are N users, the application executes 1 query to retrieve all users and then N more queries for the articles, leading to N+1 queries in total.
For instance, if you fetch 10 users and then access each user's articles, every access triggers a new query, for a total of 11 queries (1 for the users plus 10 for the articles).
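List<User> allUsers = userRepository.findAll(); // 1 query: grabs the 10 users in the system
for (User user : allUsers) {
    List<Article> myArticles = user.getArticles(); // returns the uninitialized lazy collection
    myArticles.size(); // first access triggers lazy loading: 1 query per user
    ...
}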
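To make the count concrete, the sketch below replaces the database with a hypothetical in-memory `FakeDatabase` that increments a counter for every "query" it answers. The loop mirrors the snippet above: one query for the users, then one per user for their articles. All names here are illustrative and not part of Spring Data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that counts every query it answers.
class FakeDatabase {
    int queryCount = 0;
    private final Map<Long, List<String>> articlesByUser;

    FakeDatabase(Map<Long, List<String>> articlesByUser) {
        this.articlesByUser = articlesByUser;
    }

    List<Long> selectAllUserIds() {
        queryCount++; // SELECT u.* FROM users u
        return new ArrayList<>(articlesByUser.keySet());
    }

    List<String> selectArticlesByAuthor(long userId) {
        queryCount++; // SELECT a.* FROM articles a WHERE a.author_id = ?
        return articlesByUser.getOrDefault(userId, List.of());
    }
}

class NPlusOneDemo {
    // Mirrors the loop above: 1 query for the users, then 1 lazy query per user.
    static int countQueries(int userCount) {
        Map<Long, List<String>> data = new HashMap<>();
        for (long id = 1; id <= userCount; id++) {
            data.put(id, List.of("article-" + id));
        }
        FakeDatabase db = new FakeDatabase(data);
        for (long userId : db.selectAllUserIds()) {
            db.selectArticlesByAuthor(userId); // stands in for accessing user.getArticles()
        }
        return db.queryCount;
    }

    public static void main(String[] args) {
        System.out.println(countQueries(10)); // 11
    }
}
```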
Due to the overhead from network communication, transactions, and serialization/deserialization, each query execution incurs significant costs, adversely affecting both your application and the database server.
Just Using a Single Query
You might think it’s straightforward to write a native SQL query that combines everything into a single request when you need article information.
SELECT u.name, u.email, a.title
FROM USERS u
JOIN ARTICLES a ON u.id = a.author_id
Alternatively, you can fetch user information alone with:
SELECT u.name, u.email
FROM USERS u
While this is a valid strategy, how can we instruct JPA to behave similarly? This can be achieved by adjusting the Fetch Strategy.
JPA Relationships and Fetch Strategies
JPA FetchType
For every relationship annotation, you can specify the FetchType. While we initially set the FetchType to LAZY, we can alter this by changing it to EAGER.
@Entity
public class User {
    @OneToMany(mappedBy = "author", fetch = FetchType.EAGER) // articles are loaded with every user
    List<Article> articles = new ArrayList<>();
    ...
}
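In this case, whenever a user is loaded, their associated articles are loaded immediately as well. When a single user is loaded by id, Hibernate typically does this with a join, similar to the SQL example that retrieves all data in a single query; for queries that return many users, however, EAGER alone does not guarantee a single query, as we will see later. Either way, EAGER introduces a different drawback: every load of a User now also loads the related articles, whether or not you need them.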
How can we instruct JPA to occasionally retrieve just the User information while other times also load the associated Article data in a single query? There are several methods to accomplish this, which we will examine one by one.
Default Fetch Strategy
Before delving into solutions, it's vital to understand the default fetch strategies for the various relationship types. By default, JPA (and therefore Spring Data JPA) loads collections (to-many relationships: @OneToMany, @ManyToMany) lazily and single references (to-one relationships: @ManyToOne, @OneToOne) eagerly. Depending on the application's access patterns, these defaults can be suboptimal, as our example demonstrates. The User to Article relationship is @OneToMany, so it is LAZY by default; we did not actually need to set @OneToMany(fetch = FetchType.LAZY) explicitly.
As you will discover, controlling JPA behavior revolves around overriding the fetch strategies.
Deep Dive Into N+1
Let’s revisit our entity relationships, now with an additional entity: Category, which each Article references.
Assuming all relationships are using their default FetchType, the User to Articles relationship is OneToMany with Lazy, while the Article to Category relationship is ManyToOne with Eager.
You might assume that since the lazy relationship only exists between users and articles, the N+1 problem would only occur with them. However, it can also arise when dealing with the relationship between articles and categories.
Consider a query that retrieves all articles without any overrides (e.g., findAll() on a Spring Data JPA repository). Suppose it returns 5 articles associated with 3 different categories.
This results in N+1 loading, with a total of 4 queries: the initial query to load the articles plus 3 more, one per distinct category (a category already loaded into the persistence context is not queried again). Even if category information isn't needed, it is always loaded because of eager fetching.
The SQL queries would look like:
SELECT a.* FROM article a
SELECT c.* FROM category c WHERE c.id = ? -- once per distinct category
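Why 4 queries and not 6 (the initial query plus one per article)? The sketch below assumes, as a simplification, that the persistence context behaves like a map from category id to loaded category, so each distinct id is fetched only once. `ArticleRow` and `EagerToOneDemo` are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row from "SELECT a.* FROM article a": only the category's foreign key is present.
record ArticleRow(long id, long categoryId) {}

class EagerToOneDemo {
    // Counts queries for loading articles whose eager @ManyToOne category must be resolved.
    static int countQueries(List<ArticleRow> articles) {
        Map<Long, String> persistenceContext = new HashMap<>(); // simplified first-level cache
        int queries = 1; // the initial article query
        for (ArticleRow article : articles) {
            if (!persistenceContext.containsKey(article.categoryId())) {
                queries++; // SELECT c.* FROM category c WHERE c.id = ?
                persistenceContext.put(article.categoryId(), "category-" + article.categoryId());
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        // 5 articles spread over 3 categories, as in the example above
        List<ArticleRow> articles = List.of(
            new ArticleRow(1, 10), new ArticleRow(2, 10), new ArticleRow(3, 20),
            new ArticleRow(4, 30), new ArticleRow(5, 20));
        System.out.println(countQueries(articles)); // 4
    }
}
```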
The final result can be represented as follows:
Even if the ManyToOne relationship is set to LAZY, accessing the categories will trigger lazy initialization, resulting in N+1 behavior as well.
Side Note: Lazy optional `OneToOne` relationships were previously unsupported without additional enhancements, but they now appear to be supported by default in Spring Data JPA.
Now consider querying for users. Suppose you retrieve all users from the database: 3 users, who together have written 7 articles, which are eagerly linked to 5 categories.
If you then access the related articles of each user, triggering lazy initialization, how many queries will this cause? Would it be 7 + 1? 5 + 1? No, the answer is 4: one query for the users plus one lazy-initialization query per user. When Hibernate lazily initializes the articles of each user, it also performs a left join fetch for their categories, so each lazy query executes a SQL statement resembling:
SELECT a.*, c.* FROM article a LEFT JOIN category c ON c.id = a.category_id WHERE a.author_id = ?
Here, the eager relationship from Article to Category extends the fetch boundary for each initialization.
In summary, Hibernate does not use EAGER mappings to extend the fetch boundary of the initial query, but it does use them to extend the boundary of the queries it runs for lazy initialization.
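The arithmetic for the users scenario can be sketched the same way. `LazyLoadBoundaryDemo` is hypothetical: `queriesWithJoin` encodes the behavior described above (each lazy article query already carries the category columns), while `queriesWithoutJoin` is an assumed counterfactual in which categories would instead be loaded by separate selects, one per distinct id.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LazyLoadBoundaryDemo {
    // An article as returned by the lazy query, with its category id from the LEFT JOIN.
    record ArticleWithCategory(String title, long categoryId) {}

    // The users query, plus one lazy query per user that already includes category columns.
    static int queriesWithJoin(Map<Long, List<ArticleWithCategory>> articlesByUser) {
        return 1 + articlesByUser.size();
    }

    // Hypothetical counterfactual: categories loaded by separate selects, one per distinct id.
    static int queriesWithoutJoin(Map<Long, List<ArticleWithCategory>> articlesByUser) {
        Set<Long> distinctCategoryIds = new HashSet<>();
        for (List<ArticleWithCategory> articles : articlesByUser.values()) {
            for (ArticleWithCategory article : articles) {
                distinctCategoryIds.add(article.categoryId());
            }
        }
        return 1 + articlesByUser.size() + distinctCategoryIds.size();
    }

    public static void main(String[] args) {
        // 3 users, 7 articles, 5 categories, as in the scenario above
        Map<Long, List<ArticleWithCategory>> data = Map.of(
            1L, List.of(new ArticleWithCategory("a1", 1), new ArticleWithCategory("a2", 2)),
            2L, List.of(new ArticleWithCategory("a3", 2), new ArticleWithCategory("a4", 3),
                        new ArticleWithCategory("a5", 4)),
            3L, List.of(new ArticleWithCategory("a6", 5), new ArticleWithCategory("a7", 1)));
        System.out.println(queriesWithJoin(data));    // 4
        System.out.println(queriesWithoutJoin(data)); // 9
    }
}
```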
In both examples, the goal is to fetch all necessary information in a single query without incurring additional queries, regardless of whether child relationships are defined as LAZY or EAGER.
With this foundation, we can discuss how to manually control the fetch boundary of the initial query. There are two primary strategies:
- Using JOIN FETCHING in the Query
- Defining ENTITY GRAPH
Extending Fetch Boundary with JPQL/HQL Fetch Joins
Note: I will use Member and User interchangeably in the following sections. I apologize for any confusion.
"Fetch Join" is a method to extend the "Fetch Boundary" when defining JPQL/HQL. According to Hibernate documentation, Fetch Join allows associations or collections of values to be initialized alongside their parent objects using a single select.
In simpler terms, Fetch Join enables developers to specify how associated entities should be loaded along with the main entity in the initial query.
For our examples, we could define each query as follows:
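import java.util.List;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.ListCrudRepository;

public interface ArticleRepository extends ListCrudRepository<Article, Long> {
    @Query("SELECT a FROM Article a LEFT JOIN FETCH a.category")
    List<Article> getAll();
}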
Here, we perform a "Join Fetch" on the Category relationship from Article.
public interface MemberRepository extends ListCrudRepository<Member, Long> {
@Query("""
SELECT m FROM Member m
JOIN FETCH m.articles a
LEFT JOIN FETCH a.category
""")
List<Member> getAll();
}
This query fetches nested associations: all Articles of each Member, along with the Category of each Article.
Note that Article to Category uses LEFT JOIN FETCH. A left join is necessary whenever the relationship may not exist, which is the case here because Article to Category is @ManyToOne(optional = true) (optional = true is the default). In the JPQL above, however, Member to Article uses a plain (inner) JOIN FETCH, which omits any Member without Articles. To include those members, adjust the query as follows:
-- Will also retrieve Members without any articles
SELECT m FROM Member m LEFT JOIN FETCH m.articles a LEFT JOIN FETCH a.category
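A fetch join still returns one flat SQL result set, which Hibernate turns into an object graph. The stdlib-only simulation below is a simplified model of that process, not Hibernate code; the `Member`, `Article`, and `Row` records are hypothetical. It shows both points made above: the inner join drops a member without articles while the left join keeps it with an empty list, and in both cases the member row that repeats once per article collapses back into a single parent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FetchJoinDemo {
    record Member(long id, String name) {}
    record Article(long authorId, String title) {}
    // One row of the joined result set; article is null when the LEFT JOIN found no match.
    record Row(Member member, Article article) {}

    // Step 1: the flat rows returned for "Member JOIN Article" (inner) or "LEFT JOIN" (left).
    static List<Row> join(List<Member> members, List<Article> articles, boolean left) {
        List<Row> rows = new ArrayList<>();
        for (Member member : members) {
            boolean matched = false;
            for (Article article : articles) {
                if (article.authorId() == member.id()) {
                    rows.add(new Row(member, article));
                    matched = true;
                }
            }
            if (!matched && left) {
                rows.add(new Row(member, null)); // kept, with NULL article columns
            }
        }
        return rows;
    }

    // Step 2: fold the rows into one entry per member id, de-duplicating the repeated parent.
    static Map<Long, List<String>> assemble(List<Row> rows) {
        Map<Long, List<String>> titlesByMember = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> titles = titlesByMember.computeIfAbsent(row.member().id(), id -> new ArrayList<>());
            if (row.article() != null) {
                titles.add(row.article().title());
            }
        }
        return titlesByMember;
    }

    public static void main(String[] args) {
        List<Member> members = List.of(new Member(1, "Chris"), new Member(2, "Carol"));
        List<Article> articles = List.of(new Article(1, "JPA"), new Article(1, "Hibernate"));
        System.out.println(assemble(join(members, articles, false))); // {1=[JPA, Hibernate]}
        System.out.println(assemble(join(members, articles, true)));  // {1=[JPA, Hibernate], 2=[]}
    }
}
```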
Similarly, you can fetch associations in Criteria queries.
import jakarta.persistence.criteria.*;
import java.util.List;

// entityManager: an injected jakarta.persistence.EntityManager
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Article> criteria = builder.createQuery(Article.class);
Root<Article> root = criteria.from(Article.class);
// Left join fetch the Category of each Article
Fetch<Article, Category> categoryFetch = root.fetch("category", JoinType.LEFT);
criteria.select(root);
List<Article> articles = entityManager.createQuery(criteria).getResultList();
Utilizing Join Fetch is a straightforward approach to overriding the default "Fetch Boundary." However, the downside is that developers need to manually define these for each query. This is where Entity Graphs come into play.
For more on JPQL/HQL and Criteria, see Exploring Every Database Query Technique in Spring Data JPA.
Defining Entity Graph
EntityGraph is a feature introduced in JPA 2.1, serving as a template that outlines a subset of an entity’s attributes and relationships to be fetched from the database in a single query. It allows developers to specify which properties or associations should be eagerly loaded and which should remain lazy.
If you use Spring Data’s Query Methods or Specifications, you can manage the "Fetch Boundary" using EntityGraph. Typically, EntityGraph is defined within the entity class.
(Figure: Entity Graph example)
For instance, in the case of User (Member) -> Article -> Category, if we want to fetch both member.articles and article.category, we can declare a @NamedEntityGraph.
@Entity
@Data
@NamedEntityGraph(
    name = "graph.Member.articles.category",
    attributeNodes = @NamedAttributeNode(value = "articles", subgraph = "article.category"),
    subgraphs = {
        @NamedSubgraph(
            name = "article.category",
            attributeNodes = @NamedAttributeNode("category")
        )
    }
)
public class Member {
    @Id @GeneratedValue
    private Long id;
    private String name;
    private String email;
    // LAZY by default; the entity graph decides when articles are fetched
    @OneToMany(mappedBy = "author")
    private List<Article> articles;
}
Here, we define an EntityGraph named graph.Member.articles.category. Whenever this graph is applied, it fetches the associated articles, and its subgraph additionally fetches the category of each article.
This can be utilized in Spring Data Query Methods with the following annotation:
@EntityGraph("graph.Member.articles.category")
List<Member> getUsingEntityGraphByNameStartingWith(String name);
This results in SQL that employs "Left Joins" for each relationship.
If you prefer to define the Article -> Category fetch configuration within Article, you can do so and reuse it.
@Entity
@Data
@NamedEntityGraph(
    name = "graph.Article.category",
    attributeNodes = @NamedAttributeNode(value = "category")
)
public class Article {
    @Id @GeneratedValue
    private Long id;
    private String title;
    @ManyToOne
    private Member author;
    @ManyToOne
    private Category category;
}
Now, in the Member class, you won’t have to define the subgraph and can simply use the graph defined for the Article entity.
@Entity
@Data
@NamedEntityGraph(
    name = "graph.Member.articles.category",
    attributeNodes =
        @NamedAttributeNode(value = "articles", subgraph = "graph.Article.category"))
public class Member {
    @Id @GeneratedValue
    private Long id;
    @OneToMany(mappedBy = "author")
    private List<Article> articles;
}
With EntityGraphs, you can reuse existing definitions and only create new ones as needed.
If desired, you can also define a one-off EntityGraph directly on the query method.
@EntityGraph(attributePaths = {"category"})
List<Article> findByTitleStartingWith(String title);
When using EntityGraph, be cautious of infinite loops. For example, if `Member` has `friends` relationships that also point to `Member`, the EntityGraph must be designed to prevent endless loops, such as only fetching top-level friends rather than friends of friends.
Using BatchSize to Enhance N+1 Performance
Another strategy, while not directly related to our discussion on "Fetch Boundary," is BatchSize.
BatchSize effectively groups your N in the N+1 problem, transforming SQL into an IN query to decrease the total number of queries.
SELECT * FROM category WHERE id IN (?, ?, ?, ...)
For our Article -> Category example, if you retrieve 100 articles, each with a unique category, this would typically generate 101 queries. By annotating the Category entity class with @BatchSize(size = 20), this can be reduced to 6 queries: 100 / 20 = 5 batched category queries, plus 1 for the articles.
@Entity
@Data
@BatchSize(size=20)
public class Category {
@Id @GeneratedValue
private Long id;
private String name;
}
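The batching arithmetic can be sketched in plain Java. This is a simplified model that assumes the ids are split, in order, into chunks of at most `size` ids; Hibernate's actual chunking and generated SQL can differ by version and database dialect. `BatchSizeDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchSizeDemo {
    // Splits the ids to load into chunks of at most batchSize; each chunk becomes one IN query.
    static List<List<Long>> batches(List<Long> ids, int batchSize) {
        List<List<Long>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            chunks.add(ids.subList(start, Math.min(start + batchSize, ids.size())));
        }
        return chunks;
    }

    // Renders one chunk as SQL with one placeholder per id.
    static String toSql(List<Long> chunk) {
        return "SELECT * FROM category WHERE id IN ("
            + String.join(", ", Collections.nCopies(chunk.size(), "?")) + ")";
    }

    // 1 query for the articles + ceil(distinctIds / batchSize) batched category queries.
    static int totalQueries(int distinctIds, int batchSize) {
        return 1 + (distinctIds + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(totalQueries(100, 20));      // 6, matching (100 / 20) + 1
        System.out.println(toSql(List.of(1L, 2L, 3L))); // SELECT * FROM category WHERE id IN (?, ?, ?)
    }
}
```

The ceiling division matters when the count is not a multiple of the batch size: 101 distinct categories would need 6 batched queries, or 7 in total.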
Conclusion
Addressing the N+1 problem in Hibernate and JPA requires a deliberate choice between lazy and eager loading, combined with features like Fetch Joins, Entity Graphs, and BatchSize. Understanding these options will help you optimize data fetching in your applications.
In our upcoming article, we will simplify these choices further by discussing entity design with Domain-Driven Design. Stay tuned!