In this blog post I want to show how to eager load a parent entity together with its child collections using NHibernate, and how to avoid the nasty problem of creating Cartesian products. I will show how this can be achieved with the three query APIs available in NHibernate: Linq, QueryOver and HQL.
For this example I am using version 3.3 of NHibernate against a SQLite database, so that I can run some quick “in memory” tests.
The Domain Model
My model is quite straightforward: it is composed of a Person entity and two child collections, Address and Phone, as illustrated in the following picture:
For the Id I am using a System.Guid data type, for the collections I am using an IList<T>, and the mapping is done with <bag> and the inverse="true" attribute. I omit the rest of the mapping for brevity.
    <class name="Person" abstract="false" table="Person">
      <id name="Id">
        <generator class="guid.comb" />
      </id>
      <property name="FirstName" />
      <property name="LastName" />
      <bag name="Addresses" inverse="true" table="Address" cascade="all">
        <key column="PersonId" />
        <one-to-many class="Address"/>
      </bag>
      <bag name="Phones" inverse="true" table="Phone" cascade="all">
        <key column="PersonId" />
        <one-to-many class="Phone"/>
      </bag>
    </class>
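Since the rest of the mapping is omitted, here is a rough sketch of what the classes behind it could look like. Only Id, FirstName, LastName, Addresses, Phones and Phone.PhoneType come from this post; the back-references, the PhoneType.Name property and the AddAddress/AddPhone helpers are my assumptions, added for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the domain model used in this post.
public class Person
{
    public Person()
    {
        Addresses = new List<Address>();
        Phones = new List<Phone>();
    }

    // Members are virtual so that NHibernate can create lazy-loading proxies.
    public virtual Guid Id { get; set; }
    public virtual string FirstName { get; set; }
    public virtual string LastName { get; set; }
    public virtual IList<Address> Addresses { get; set; }
    public virtual IList<Phone> Phones { get; set; }

    // With inverse="true" the child side owns the foreign key, so the
    // back-reference (an assumed property) must be set when adding a child.
    public virtual void AddAddress(Address address)
    {
        address.Person = this;
        Addresses.Add(address);
    }

    public virtual void AddPhone(Phone phone)
    {
        phone.Person = this;
        Phones.Add(phone);
    }
}

public class Address
{
    public virtual Guid Id { get; set; }
    public virtual Person Person { get; set; }   // assumed back-reference
}

public class Phone
{
    public virtual Guid Id { get; set; }
    public virtual Person Person { get; set; }   // assumed back-reference
    public virtual PhoneType PhoneType { get; set; }
}

public class PhoneType
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }     // assumed property
}
```

The helper methods are just one way to keep both sides of the association in sync; without them the PersonId column would stay empty, because the bags are mapped as inverse.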
NHibernate Linq
With the Linq extension for NHibernate, I can easily eager load the two child collections using the following syntax:
    Person expectedPerson = session.Query<Person>()
        .FetchMany(p => p.Phones)
        .ThenFetch(p => p.PhoneType)
        .FetchMany(p => p.Addresses)
        .Where(x => x.Id == person.Id)
        .ToList().First();
The problem with this query is that I get a nasty Cartesian product. Why? Well, let’s have a look at the SQL generated by this Linq query using NHibernate Profiler:
In my case there are 2 Phone records and 1 Address record that belong to the parent Person. If I have a look at the statistics, I can see that the total number of rows is wrong:
Unfortunately, the following test passes, which means that my Root Aggregate entity has been loaded incorrectly:
    // this assertion passes, but it should not: the person has only 1 address
    expectedPerson.Addresses
        .Count.Should().Be(2, "There is only one address");
    expectedPerson.Phones
        .Count.Should().Be(2, "There are two phones");
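To see where the second address comes from, here is a minimal sketch in plain C# (no NHibernate involved) that mimics what a double join returns and how a bag gets filled from it. It is a simplified model of the behaviour, not NHibernate's actual hydration code, and the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that imitates the joined result set in memory.
public static class CartesianProductDemo
{
    // Mimics joining one person to both child tables:
    // every address row is paired with every phone row.
    public static List<Tuple<string, string>> JoinRows(
        IEnumerable<string> addresses, IEnumerable<string> phones)
    {
        return (from a in addresses
                from p in phones
                select Tuple.Create(a, p)).ToList();
    }

    // Mimics filling a bag: a bag has no uniqueness rule, so one element
    // is appended for each joined row, duplicates included.
    public static List<string> HydrateAddressBag(
        IEnumerable<Tuple<string, string>> rows)
    {
        return rows.Select(r => r.Item1).ToList();
    }
}
```

Calling JoinRows with one address and two phones gives two rows, and HydrateAddressBag then returns the same address twice, which is the count the test above is (wrongly) happy with.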
The solution is to load the collections with separate queries, without hurting database performance too much. To achieve this I use the Future syntax, which tells NHibernate to build the Root Aggregate with three queries sent to the database in a single batch:
    // create the first query
    var query = session.Query<Person>()
        .Where(x => x.Id == person.Id);
    // batch the collections
    query
        .FetchMany(x => x.Addresses)
        .ToFuture();
    query
        .FetchMany(x => x.Phones)
        .ThenFetch(p => p.PhoneType)
        .ToFuture();
    // execute the queries in one roundtrip
    Person expectedPerson = query.ToFuture().ToList().First();
Now if I profile my query, I can see that the entities are loaded using 3 SQL queries, batched together into a single database call:
As for performance, this is the difference between the Cartesian product and the batch call:
NHibernate QueryOver
The same mechanism is also available in QueryOver<T>. We can instruct NHibernate to create left outer joins, and get a Cartesian product, with a statement like the following:
    Phone phone = null;
    PhoneType phoneType = null;
    // One query
    Person expectedPerson = session.QueryOver<Person>()
        // eager fetch (outer join)
        .Fetch(p => p.Addresses).Eager
        // left outer join
        .Left.JoinAlias(p => p.Phones, () => phone)
        .Left.JoinAlias(() => phone.PhoneType, () => phoneType)
        .Where(x => x.Id == person.Id)
        .TransformUsing(Transformers.DistinctRootEntity)
        .List().First();
As you can see, I am applying the DistinctRootEntity transformer here. Unfortunately the transformer does not help when you eager load more than one child collection: it only collapses the duplicated instances of the Root Aggregate returned by the database, while the duplicated rows still end up in the child bags.
In this case too, the alternative is to batch the collections and send 3 queries to the database in one round trip:
    Phone phone = null;
    PhoneType phoneType = null;
    // prepare the query
    var query = session.QueryOver<Person>()
        .Where(x => x.Id == person.Id)
        .Future();
    // eager load in one batch the first collection
    // (filtered by Id so that only this person is loaded)
    session.QueryOver<Person>()
        .Where(x => x.Id == person.Id)
        .Fetch(x => x.Addresses).Eager
        .Future();
    // second collection with grandchildren
    session.QueryOver<Person>()
        .Where(x => x.Id == person.Id)
        .Left.JoinAlias(p => p.Phones, () => phone)
        .Left.JoinAlias(() => phone.PhoneType, () => phoneType)
        .Future();
    // execute the query
    Person expectedPerson = query.ToList().First();
Personally, the only thing that I don’t like about QueryOver<T> is the syntax. As you can see from my complex query, I need to declare some null alias variables for Phone and PhoneType. I don’t like it because when I batch 3 or 4 collections I always end up with 3 or 4 variables that are quite ugly and serve no other purpose.
NHibernate HQL
HQL is a great query language: it lets you write almost any kind of complex query, and its biggest advantage over Linq or QueryOver<T> is that it is fully supported by the framework.
The only downside is that it requires “magic strings”, so you must be very careful with the queries you write, because it is very easy to get one wrong and end up with a nice runtime exception.
So, in this case too, I can eager load everything in one shot and again get a Cartesian product:
    Person expectedPerson = session.CreateQuery(@"
        from Person p
        left join fetch p.Addresses a
        left join fetch p.Phones ph
        left join fetch ph.PhoneType pt
        where p.Id = :id")
        .SetParameter("id", person.Id)
        .List<Person>().First();
Or I can batch 3 different HQL queries into one database call:
    // prepare the query
    var query = session.CreateQuery("from Person p where p.Id = :id")
        .SetParameter("id", person.Id)
        .Future<Person>();
    // eager load first collection
    session.CreateQuery("from Person p left join fetch p.Addresses a where p.Id = :id")
        .SetParameter("id", person.Id)
        .Future<Person>();
    // eager load second collection
    session.CreateQuery("from Person p left join fetch p.Phones ph left join fetch ph.PhoneType pt where p.Id = :id")
        .SetParameter("id", person.Id)
        .Future<Person>();
    // execute the queries in one roundtrip
    Person expectedPerson = query.ToList().First();
Eager Load vs Batch
I ran some tests in order to understand which of the following performs better:
- Running one eager query and manually cleaning up the duplicated records
- Running a batched set of queries and getting a clean Root Aggregate
These are my results:
Surprisingly, the eager load plus C# cleanup is slower than the batch call.
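For reference, the manual cleanup can be as simple as dropping the repeated children by their Id. The helper below is only a sketch of that idea, not necessarily the code I measured, and the BagCleanup and DistinctByKey names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for removing the duplicates left by a Cartesian product.
public static class BagCleanup
{
    // Returns the items with duplicates removed, keeping the first
    // occurrence of each key and preserving the original order.
    public static IList<T> DistinctByKey<T, TKey>(
        IEnumerable<T> items, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        var result = new List<T>();
        foreach (var item in items)
        {
            // HashSet.Add returns false when the key was already present.
            if (seen.Add(keySelector(item)))
            {
                result.Add(item);
            }
        }
        return result;
    }
}
```

With the entities of this post it would be called as BagCleanup.DistinctByKey(expectedPerson.Addresses, a => a.Id). I would treat the result as a cleaned copy for reading, rather than assigning it back to the mapped collection of an entity that is still attached to the session.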