Skip to main content

Processing high-volume data in .NET Entity Framework

 The efficiency of DDL and DML operations on data within Object-Relational Mapping (ORM) systems has become a prominent concern in the contemporary development landscape. Entity Framework stands out as a widely-used ORM within the .NET ecosystem. In specific scenarios, business requirements necessitate retrieving or doing some processes with large datasets, often involving millions of records in SQL Server databases. In light of this challenge, I have chosen to share my insights and experiences.

Firstly, we should understand the coroutine, then we will discuss IAsyncEnumerable and IEnumerable, and finally, we will discuss approaches to handling huge numbers of data.

What is coroutine?

From the Educative website

The word “coroutine” is composed of two words: “co” (cooperative) and “routines” (functions).

In .NET, a coroutine is a function that can be suspended and resumed later. The .NET Framework keeps track of the state of a coroutine using a data structure called a coroutine frame. The CLR handles stack switching in coroutines.

This is done using the yield return statement. The yield return statement takes a value, which is the result of the suspended execution. When the coroutine is resumed, the value is returned to the caller.

Here is an example of a coroutine in .NET Core:

async Task<int> GetNumber() {
// Suspend the execution of the coroutine and wait for 1 second.
await Task.Delay(1000);

// Return the number 10.
return 10;
}

IEnumerable vs IAsyncEnumerable

IAsyncEnumerable and IEnumerable are both interfaces in the .NET Framework that represent an iterable collection of elements. However, there are some critical differences between the two interfaces.

  • IAsyncEnumerable is an asynchronous interface, while IEnumerable is a synchronous interface. This means that an IAsyncEnumerable can be used to iterate over a collection of elements asynchronously, while an IEnumerable can only be used to iterate over a collection of elements synchronously.
  • IAsyncEnumerable uses the yield keyword to return elements, while IEnumerable uses the GetEnumerator() method to return an enumerator. The yield keyword allows an IAsyncEnumerable to return elements one at a time, without having to wait for all of the elements to be available.
  • IAsyncEnumerable is typically used for streaming data, while IEnumerable is typically used for working with in-memory collections. This is because IAsyncEnumerable can be used to iterate over a collection of elements as they become available, while IEnumerable requires the entire collection of elements to be available before it can be iterated over.

IAsyncEnumerable and IEnumerable are both interfaces in the .NET Framework that represent an iterable collection of elements. However, there are some critical differences between the two interfaces.

  • IAsyncEnumerable is an asynchronous interface, while IEnumerable is a synchronous interface.
  • IAsyncEnumerable uses the yield keyword to return elements, while IEnumerable uses the GetEnumerator() method to return an enumerator.
  • IAsyncEnumerable is typically used for streaming data, while IEnumerable is typically used for working with in-memory collections.

In general, you should use IAsyncEnumerable when you need to iterate over a collection of elements asynchronously, or when you need to stream data. You should use IEnumerable when you need to iterate over a collection of elements synchronously, or when you are working with an in-memory collection.

Here is an example of how to use IAsyncEnumerable:

async Task DoSomethingAsync() {
// Create an IAsyncEnumerable that will yield the numbers from 1 to 10.
var numbers = Enumerable.Range(1, 10).ToAsyncEnumerable();

// Iterate over the numbers asynchronously.
await foreach (int number in numbers) {
Console.WriteLine(number);
}
}

In this example, the ToAsyncEnumerable() method is used to convert an IEnumerable to an IAsyncEnumerable. The foreach loop then iterates over the IAsyncEnumerable asynchronously, printing each number to the console.

Streaming Data in Entity Framework

Streaming data in EF Core is a way to return data from a database one row or a set of rows at a time, instead of loading the entire result set into memory. This can be useful for large result sets, as it can prevent the application from running out of memory.

To stream data in EF Core, you can use the AsAsyncEnumerable() method. This method returns an IAsyncEnumerable, which is an asynchronous interface that can be used to iterate over the data one row or a set of rows at a time.

Here is an example of how to stream data in EF Core:

async Task DoSomethingAsync() {
// Create an IAsyncEnumerable that will stream the data from the database.
var data = await context.Products.AsAsyncEnumerable();

// Iterate over the data asynchronously.
await foreach (var product in data) {
Console.WriteLine(product.Name);
}
}

In this example, the AsAsyncEnumerable() method is used to convert an IEnumerable to an IAsyncEnumerable. The foreach loop then iterates over the IAsyncEnumerable asynchronously, printing the name of each product to the console.

This example shows how to fetch records with the streaming technique and update data by coroutine with Entity Framework. It can be used to process a million records and then return to the other methods.

Before anything else, I want to stream millions of records to do some jobs on them.

    public async IAsyncEnumerable<List<Product>> StreamingProducts()
{
var totalCount = await await context.Products.CountAsync();

for (var i = 0; i <= Math.Ceiling(totalCount / Convert.ToDecimal(DefaultPageSize)); i++)
yield return await await context.Products.AsNoTracking()
.Skip((i) * DefaultPageSize).Take(DefaultPageSize).ToListAsync();
}

This method returns an IAsyncEnumerable of List of ProductEntity. It first gets the total number of records in the Products table. Then, it iterates over the records, skipping the first i * DefaultPageSize records and taking the next DefaultPageSize records. It then returns one List of the records.

Then, I start my parallel operation on the dataset.

    public async Task UpdateSomethingInProduct()
{

var result =StreamingProducts().GetAsyncEnumerator();

using (var transactionScope = new TransactionScope())
{
while (await result.MoveNextAsync())
{
var data = result.Current

await Parallel.ForEachAsync(data, async (item, token) =>
{
// do somethings
// use transactionScope.Entry to get the current state of the record before updating it.
await transactionScope.Entry(item).StateAsync().ConfigureAwait(false);
await transactionScope.SaveChangesAsync();
});
}

transactionScope.Complete();
}
}

This method updates the records in the Products table in parallel. It first gets one IAsyncEnumerator of the records. Then, it starts TransactionScope and iterates over the records, updating each record in parallel. When the iteration is finished, it commits the transaction.

It is important to note that you should use the StateAsync() method to get the current state of the record before updating it. This is because the record may be updated by another coroutine or thread in the meantime.

Caution: coroutines can lead to concurrency problems. If you want to run coroutines in parallel, you must manage your own transactions in parallel.

Conclusion

In this text, In this text, I have discussed the use of coroutines and streaming data in EF Core. I believe that these technologies can be used to improve the performance and scalability of applications that work with large data sets.

In my opinion, the use of coroutines is a powerful tool that can be used to write asynchronous code that is more concise and readable than traditional asynchronous code. I believe that coroutines can be a valuable asset for developers who are working with large result sets.

I also think that streaming data is a promising technology that can be used to improve the performance and scalability of applications that work with large data sets. I believe that streaming data will become increasingly important in the future as the amount of data that we generate and consume continues to grow.

Here are some of the benefits of streaming data in EF Core:

  • It can prevent the application from running out of memory.
  • It can improve the performance of the application, as the data is not loaded into memory all at once.
  • It can be used to stream data to a client, such as a web browser or mobile device.

If you are working with significant result sets in EF Core, I recommend using streaming data. It can help you to improve the performance and scalability of your application.

Comments

Popular posts from this blog

Job pause and resume on Hangfire on Execution mode

  A common approach to handling tasks automatically in your application is to use jobs. Hangfire is an excellent tool for managing jobs easily. The stopping and resuming of jobs in Hangfire sometimes require stopping and restarting your project, so I decided to introduce new components in this framework after reviewing comments and advice carefully. in this article, I will describe how you can implement this component in your Hangfire Project.