Option Tuning
When performing bulk operations, a robust set of configurable options is essential. These shared bulk options let you define the batch size, handle errors, manage identity inserts, and track operation progress, among other things, ensuring a smooth data processing experience. While most options are common to all bulk operations, some are specific to particular tasks. For instance, BulkMerge has exclusive settings for fine-tuned control over merging and caching behavior, while BulkInsert, BulkUpdate, and BulkDelete offer distinct options for adjusting data before and after each batch is written.
Using options
In each bulk operation, you can configure various options by passing an Action<BulkOptions> (or Action<MergeBulkOptions>) to the method. The example demonstrates setting multiple options within the BulkInsert operation. Each option is set within the lambda expression, allowing tailored behavior such as specifying the BatchSize or providing an action to track progress.
db.BulkInsert(data, options => {
    options.BatchSize = 2;
    options.OnProgress = _ => progress_count++;
});
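For merge operations the pattern is the same, except the lambda receives the merge-specific options type. The following is a minimal sketch that combines a shared option (BatchSize) with a merge-specific one (MergeMode); both are described later on this page:
db.BulkMerge(data, options => {
    // Shared option, available for all bulk operations
    options.BatchSize = 2;
    // Merge-specific option (see "Options for merge" below)
    options.MergeMode = MergeMode.Delta;
});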
Shared options
The following options are relevant for all bulk operations.
Batch Size
The BatchSize option determines the number of records processed in a single batch. Setting it to 10000, as shown, instructs BulkInsert to handle data in chunks of 10000 records per batch, which can improve performance when working with large datasets. By default, BatchSize is set to 1000.
db.BulkInsert(data, options => {
    options.BatchSize = 10000;
});
Tracking progress
With the OnProgress option in EFBox, you can keep track of the progress of your bulk insert operation. This example increments a counter each time a batch is processed, providing a simple way to track how many batches have been handled during the operation. This is handy for logging or user feedback during large bulk operations.
db.BulkInsert(data, options => { options.OnProgress = _ => count++; });
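The progress callback can also drive user-facing feedback rather than a bare counter. Below is a minimal sketch, assuming (as described above) that OnProgress is invoked once per processed batch; the console logging is illustrative only and not part of the library:
int processedBatches = 0;
db.BulkInsert(data, options => {
    options.BatchSize = 1000;
    // Assumption: the callback fires once per completed batch; its argument is ignored here
    options.OnProgress = _ => Console.WriteLine($"Completed batch {++processedBatches}");
});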
Data Transformation with ColumnConverters
The ColumnConverters option in EFBox allows you to apply transformations to data during bulk operations. In this example, a ColumnConverter is defined to prepend “https://” to the Url property of each record before it’s inserted into the database. This is a powerful feature for applying quick data transformations without the need for manual data manipulation before executing the bulk operation.
db.BulkInsert(data, options => {
    options.ColumnConverters = new[] {
        new ColumnConverter() {
            // Applied to the "Url" column value of every record before it is written
            ColumnName = "Url",
            ConversionFunc = u => "https://" + u.ToString()
        }
    };
});
Skipping Back-Read of Generated Values
The ReadGeneratedValues option allows you to control whether generated values, such as auto-incremented IDs, are read back from the database after a bulk operation. By default, this option is set to true. In scenarios where reading back generated values is not required, setting ReadGeneratedValues to false, as shown in the example, can enhance performance by reducing the back-and-forth communication with the database.
db.BulkInsert(data, options => options.ReadGeneratedValues = false);
Enabling Identity Insert
The AllowIdentityInsert option allows you to override existing values in identity/auto-increment/serial columns during a bulk operation. By default, this option is set to false. When set to true, as shown in the example, it enables identity insert, permitting explicit values to be inserted into the identity column of a table. This is useful when you need to keep identity values consistent between your data source and the database.
db.BulkInsert(data, options => options.AllowIdentityInsert = true);
Redirecting Erroneous Batches
The RedirectErroneousBatches option, when set to true, allows the bulk operation to continue even when a batch encounters an error. Instead of halting the operation, erroneous batches are redirected and no exception is thrown. In this example, BatchSize is set to 2, defining the number of records processed in a single batch. This way, if an error occurs, the operation does not stop, and you can later analyze what went wrong in the affected batches.
List<ETLBoxError> errors = new List<ETLBoxError>(); // intended to collect errors from redirected batches (the wiring is not shown in this snippet)
db.BulkInsert(data, options => {
    options.BatchSize = 2;
    options.RedirectErroneousBatches = true;
});
Options for insert/update/delete
The following options are only relevant for the BulkInsert/BulkUpdate/BulkDelete operations.
Before/After BatchWrite
Utilize the BeforeBatchWrite and AfterBatchWrite options in EFBox to interact with your data right before and after each batch is written during a bulk insert operation. In this example, BeforeBatchWrite is used to modify data if the batch contains more than 2 records, while AfterBatchWrite aggregates the count of processed records. Through these options you can access each batch of data before or after it is written.
db.BulkInsert(data, options => {
    options.AfterBatchWrite = b => {
        // Runs after each batch has been written; aggregate the number of processed records
        count += b.Length;
    };
    options.BeforeBatchWrite = b => {
        // Runs before each batch is written; modify the data if the batch holds more than 2 records
        if (b.Length > 2) {
            b.ElementAt(0).Url = "test1";
            b.ElementAt(1).Url = "test2";
        }
        return b;
    };
});
Options for merge
The following options are only relevant for the BulkMerge operation.
Changing merge mode
By default, MergeMode.Full is selected, enabling insert, update, and delete operations to synchronize data. The available merge modes are listed below, followed by a short example of selecting one.
Delta: Performs inserts and updates; deletions are only marked with a flag.
Full: Executes inserts, updates, and deletions (records missing from the source data are deleted).
InsertsAndUpdates: Allows inserts and updates only, ignoring deletion flags.
InsertsOnly: Performs inserts only, no updates or deletions.
UpdatesOnly: Performs updates only, no inserts or deletions.
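Changing the mode only requires setting MergeMode in the options lambda. The following is a minimal sketch that reuses the BulkMerge call and the MergeMode values described above:
db.BulkMerge(data, options => {
    // Only insert and update; do not delete or flag missing destination records
    options.MergeMode = MergeMode.InsertsAndUpdates;
});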
Setting Cache Modes
In this example, the MergeMode is set to Delta and the CacheMode is set to Partial via the options parameter of the BulkMerge method. By default, a bulk merge first loads all data from the destination table into memory before syncing. If the target table is large and you want to avoid this, consider setting the CacheMode to Partial. Note that this only works for merge modes other than MergeMode.Full, because in Full mode all data must be loaded into memory to detect the missing records (the deletes). Choosing the cache mode is therefore a trade-off between performance and memory consumption.
db.BulkMerge(data, options => {
    options.MergeMode = MergeMode.Delta;
    options.CacheMode = CacheMode.Partial;
});
Customizing columns for Merge Operations
In merge operations, the property name is typically mapped to the column name following Entity Framework conventions. However, you may need to specify which properties are used for comparing, updating, or deleting records. The following options provide that flexibility:
CompareColumns: Defines the properties checked for equality during the merge to decide whether an update is needed.
UpdateColumns: Specifies the properties to update when a row needs updating; by default, all non-ID columns are updated.
DeleteColumns: Lists property names together with a specific value that marks a row as eligible for deletion during the merge. This provides a way to flag rows for deletion based on certain criteria. Note: this option is only applicable when the MergeMode is set to Delta.
var compareColumns = new[] {
    new CompareColumn() {
        // Checked for equality to decide whether a matched row needs an update
        ComparePropertyName = nameof(Blog.Url)
    }
};
var updateColumns = new[] {
    new UpdateColumn() {
        // Only this property is written when a row is updated
        UpdatePropertyName = nameof(Blog.Rating)
    }
};
var deleteColumns = new[] {
    new DeleteColumn() {
        // A row is eligible for deletion when BlogId equals 2
        DeletePropertyName = nameof(Blog.BlogId),
        DeleteOnMatchValue = 2
    }
};
db.BulkMerge(data, options => {
    options.MergeMode = MergeMode.Delta;
    options.UpdateColumns = updateColumns;
    options.CompareColumns = compareColumns;
    options.DeleteColumns = deleteColumns;
});
In the provided example, CompareColumns is set to compare the Url property, UpdateColumns is set to update the Rating property, and DeleteColumns is set to delete a row if the BlogId property matches the value 2. These configurations are used by the BulkMerge operation to determine how to synchronize the data with the database.