
How to Process Data in Salesforce Efficiently

Odaseva

Feb 20, 2024

This post was adapted from the Odaseva Data Innovation Forum session, “How to Efficiently Process Data in Salesforce” featuring Sarath Yalavarthi, CTA, Application Owner for Consumer Banking CRM at JPMorgan Chase & Co.

Watch the session replay here and see a clip below.

In today’s dynamic digital landscape, data reigns supreme, and processing it efficiently in Salesforce is crucial. This post reviews Sarath Yalavarthi’s presentation from the Odaseva Data Innovation Forum for Salesforce Architects, which covers the three operations that are integral to efficient data processing in Salesforce:

  1. Data injection: How data is collected or inserted into Salesforce
  2. Data retrieval: How data is extracted or queried from Salesforce for various purposes, such as reporting, analysis, and sharing
  3. Data transformation: Cleaning, validating, and aggregating data to ensure it is in the right format and of the right quality for its intended use

Below are the considerations and approaches for each:

Data injection

Data injection is the process of collecting and inserting data from various sources into Salesforce.

Considerations

  • Volume

It’s important to take into consideration the volume of records: is it a few records, thousands, or millions?

  • Complexity of the data set

Another important consideration is the complexity of the data set: is it a few picklist fields, files, or complete data sets?

  • Frequency

How frequent is the data load? Is it a one-time load, or does it run daily, weekly, or monthly?

  • Source of data

Is the data being sourced from an ERP, a live stream, or spreadsheets?

The approach

The approaches to data injection are as follows:

  • Manual data entry

Manual data entry is suitable for limited use cases or instances where data entry is performed directly by end users, particularly in small-scale scenarios.

  • Data Import Wizard

The Data Import Wizard supports a limited set of objects and can load up to 50,000 records. For example, custom objects, accounts, contacts, leads, and campaign members are supported, but opportunities and opportunity products are not.

  • Data Loader

The Data Loader allows for data import and export and is efficient when handling datasets below 5 million records. So if the requirement is to upload or download a million records in a single operation, the Data Loader is the recommended tool.

  • API (REST, SOAP and Bulk APIs)

The Salesforce APIs allow users to interact with Salesforce programmatically.

With them, the user can write code to create and update records, or use the Bulk API to extract huge volumes of data. This is the right approach for large datasets and complex integrations; a minimal example follows.
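As a rough illustration, here is a minimal sketch (Python with the requests library) that creates a single Account record through the REST API. The instance URL, API version, and access token are placeholders you would obtain from your own OAuth flow; they are assumptions, not values from the presentation.

```python
import requests

# Assumed placeholders: obtain these from your own OAuth / JWT flow.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...your_session_token"
API_VERSION = "v59.0"

def create_account(name: str) -> str:
    """Create one Account via the REST API and return its new record Id."""
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/sobjects/Account"
    headers = {
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    }
    response = requests.post(url, headers=headers, json={"Name": name})
    response.raise_for_status()
    return response.json()["id"]

if __name__ == "__main__":
    print(create_account("Acme Corp"))
```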

  • ETL Tools

ETL tools are the right choice for processing daily deltas, taking backups, or performing complex integrations; a minimal delta-extraction sketch follows.
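To make the daily-delta idea concrete, the sketch below pulls only the records modified in the last 24 hours by filtering on SystemModstamp. It is not tied to any particular ETL product; the endpoint and credentials are assumed placeholders, as in the previous example.

```python
import requests
from datetime import datetime, timedelta, timezone

INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # assumed placeholder
ACCESS_TOKEN = "00D...your_session_token"                 # assumed placeholder
API_VERSION = "v59.0"

def fetch_daily_delta(sobject: str = "Account") -> list[dict]:
    """Query only the records modified in the last 24 hours (a 'daily delta')."""
    since = (datetime.now(timezone.utc) - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
    soql = f"SELECT Id, Name, SystemModstamp FROM {sobject} WHERE SystemModstamp >= {since}"
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/query"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

    records, params = [], {"q": soql}
    while True:
        payload = requests.get(url, headers=headers, params=params).json()
        records.extend(payload["records"])
        if payload.get("done", True):
            return records
        # Follow nextRecordsUrl for result sets larger than one page.
        url = f"{INSTANCE_URL}{payload['nextRecordsUrl']}"
        params = None
```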

Data retrieval

The considerations for data retrieval are largely the same as for data injection:

  • Volume
  • Complexity of data set
  • Frequency
  • Destination of data

The approach

The approaches for data retrieval are also similar to those for data injection, with the following additions:

  • Reports

If a dataset containing fewer than 2,000 records needs to be exported, it is possible to create a report and download the data.

  • Weekly Data Export

With the Weekly Data Export, backups can be scheduled and made available for download. The entire Salesforce dataset can be exported on a schedule, organized into multiple CSV files, each with a file size limit of 512 MB.

Data Loader, API, and ETL Tools approaches are the same as they are for data injection. 

Data transformation

The considerations for data transformation remain the same as well.

The approach

The approach is determined by the number of records and the use case. Let’s take a look at what those use cases are:

  • Flow

Flows come in different kinds: before-save record-triggered flows, after-save record-triggered flows, record-triggered flows with scheduled paths, and scheduled flows. Flows are good for medium-complexity changes. For example, a scheduled-path flow can process up to 250,000 records, so for a smaller use case, a flow is a good option.

  • Triggers

For use cases that involve very complex calculations, triggers are used. Triggers are event-driven: whenever a record is created, updated, or deleted, actions such as updating or creating auxiliary records can be performed. Triggers are very efficient and can process complex logic, making them the right tool for real-time data transformation.

  • Formulas

Formulas are easy to use, are calculated at runtime, and do not take up database space.

  • Apex Batch Job

An Apex batch job is the right tool for running transformations periodically. It is capable of processing 50 million records at a time, and since most implementations hold fewer than 50 million records, it is an ideal tool. It can be scheduled to run during non-business hours so that it doesn’t impact work during business hours.

  • Functions

For implementations with more than 50 million records, you can scale on demand with Salesforce Functions.

  • Data Cloud

Salesforce Data Cloud, a relatively new offering, allows data to be stored and processed outside of the core Salesforce org. Data Cloud can be used to transform and process huge volumes of data without impacting the core platform.

Once transformation is complete, users can work with the resulting records through features such as Data Cloud enrichment and others that became generally available in the latest release, a valuable and noteworthy addition to the functionality.

  • ETL

If there is a need to process a dataset of, say, 200 million records, and an existing ETL solution is in place for performing aggregate calculations, it is recommended to use that ETL tool to extract the necessary information, perform all the required transformations, and load the results into the aggregate objects, as in the sketch below.
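As a rough sketch of that pattern, the snippet below aggregates extracted rows outside Salesforce (here with pandas) and upserts the totals into a hypothetical aggregate object, Account_Summary__c, keyed on a hypothetical external ID field Account_Id__c. Both names, and the credentials, are illustrative assumptions rather than anything from the presentation.

```python
import pandas as pd
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # assumed placeholder
ACCESS_TOKEN = "00D...your_session_token"                 # assumed placeholder
API_VERSION = "v59.0"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}

def load_aggregates(extracted_rows: list[dict]) -> None:
    """Aggregate raw rows outside Salesforce, then upsert one summary row per account.

    `extracted_rows` is whatever the ETL extracted, e.g. line items with
    'AccountId' and 'Amount' columns.
    """
    df = pd.DataFrame(extracted_rows)
    totals = df.groupby("AccountId")["Amount"].sum().reset_index()

    for _, row in totals.iterrows():
        # Upsert against a hypothetical external ID field on a hypothetical object.
        url = (f"{INSTANCE_URL}/services/data/{API_VERSION}/sobjects/"
               f"Account_Summary__c/Account_Id__c/{row['AccountId']}")
        body = {"Total_Amount__c": float(row["Amount"])}
        requests.patch(url, headers=HEADERS, json=body).raise_for_status()
```

In a real 200-million-record scenario the upsert step would go through the Bulk API rather than one request per row; the loop just keeps the sketch short.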

The impact and mitigation of Large Data Volumes

If a particular table or an object has more than 5 million records, then that is generally considered a Large Data Volume (LDV).

How LDV impacts reports, queries and list views

LDV can significantly impact reports, queries, and list views: they can time out, or data loading can take very long. It can also cause skews, whether lookup skews or master-detail skews, resulting in substantial delays when updating child records. Roll-up summary fields become a challenge when a parent has more than 10,000 or 20,000 child records, leading to record locking. Sharing calculations on large data volumes also consume considerable time, ultimately diminishing overall system performance. Moreover, holding too much data incurs additional storage costs from Salesforce and may also lead to security issues.

Dealing with LDV requires a mitigation plan:

  • Skinny tables

Skinny tables serve as an efficient means to access a predefined set of data. For instance, consider an object such as Account with 200 fields. When a specific report needs only 10 of them, working with Salesforce support can get a skinny table created for those 10 fields. Consequently, any report, query, or list view that involves those fields will be very swift and efficient. Note that this feature is not supported on every object; it is available only for custom objects, accounts, contacts, and a limited set of other standard objects.

  • Indexes

Standard indexes come pre-configured out of the box on specific fields, such as all ID fields, CreatedDate, and several others. Where no standard index exists, working with the support team can get a custom index created so that queries on that field become indexed. Indexed queries are highly efficient, similar to SQL or Oracle indexes.

  • Divisions

Divisions can be used to achieve a form of horizontal partitioning of the data, so that the dataset being queried is smaller.

  • Data archival

Keep only the active data in Salesforce and archive the rest. If there is a need to access the archived data, external objects can be used in conjunction with Salesforce Data Cloud.

Roadblocks and the path forward while processing billions of records

Roadblocks

While processing LDVs, roadblocks may arise, such as row locks and governor limits like the maximum number of bulk jobs that can be executed in a single day. Even after bulk jobs are submitted, many of them may sit waiting in the queue. High database CPU utilization can prompt Salesforce to throttle the entire org, resulting in frequent timeouts. A small sketch for monitoring the relevant limit counters follows below.
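One practical way to stay ahead of those queue and governor limits is to poll the REST /limits resource before submitting more bulk work. The sketch below simply filters the returned counters for the Bulk API entries; the credentials are assumed placeholders.

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # assumed placeholder
ACCESS_TOKEN = "00D...your_session_token"                 # assumed placeholder
API_VERSION = "v59.0"

def remaining_bulk_capacity() -> dict:
    """Read the org's current limit counters and keep only the Bulk API ones.

    The /limits resource returns {"LimitName": {"Max": ..., "Remaining": ...}, ...};
    filtering on 'Bulk' surfaces counters such as the daily Bulk API batch allowance.
    """
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/limits"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    limits = requests.get(url, headers=headers).json()
    return {name: vals for name, vals in limits.items() if "Bulk" in name}

if __name__ == "__main__":
    for name, vals in remaining_bulk_capacity().items():
        print(f"{name}: {vals['Remaining']} of {vals['Max']} remaining")
```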

Path forward

To overcome these roadblocks, the following steps chart the path forward:

  • Use latest version of Bulk API V2

To ensure that all records progress smoothly through processing, the most straightforward and efficient approach is to leverage the latest Bulk API version. Even for those already utilizing V2, it is crucial to stay on the latest API version to capitalize on the additional capacity and capabilities that have been introduced. The sketch below shows what a V2 ingest job looks like.
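For reference, a Bulk API 2.0 ingest is only three calls: create the job, upload the CSV, and mark the upload complete so Salesforce starts processing. This minimal sketch follows the documented /jobs/ingest endpoints; the instance URL and token are assumed placeholders, as before.

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # assumed placeholder
ACCESS_TOKEN = "00D...your_session_token"                 # assumed placeholder
API_VERSION = "v59.0"
JSON_HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}

def bulk_insert_accounts(csv_payload: str) -> str:
    """Insert Accounts through Bulk API 2.0 and return the job Id."""
    base = f"{INSTANCE_URL}/services/data/{API_VERSION}/jobs/ingest"

    # 1. Create the ingest job.
    job = requests.post(base, headers=JSON_HEADERS,
                        json={"object": "Account", "operation": "insert"}).json()

    # 2. Upload the CSV data for the job.
    csv_headers = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "text/csv"}
    requests.put(f"{base}/{job['id']}/batches", headers=csv_headers,
                 data=csv_payload.encode("utf-8")).raise_for_status()

    # 3. Mark the upload complete; Salesforce queues and processes the job.
    requests.patch(f"{base}/{job['id']}", headers=JSON_HEADERS,
                   json={"state": "UploadComplete"}).raise_for_status()
    return job["id"]

if __name__ == "__main__":
    print(bulk_insert_accounts("Name\nAcme Corp\nGlobex Inc\n"))
```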

  • Work with Salesforce support

Leveraging Salesforce support is yet another strategy. This requires some advance planning, as it may take a month or two. It involves securing an increase in the Bulk API batch limit as well as the concurrent batch processing limit. Caution is advised: excessive increases may drive up database CPU utilization and result in throttling. This is why it is important to test thoroughly in a full-copy sandbox to determine the optimal concurrent batch limit for the org’s requirements.

  • Bypass automation 

A bypass strategy should be implemented that skips all automation and validation for the ETL user. This is hugely beneficial because it makes data insertion much faster.

  • Defer sharing rules

For processing LDVs, it is advisable to defer sharing calculations. Running the load during off-business hours ensures that sharing recalculation can be kicked off shortly after the load completes.

  • Data Cloud

Data Cloud is highly beneficial, particularly when dealing with millions or billions of records. It enables the user to perform LDV operations more efficiently.

  • Maximizing Efficiency with Bulk API V2

Bulk API V2 manages batches for you and handles retries, up to 15 times. With V2, only ingest operations (inserts, updates, upserts, and deletes) contribute to the batch count. With the original Bulk API, query batches also count against the daily limit of 15,000 batches, so a query that splits into 5,000 batches consumes a third of that allowance. With V2, the same query consumes no batches at all; instead, there is a separate limit on the number of query jobs that can be run in a day (approximately 10,000).

Additionally, there is a limit on the total results that Bulk API V2 can extract in 24 hours, set at 1 TB. This, however, isn’t a strict limit: if heightened data processing is expected during specific periods, the limit can be increased with Salesforce support’s help. In short, retries and batch management are taken care of, and query batches are not counted against your limits. A minimal V2 query sketch follows.
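To show what a V2 query job looks like in practice, here is a minimal sketch against the documented /jobs/query endpoints: create the job, poll until it completes, then download the CSV results (which are what count toward the 24-hour results allowance mentioned above). Credentials are assumed placeholders, and result paging via the Sforce-Locator header is omitted for brevity.

```python
import time
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # assumed placeholder
ACCESS_TOKEN = "00D...your_session_token"                 # assumed placeholder
API_VERSION = "v59.0"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}

def bulk_query(soql: str) -> str:
    """Run a Bulk API 2.0 query job and return the result set as CSV text."""
    base = f"{INSTANCE_URL}/services/data/{API_VERSION}/jobs/query"

    # 1. Create the query job.
    job = requests.post(base, headers=HEADERS,
                        json={"operation": "query", "query": soql}).json()

    # 2. Poll until Salesforce finishes processing the job.
    while True:
        state = requests.get(f"{base}/{job['id']}", headers=HEADERS).json()["state"]
        if state == "JobComplete":
            break
        if state in ("Failed", "Aborted"):
            raise RuntimeError(f"Query job ended in state {state}")
        time.sleep(5)

    # 3. Download the results, returned as CSV.
    results = requests.get(f"{base}/{job['id']}/results",
                           headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    return results.text

if __name__ == "__main__":
    print(bulk_query("SELECT Id, Name FROM Account LIMIT 10"))
```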

Want to learn about more ideas, opportunities, and strategies to maximize the value of Salesforce data? Watch sessions from the 2023 Data Innovation Forum for Salesforce Architects here.
