
How to Process Data in Salesforce Efficiently

February 20, 2024

This post was adapted from the Odaseva Data Innovation Forum session, “How to Efficiently Process Data in Salesforce” featuring Sarath Yalavarthi, CTA, Application Owner for Consumer Banking CRM at JPMorgan Chase & Co.

Watch the session replay here and see a clip below.

In today’s dynamic digital landscape, data reigns supreme. Learning how to efficiently process data in Salesforce, including data retrieval, data injection, and transformation, is crucial for enterprise operations. 

This post reviews Sarath Yalavarthi’s presentation from the Odaseva Data Innovation Forum for Salesforce Architects, covering the three operations that are integral to processing data in Salesforce efficiently:

  1. Data injection: How data is collected or inserted into Salesforce
  2. Data retrieval: How data is extracted or queried from Salesforce for various purposes, such as reporting, analysis, and sharing
  3. Data transformation: Cleaning, validating, and aggregating data so it is in the right format and quality for its intended use

Key Takeaways:

  • Choose the right data retrieval method based on volume. Reports work for under 2K records, while Bulk API V2 handles millions
  • Scale data injection approaches from Data Import Wizard (50K records) to ETL tools and data injection software for complex integrations
  • Select data transformation tools by use case. Flows process 250K records, Apex Batch handles 50M records, and Data Cloud manages billions

Below are the considerations and approaches for each:

How to Inject Data into Salesforce

Understanding how to inject data into Salesforce starts with evaluating your requirements. The right data injection software and approach depends on several factors:

Considerations

Volume

  • Consider the volume of records: is it a few records, thousands, or millions?

Complexity of the data set

  • Also consider the complexity of the data set: is it a few picklist fields, some files, or complete data sets?

Frequency

  • How frequent is the data load? Is it a one-time load, or does it recur daily, weekly, or monthly?

Source of data

  • Is the data being sourced from an ERP, a live stream, or spreadsheets?

Data Injection Methods

The approaches to data injection are as follows:

Manual data entry

  • Manual data entry is suitable for limited use cases or instances where data entry is performed directly by end users, particularly in small-scale scenarios.

Data Import Wizard

  • The Data Import Wizard supports a limited set of objects and can load up to 50,000 records. For example, custom objects, accounts, contacts, leads, and campaign members are supported, but opportunities and opportunity products are not.

Data Loader

  • The Data Loader allows for data import and export and is efficient for datasets below 5 million records. So if the requirement is to upload or download a million records in a single operation, the Data Loader is recommended.

API (REST, SOAP and Bulk APIs)

  • The Salesforce APIs allow users to interact with Salesforce programmatically for automated data injection.
  • With these APIs, you can write code to create or update records, or use the Bulk API to move huge volumes of data. This is the right approach for large datasets and complex data integrations, as in the sketch below.
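
To make the programmatic approach concrete, here is a hedged sketch that creates a single Contact through the REST sObject endpoint with Python’s requests library; the instance URL, access token, and field values are placeholder assumptions, not details from the session.

```python
import requests

# Hypothetical values: substitute your org's My Domain URL and a valid OAuth access token.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...access_token"
API_VERSION = "v61.0"

HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

# Create one Contact record via the REST sObject endpoint.
# For millions of records, switch to the Bulk API instead (see the Bulk API V2 section below).
response = requests.post(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/sobjects/Contact",
    headers=HEADERS,
    json={"LastName": "Example", "Email": "jane.example@example.com"},
)
response.raise_for_status()
print(response.json())  # e.g. {"id": "003...", "success": true, "errors": []}
```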

ETL Tools

  • ETL tools are the right choice for loading daily deltas, taking backups, or performing complex integrations.

How to Retrieve Data from Salesforce

Knowing how to retrieve data from Salesforce efficiently requires matching your method to your needs. Data retrieval software options range from simple exports to enterprise ETL solutions.

The considerations for data retrieval are the same as for data injection:

  • Volume
  • Complexity of data set
  • Frequency
  • Destination of data

Data Retrieval Methods

The approaches for data retrieval are similar to those for data injection, with the following additions:

Reports

For straightforward data retrieval needs involving fewer than 2,000 records, you can create a report and download the data directly.

Weekly Data Export

The entire Salesforce dataset can be exported on a schedule, delivered as multiple CSV files, each with a file size limit of 512 MB. The Data Loader, API, and ETL tool approaches are the same as they are for data injection.
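
As an illustration of the API approach on the retrieval side, here is a hedged sketch that runs a Bulk API 2.0 query job and downloads the results as CSV; the instance URL, access token, and query are placeholder assumptions, not details from the session.

```python
import time

import requests

# Hypothetical values: substitute your org's My Domain URL and a valid OAuth access token.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...access_token"
API_VERSION = "v61.0"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}
BASE = f"{INSTANCE_URL}/services/data/{API_VERSION}/jobs/query"

# 1. Create a Bulk API 2.0 query job.
job = requests.post(
    BASE,
    headers=HEADERS,
    json={"operation": "query", "query": "SELECT Id, Name, Industry FROM Account"},
).json()

# 2. Poll until Salesforce finishes processing the job.
while True:
    state = requests.get(f"{BASE}/{job['id']}", headers=HEADERS).json()["state"]
    if state in ("JobComplete", "Failed", "Aborted"):
        break
    time.sleep(10)

# 3. Download the results as CSV (very large result sets are paged via the Sforce-Locator header).
results = requests.get(
    f"{BASE}/{job['id']}/results",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}", "Accept": "text/csv"},
)
print(results.text[:500])
```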

How to Transform Data in Salesforce

Data transformation in Salesforce involves processing data to clean, validate, and aggregate it. Choosing the right data transformation software depends on your record volume and business requirements.

The considerations remain the same for data transformation as well.

Data Transformation Tools by Volume

The approach is determined by the number of records and the use case. Let’s take a look at what those use cases are:

Flow

  • Flows come in several kinds: before-save record-triggered flows, after-save record-triggered flows, record-triggered flows with scheduled paths, and scheduled flows. Flows are good for medium-volume changes. For example, a scheduled-path flow can process up to 250K records, so for a smaller use case, a flow is a good option.

Triggers

  • For use cases that involve very complex calculations, triggers are used. Triggers are event-driven, so whenever a record is created, updated, or deleted, actions like updating or creating auxiliary records can be performed. Triggers are very efficient and can process complex logic, making them the right tool for real-time data transformation.

Formulas

  • Formulas are easy to use, are calculated at runtime, and do not take up database space.

Apex Batch Job

  • An Apex Batch Job is the right tool for running transformations periodically. It can process 50 million records at a time, and since most implementations involve fewer than 50 million records, it is an ideal tool. It can be scheduled to run during non-business hours so that it doesn’t impact work during business hours.

Functions

  • For implementations with more than 50 million records, you can scale on-demand with Functions. 

Data Cloud

  • Salesforce Data Cloud, a relatively new offering, allows data to be stored and processed outside of the core Salesforce org. Data Cloud can be used to transform and process huge volumes of data without impacting the core platform.
  • Once transformation is complete, the records can be consumed through currently exposed features, such as enrichments, which became generally available in the latest release. This is a valuable and noteworthy addition to the functionality.

ETL

  • If there is a need to process a dataset of, say, 200 million records, and an existing ETL solution is already in place for aggregate calculations, it is recommended to use the ETL to extract the necessary information, perform all the required transformation there, and load the results into aggregated objects, along the lines of the sketch below.
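
As one possible shape for this extract-aggregate-load pattern, the hedged sketch below sums line amounts from an extracted CSV outside of Salesforce and upserts the rollups into a hypothetical Account_Revenue_Summary__c object keyed on a hypothetical External_Key__c external ID field; the file name, object, and fields are illustrative assumptions, not details from the session.

```python
import csv
from collections import defaultdict

import requests

# Hypothetical values: org URL, OAuth token, and an illustrative aggregate custom object.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...access_token"
API_VERSION = "v61.0"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}

# 1. Transform outside Salesforce: sum the extracted line amounts per account.
totals = defaultdict(float)
with open("extracted_order_lines.csv", newline="") as f:
    for row in csv.DictReader(f):  # assumes AccountId and Amount columns in the extract
        totals[row["AccountId"]] += float(row["Amount"])

# 2. Load only the aggregates back, upserting on the hypothetical external ID field.
for account_id, total in totals.items():
    resp = requests.patch(
        f"{INSTANCE_URL}/services/data/{API_VERSION}/sobjects/"
        f"Account_Revenue_Summary__c/External_Key__c/{account_id}",
        headers=HEADERS,
        json={"Total_Revenue__c": total},
    )
    resp.raise_for_status()
```

For real volumes, the aggregated rows would themselves be loaded through the Bulk API rather than one REST call per record; the point of the pattern is that the 200 million detail rows never have to be reprocessed inside Salesforce.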

How to Handle Large Data Volumes in Salesforce

If a particular table or an object has more than 5 million records, then that is generally considered a Large Data Volume (LDV).

Large Data Volume Performance Impacts

LDV can significantly impact reports, queries, and list views: they can time out, or loading can take very long. It can also cause skews, whether lookup skews or master-detail skews, resulting in substantial delays when updating child records. Roll-up summary fields can become a challenge when there are more than 10,000 or 20,000 child records, leading to record locking.

Sharing calculations, especially with large data volumes, consume considerable time and ultimately diminish overall system performance. Moreover, too much data incurs additional storage costs from Salesforce and may also lead to security issues.

Large Data Volume Mitigation Strategies

Skinny tables

  • Skinny tables serve as an efficient means to access predefined data. For instance, consider an object such as Account with 200 fields. If a specific report needs only 10 of those fields, Salesforce support can create a skinny table containing just those 10 fields. Consequently, any report, query, or list view involving those fields will be very swift and efficient. Note that this feature is not supported on all objects; it is available only for custom objects, accounts, contacts, and a limited set of others.

Indexes

  • Standard indexes are pre-configured out of the box on specific fields, such as ID fields, CreatedDate, and several others. Where no standard index exists, Salesforce support can create a custom index so that queries filtering on that field become indexed queries. This is highly efficient, similar to indexes in SQL or Oracle databases. A sketch of a selective query follows below.
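
To make the idea of an indexed (selective) query concrete, here is a hedged sketch that filters on a hypothetical indexed field (an external ID, Billing_System_Id__c) through the REST query endpoint; the field, value, instance URL, and token are placeholder assumptions.

```python
import requests

# Hypothetical values: org URL, OAuth token, and an illustrative indexed custom field.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...access_token"
API_VERSION = "v61.0"

# Filtering on an indexed field (here, a hypothetical external ID) keeps the query
# selective, so the optimizer can use the index instead of scanning the full LDV table.
soql = "SELECT Id, Name FROM Account WHERE Billing_System_Id__c = 'ERP-000123'"
resp = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/query",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"q": soql},
)
resp.raise_for_status()
body = resp.json()
print(body["totalSize"], body["records"])
```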

Divisions

  • Divisions can be used to achieve a form of horizontal partitioning of the data, so that the dataset being queried is smaller.

Data archival

  • Use only the active data in Salesforce and archive the remaining data. If there is a need to access the archived data, external objects can be used in conjunction with the Salesforce Data Cloud. 

How to Overcome Data Processing Challenges

Common Data Processing Roadblocks

While processing LDVs, roadblocks may arise, such as row locks and governor limits like the maximum number of bulk jobs that can be executed in a single day. Even after bulk jobs are submitted, many of them may sit waiting in the queue. High database CPU utilization can prompt Salesforce to throttle the entire Org, resulting in frequent timeouts.

Solutions for Processing Large Datasets

To overcome these roadblocks, the following approaches can chart the path forward:

Use the latest version of Bulk API V2

  • To ensure the smooth progression and processing of all records, the most straightforward and efficient approach is to use the latest Bulk API version. Even for those already on V2, it is crucial to stay on the latest API version to capitalize on the additional capacity and capabilities that have been introduced; the sketch below shows where the version comes into play.
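
As a hedged sketch of what this looks like in practice, the example below pins the API version in one place and walks a Bulk API 2.0 ingest job through its lifecycle (create the job, upload the CSV, close it); the instance URL, access token, and CSV file are placeholder assumptions.

```python
import requests

# Hypothetical values: substitute your org's My Domain URL and a valid OAuth access token.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...access_token"
API_VERSION = "v61.0"  # pin to the latest available version to pick up new capacity and features
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}
BASE = f"{INSTANCE_URL}/services/data/{API_VERSION}/jobs/ingest"

# 1. Create a Bulk API 2.0 ingest job for inserting Contacts from CSV.
job = requests.post(
    BASE,
    headers=HEADERS,
    json={"object": "Contact", "operation": "insert", "contentType": "CSV", "lineEnding": "LF"},
).json()

# 2. Upload the CSV data; Salesforce handles the internal batching and retries.
with open("contacts.csv", "rb") as f:
    requests.put(
        f"{BASE}/{job['id']}/batches",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "text/csv"},
        data=f,
    ).raise_for_status()

# 3. Close the job so Salesforce starts processing it.
requests.patch(
    f"{BASE}/{job['id']}", headers=HEADERS, json={"state": "UploadComplete"}
).raise_for_status()
```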

Work with Salesforce support

  • Leveraging Salesforce support is yet another strategy. This requires some advance planning, as it may take a month or two. It involves securing an increase in the Bulk API batch limit as well as the concurrent batch processing limit. Caution is advised: excessive increases may drive up database CPU utilization and result in throttling, which is why it is important to test thoroughly in a full copy sandbox to determine the optimal concurrent batch limit for the Org’s requirements.

Bypass automation 

  • Implement a bypass strategy that skips all automation and validations for the ETL user; this is hugely beneficial, as it makes data insertion much faster.

Defer sharing rules

  • For processing LDVs, it is advisable to defer sharing rule calculations. Running the load during off-business hours means sharing recalculation can be kicked off shortly after the load completes.

Data Cloud

  • Data Cloud is highly beneficial, particularly when dealing with millions or billions of records. Data Cloud enables the user to perform LDV operations more efficiently.

How to Maximize Efficiency with Bulk API V2

Bulk API V2 manages batch operations for you and handles retries, up to 15 times. With V2, only ingest operations such as inserts and updates contribute to the batch count. With the original Bulk API, a query that consumes 5,000 batches counts against the daily limit of 15,000 batches; with V2, query jobs are not counted against batches at all. Instead, there is a separate limit on the number of query jobs that can be run in a day (approximately 10,000), but no limit on the number of batches they consume.

Additionally, there is a limit on the total results that Bulk API V2 can extract in 24 hours, set at 1 TB. This, however, isn’t a strict limit: if heightened data processing is expected during specific periods, Salesforce support can help increase it. Retries and batch management are taken care of for you and are not counted against your limits.
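
To keep an eye on these allocations before a heavy load, something like the hedged sketch below can read the REST /limits resource and print the Bulk-related entries; the instance URL and token are placeholders, and the exact limit key names can vary by API version.

```python
import requests

# Hypothetical values: substitute your org's My Domain URL and a valid OAuth access token.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "00D...access_token"
API_VERSION = "v61.0"

# The REST /limits resource reports maximum and remaining daily allocations,
# including the Bulk API allocations discussed above.
resp = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/limits",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
for name, values in resp.json().items():
    if "Bulk" in name:
        print(name, values)  # e.g. {'Max': ..., 'Remaining': ...}
```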

Want to learn about more ideas, opportunities, and strategies to maximize the value of Salesforce data? Watch sessions from the 2023 Data Innovation Forum for Salesforce Architects here.
