2366796 - Best Practices for Modeling and Handling Data Sources and Reports

Symptom

You are creating many data sources and reports and you would like to know what the best practices are to ensure the best performance. This is especially relevant if you handle a large data volume.

Please note: Generally, the performance of data sources and reports cannot be compared between 2 systems, even if the data source / report is the same. There are many factors that influence the performance. Below you will find some guidelines on the relevant data source / report settings.

Environment

SAP Business ByDesign
SAP Cloud for Customer

Reproducing the Issue

Cause

Resolution

Best Practices for Modeling Data Sources

Fundamental understanding:

An inner join returns only those rows from the left table that have a matching row in the right table based on the join criteria.
A left outer join returns all rows from the left table even if no matching rows where found in the right table. Any values selected out of the right table will be null for those rows where no matching row is found in the right table.

For a detailed explanation of the join types, please see the help document called Data Sources.

A Cloud Data Source can be used in order to upload external data into your solution. Please note the following restrictions for cloud data sources:

No access context is currently supported for cloud data sources, so the access context will always be Unrestricted. This means that whoever has access to reports based on this data source will see the complete data set.
If you use a cloud data source as part of a join and then create a report based on this join, the Add Fields button will not show the fields from the cloud data source.

Best Practices:

Combined data sources make sense where all involved data sources have many characteristics in common, but have different key figures. If there are only very few matches between the characteristics, a join might return better results. For example, you would like to see the combined revenue of a company, irrespective of whether it is coming from sales orders or service orders. So you can create a combined data source on sales orders and service orders to see the aggregated revenue of a company.
Combined data sources tend to be better in terms of performance. In a join, the data volume is multiplied MxN, depending on the join conditions. Hence it is also advisable to reduce the number of join conditions if possible.
When joining data sources, inner joins are the preferred approach.
You can join up to 5 data sources, but it is recommended to keep the number of joined data sources low.
If left outer joins are unavoidable, please note the following:

The data is fetched from the left to the right join members.
For this reason, to improve performance, the anchor of your data source should be the one containing least records.
If there are more than 2 join members, sort them by increasing data volume, i.e. the smallest data source should be the anchor, the next bigger one should be to the right of it, and the largest data source should be to the very right of the table.
For performance, it is preferable to have a master data data source as an anchor, instead of a transactional data source.

Avoid numerous "layers" of data sources, for example scenarios like the example below. These are complex joins and their calculation may take a very long time.

Join 1: several standard data sources are joined
Join 2: Join 1 and one or several standard data sources are joined
Join 3: Join 2 and more standard or customized data sources are joined
and so on...
Such scenarios should be avoided.

For best performance, the join conditions should be set on ID fields. Please note though that ID fields may be stored in different formats (different lengths) in various data sources, and in those cases, the fields will not match and a join is not possible. Wherever the join on an ID field does not work, joining on UUID fields is advisable.
The join conditions should be set on compatible fields like having same data type and length. Don't ignore the compatibility warnings as this can cause errors in the future.
The join conditions should be on fields whose values are unique. For example, joining on the item ID (values like 10, 20, 30 etc) is not recommended since there are many documents with such item values. The result of such a join is not consistent.

Best Practices for Modeling / Handling Reports:

Avoid the Show Master Data setting (step 4 in the report wizard) unless absolutely necessary. If this setting is required, it should be limited to 1 characteristic.
If possible, mandatory selection variables should be set up in the steps 4 and 5 in the report wizard.
When setting variables in the report, this should be done via the report wizard, not via the Add Fields button in the report. End users can use the Add Fields button if they need to add one or two variables, whereas if you designing a report as a key user please set the variables in step 4 of the report wizard. Otherwise, the system first fetches the data for all characteristics and key figures that are part of the report, and then another call is made to get the data for the fields added via Add Fields. This will impact the report performance negatively.
When choosing fields for variables, try to use those fields that come from the anchor data source. Setting filters on fields from the anchor data source is always preferable.
Exclude selections take up more system resources than Include. Try to avoid the Exclude operator if possible, however if they are necessary, set them on fields of type ID, rather than UUID. So for example, if you need a negative selection (IS NOT) on the Employee field, make use of the field Employee Responsible ID, rather than Employee Responsible.
When running the report, make sure to apply proper filters in the report Selection - especially for reports / data sources with large data amount (e.g. more than 250k records).
Large selection ranges, as well as any select-all based approach should be avoided, since the same may lead into performance issues / timeout. Since the data volume constantly grows in the databases with time, every report should be handled with proper selection parameters.

Other Factors

There are other factors that influence the performance of Analytical objects in systems, such as overall data volume in the system, system usage, background jobs and their frequency etc. For this reason, a comparison of the performance of data sources and reports between different systems is not feasible, even if the above settings are the exact same.

Keywords

report cannot be opened, time out, report loading time, dump , KBA , best practice , best practices , performance report , AP-RC-ANA , C4C, ByD, Travel: Analytics Tools & Frameworks , How To

Product

SAP Business ByDesign all versions ; SAP Cloud for Customer add-ins all versions ; SAP Cloud for Customer core applications all versions

2366796 - Best Practices for Modeling and Handling Data Sources and Reports - Performance