Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HANA CDS Views vs Calculation Views vs Table Functions

Tags:

view

hana

cds

In SAP HANA I am used to create Calculation Views.

Previously I learned that Calculation Views (which after compilation are column-views) are to be prefered over Database-SQL-Views. Now with CDS-Views I am not sure if this is still the case. Especially with regards to performance.

Also what is now the difference between a table function (which replaced scripted calculation views) and CDS Views?

like image 428
Thorsten Niehues Avatar asked Jan 10 '20 12:01

Thorsten Niehues


People also ask

Will CDs views replace Hana calculation views in SAP?

I have been noticing in several SAP blogs that, CDS views will replace HANA calculation views. In this context, let us say, I have 2 flavors of CDS, native HANA CDS and ABAP CDS.

How do I add a table to a Hana calculation view?

It appears as if we were adding a simple table to a graphical view but at the back end, we have a SAP HANA table function providing the entire logic for this output. Select the fields both the fields to send to output in the same way we did in the graphical view tutorial . Once done, activate the calculation view.

How do I create a calculation view in e-Hana?

Create Calculation view in E-HANA. Test view output in HANA using native SQL. Create CDS table function. Create AMDP function for calling HANA native query on calculation view. Expose result to outer CDS with business logic. Select system and expand content folder and choose package where calculation view needs to be created.

What is the difference between CDs view and methods?

Methods allow us to perform some SQL calculations that we can’t necessarily do in a CDS view, as well as string calculations within a join and where clauses. Another advantage is that a table function allows for internal tables, which is something a CDS view cannot do.


2 Answers

Ok, this is a question that I believe requires some background to be answered.

A long, long time ago...

When SAP HANA was first developed, it heavily reused concepts and technology from other, already existing SAP products (TREX, P*TIME, MaxDB, Business Warehouse Accelerator).
One of the fundamental elements of the high query performance was (and is) the column store data-storage, which came in large parts from the TREX/BWA products. These products, in turn, had been solutions to very specific problems (full-text search for catalogs and speed-up of analytical queries from the SAP Business Warehouse data warehouse product).
Especially the BWA use case reflects in the column views of SAP HANA. Due to the limited use case of supporting SAP BW queries, no general SQL/relational query support was required (e.g. no arbitrary join-chain optimizations, no SQL features beyond SQL:92 etc.) whereas other, rather exotic features (like "vertical join") that could be used by SAP BW, were built into a query tool/engine ("engine" clearly was a very popular term with the SAP developers).

Once HANA proved successful as a platform to run SAP BW on, the next step was to add flexibility and make more general platforms like SAP Netweaver (the software that SAP's business solution products run on/with) working on SAP HANA. Now, SQL features were added and those required additional capabilities from the query optimizer and execution "engines". Query optimization had to be flexible and fast and should lead to query performance that would still beat the existing RDBMS vendors' offering (which had been around for 40+ years). This, clearly, is a hard problem and throwing is operational aspects of DB development (scaling, solution deployment, data federation, etc.).

This led to an overlapping development of different tools addressing different aspects of DB development.
SQL support and the underlying SQL optimizer were made more powerful, so much so, that (some) SQL queries could be as fast or faster than those modeled in calculation views. And since both of these "query frontends" eventually had to talk to the same internal data structures (row/column store) it was desirable to have just a single query optimizer, that would support all the different use cases.
Somewhere around HANA 1 SPS11/12 most calculation views started to be "unrolled" internally to feed into the common optimizer (that was what the "Execute in SQL Engine" flag was about).

I'd say, since then, the performance argument for using calculation views only holds in very specific circumstances.

I mentioned the overlapping developments and CDS (core data services) is one of them. The idea here is a very different one from SQL. While SQL gives you "the way to talk to the database", CDS wants to give your application a single data definition, that is used by the UI, the program logic and the data storage/query execution.

SQL != CDS

This probably needs some context (again): a major usage pattern of how SQL databases are used by application developers is that the application is written in some form of OO-implementation and the talking to the DB is left to a mapping layer/library (e.g. O/R-mappers). This means, that the knowledge of what the application is about (aka business process knowledge), is spread out in the application.

There is some information about it in the UI (labels, formatting, visibility, ...), some of it is in the application-object model (object dependencies, hierarchies, value domains...) and then some of it is in the queries against the database.

Such scattered knowledge/definition makes it hard to make changes consistent, which in turn, slows the development process and in turn prolongs the time until the application can run and deliver some positive outcome.
"Time-to-value" is the thing under optimization here as this is important for companies that give the promise of "success through innovation".

Ok, so this CDS thing is now part of the development models proposed by SAP and nearly en-passant also addresses topics like schema evolution and deployment of the data model. It is, in fact, independent of the actual database platform as shown in the CDS for ABAP variety.

How does this lead back to query performance? It does not really.

CDS' advantage is that one can provide more information about the data model than what is possible in HANA SQL.
Associations and joins with cardinality declaration (albeit now retrofitted to plain SQL) can enable the optimizer to use additional optimizations. Yet, the same optimizer and the same query execution "engines" are used here.

So, from a (query execution) performance point of view, it does not make a big difference, as long as no query semantics are required for which CDS does not have syntax (e.g. some window functions).

The main point of CDS really is about application development process performance and whether that works well with how you do development really depends on how much of it you can use.

Now for the question "scripted calc view" vs. "table function" vs. "CDS view".

Looking at these different object types from the point of "what can I do with them functionally?" will result in the observation "basically, the same". The difference lies in how these can be optimized (scripted calc views cannot be generally unrolled into the global query to be optimized), and what one can do with the object once created.
Table functions allow for very easy reuse across multiple views and queries. They also provide the option to provide parameters into the function (similar to parameterized views) and in addition allow for imperative coding. Functionally speaking, table functions are a kind of swiss-army knife; one can do nearly anything with them and they still can be part of global query optimization.

CDS views, as mentioned above, are nothing "special" in terms of query runtime or optimization. The main reason why CDS views are "a thing" is that with HANA SAP started to develop development models (such as XS, XSA, CAM) that revolve around "virtual data models".

The idea for those is that the structure of tables very often is stable and changes only little over time.
In a way, this is the "write-schema" of applications that enter the data into tables.
The "read-schema" is most of the time different from that. Queries re-combine the normalized data into records that the application can map into objects. This allows applications to look at the data differently than the original application. With "virtual data models" these queries are baked into tangible development artifacts (the views) that can be reused across the application. In fact, these can be treated as if this was the database with its tables, presented in a way that makes sense for the application.

Once again, if that is something that is beneficial for your application development depends on how your application development looks like.

Can you use HANA without CDS? Absolutely, and there are many areas where CDS lacks (i.e. the limited syntax and feature mapping to HANA features) but it does have its merits.

Should you abandon calculation views?

I would not necessarily change existing developments if they still serve their purpose, but calculation views certainly are an odd development object. Training folks in using those and SQL most likely is overly expensive compared to just sticking to SQL.

Personally, I prefer the code-based SQL development (better tooling available, allows for easier comparison with other DBMS, doesn't require WEB IDE/HANA Studio).
The only thing, SQL based development does not provide is the extended annotations/semantic information used by the SAP analytic frontend tools (SAC & BO) - these really are specific to CDS and Information Models (calculation views) but barely used by other analytic tools.

And that's my take on it.

like image 61
Lars Br. Avatar answered Oct 23 '22 06:10

Lars Br.


I would add that

  • Calculation Views are semantically richer. A SQL View does not know about measures, dimensions, hierarchies. https://blogs.sap.com/2019/08/26/what-is-the-difference-calcview-versus-sql-view/
  • The difference from the execution plan point of view is getting less and less. In Hana 2.0 SP4 most graphical calc views are turned internally into a single SQL statement to be executed by the SQL engine. So in that sense, using a CalcView gives you the additional information about the model plus the query performance of the SQL engine.

Lars' explanation of CDS is perfect. Nothing to add there.

like image 2
Werner Daehn Avatar answered Oct 23 '22 06:10

Werner Daehn