Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How appropriate it is to use SAML_login with AEM with more than 1m users?

I am investigating a slow login time and some profile synchronisation problems of a large enterprise AEM project. The system has around 1.5m users. And the website is served by 10 publishers.

The way this project is built, is that they have enabled the SAML_login for all these end-users and there is a third party IDP which I assume SAML_login talks to. I'm no expert on this SSO - SAML_login processes, so I'm trying to understand if this is the correct way to go at the first step.

Because of this setup and the number of users, SAML_login call takes 15 seconds on avarage. This is getting unacceptable day by day as the user count rises. And even more importantly, the synchronization between the 10 publishers are failing occasionally, hence some of the users sometimes can't use the system as they are expected to.

Because the users are stored in the JCR for SAML_login, you cannot even go and check the home/users folder from crx browser. It times out as it is impossible to show 1.5m rows at once. And my educated guess is, that's why the SAML_login call is taking so long.

I've come accross with articles that tells how to setup SAML_login on AEM, and this makes it sound legal for what it is used in this case. But in my opinion this is the worst setup ever as JCR is not a well designed quick access data store for this kind of usage scenarios.

My understanding so far is that this approach might work well but with only limited number of users, but with this many of users, it is an inapplicable solution approach. So my first question would be: Am I right? :)

If I'm not right, there is certainly a bottleneck somewhere which I'm not aware of yet, what can be that bottleneck to improve upon?

like image 707
Tolga Evcimen Avatar asked Mar 23 '19 13:03

Tolga Evcimen


People also ask

How does SAML assertion work?

SAML works by exchanging user information, such as logins, authentication state, identifiers, and other relevant attributes between the identity and service provider. As a result, it simplifies and secures the authentication process as the user only needs to log in once with a single set of authentication credentials.

Is SAML XML?

SAML transactions use Extensible Markup Language (XML) for standardized communications between the identity provider and service providers. SAML is the link between the authentication of a user's identity and the authorization to use a service.


1 Answers

The AEM SAML Authentication handler has some performance limitations with a default configuration. When your browser does an HTTP POST request to AEM under /saml_login it includes a base 64 encoded "SAMLResponse" request parameter. AEM directly processes that response and does not contact any external systems.

Even though the SAML response is processed on AEM itself, the bottle-necks of the /saml_login call are the following:

  1. Initial login where AEM creates the user node for the first time - you can look at creating the nodes ahead of time. You could write a script to create the SAML user nodes (under /home/users) in AEM ahead of time.
  2. During each login when the session is first created - a token node is created under the user node under /home/users/.../{usernode}/.tokens - this can be avoided by enabling the encapsulated token feature.
  3. Finally, the last bottle-neck occurs when it saves the SAMLResponse XML under the user node (for later use required for SAML-based logout). This can be avoided by not implementing SAML-based logout. The latest com.adobe.granite.auth.saml bundle supports turning off the saving of the SAML response. Service packs AEM 6.4.8 and AEM 6.5.4 include this feature. To enable this feature, set the OSGI configuration properties storeSAMLResponse=false and handleLogout=false and it would not store the SAML response.
like image 168
Andrew Khoury Avatar answered Oct 19 '22 02:10

Andrew Khoury