Saturday, May 24, 2014

Preload Mondrian Cache on Saiku analytics startup

Another strategy in Saiku/Mondrian to solve performance problems is to rely on its its cache. Mondrian library presents a cache layer in order to respond immediately if previous requests have built same agregations than asked; this cache is smarter than a simple SQL or MDX queries cache, it can detects whether same data is in its cache even if query to handle is not exactly the same as other previous requests.

Here we will improve a little bit this behaviour to avoid a common pitfall of cache usage; firsts users are always penalized because cache is empty. Because BI tools are accessed by a limited public, first users are often the only users, an already filled cache ready to use every morning can pleasure your mates as well as chocolate croissants.

Bad performance observation

As seen before there, preload cache has to be used only if it is necessary, take a look on this post to generate a situation where performances need to be improved.

You can also look in this previous post to follow a Saiku installation tutorial. I assume for the following that approximatively same installation.

Save slow queries 

Saiku Analytics provides a way to save queries. Once you find queries that are long a should need to be preloaded, save them by clicking on 'floppy disk' icon in main toolbar. This create a .saiku file in apache-tomcat-8.0.5/webapps/saiku/WEB-INF/classes/saiku-repository

Install and configure library to preload cache

No we will make saiku load this saved request on startup.

Download library saikuCachePreloader-1.0.jar , and place this jar into apache-tomcat-8.0.5/webapps/saiku/WEB-INF/lib.

Source code is available on https://github.com/GermainSIGETY/sauceDallas-saikuCachePreloader

To activate preload on startup, open apache-tomcat-8.0.5/webapps/saiku/WEB-INF/saiku-bean.xml and add theses lines (spring bean definitions) :
<!-- create another olapQueryBean that is not a 'session scoped' bean
(change the default behavior of initial olapQueryBean is a bad idea)-->
<bean id="olapQueryBean2" class="org.saiku.service.olap.OlapQueryService">
 <property name="olapDiscoverService" ref="olapDiscoverServiceBean" />
</bean>

<!-- Yet another clone of repositoryBean so that we can specify a different
path for saved queries to load at startup   -->
<bean id="repositoryBean2" class="org.saiku.web.rest.resources.BasicRepositoryResource">
 <property name="olapQueryService" ref="olapQueryBean2"/>
 <property name="path" value="res:saiku-repository" />
 <!-- <property name="path" value="/tmp"/> -->
</bean>

<bean id="mondrianCachePreloader" class="fr.saucedallas.MondrianCachePreloader">
 <property name="olapQueryService" ref="olapQueryBean2" />
 <property name="basicRepositoryResource" ref="repositoryBean2" />
</bean>
Important property here is 'path' for bean repositoryBean2: This indicates where to lookup requests to load on startup. Here it points in same repository as requests saved via Saiku user interface, which is WEB-INF/classes/saiku-repository. But you can specify another path, e.g  an absolute path on your os, and put saiku files on it.
Or a just a sub folder of WEB-INF/classes/saiku-repository.

See speedup

Restart Saiku, and perform a queries whose data should be loaded. Response time would be better.

Monitor preload

Add this appender in WEB-INF/classes/log4j.xml :
   <category name="fr.saucedallas">
     <priority value="DEBUG"/>
     <appender-ref ref="SAIKUCONSOLE"/>
   </category>
Requests preload will be logged in tomcat-dir/logs/saiku.log

Integrate with your data store update process.

Now you can organise a flow like this: your data store is updated every night, and just after this update, a restart of Saiku is scheduled, and then everything is ready to use for the day.

Friday, May 23, 2014

Build an use aggregate tables with Mondrian and Saiku Analytics

Using multi-dimensional cubes to build crosstabs with tools such Saiku Analytics, with an important amount of data leads to a recurrent problem : performance. In cases, datawarehouse contains millions on lines on fact tables, so response times of Saiku could become inacceptable ; database spend substential time to aggregate many lines.

Mondrian provides several ways to improve responses times of crosstab requests. First mean presented here is creation and utilization of aggregate tables. Idea behind concept of aggregate tables is to have kinds of other fact tables, with less granularity than initial fact table. For instance my fact table stores sales per clients. If users display frequently crosstab with sales per countries (countries are in clients dimension), and requests for theses crosstabs are too long, we can create a fact table the stores just sales per country. This new fact table does not contains sales details per user, but sales measures already calculated/aggregated per country.

There are bunch of articles all over the net that explain this concept, first is Mondrian Pentaho documentation: http://mondrian.pentaho.com/documentation/aggregate_tables.php

But its set-up is far less documented. Let's go to carry out together aggregate tables that will be used by Saiku.

Bad performance observation

As seen on this previous article, we keep using the same cube 'sales'. Now download this same cube's data schema but with a fact table that contains 27 millions lines (dump available here MySQL_sales.sql.zip).

Open Saiku, build few crosstabs and look at execution time. on my Intel i5 2.5Ghz quad core I have :

  • Sales amount and quantity per continent : 71 seconds
  • Sales amount and quantity per continent and product type : 283 seconds



We really face to performance problems: these response times are too long and I know that my big-boss-wearing-a-3000-bucks-suit monitors tirelessly sales per continent. It's now time to build an aggregate table in order to display measures 'per continent' faster.

Define aggregate tables

Pentaho provide a great tool for that, but IMO badly referenced : Pentaho Aggregation Designer. This tool analyzes your datamart ; check your DB's schema, suggests some aggregate tables, and, icing on the cake, writes for your SQL create/insert queries to build theses aggregate tables.

Download

First download latest stable version of this tool (here 5.0.1) at: http://sourceforge.net/projects/mondrian/files/aggregation%20designer/

Install

Unzip it and first be sure that JAVA_HOME and PATH environment variable are correctly set (as seen in this previous article).

Copy MySQL JDBC driver (or the corresponding driver of your database), as downloaded previously into
pentaho-aggdesigner-ui/drivers directory

Launch and configure 

Launch the app with startaggregationdesigner.[sh|bat] executable file, and click on 'configure' button in welcome popin : you have to specify database connection of your datamart.
-choose connection type 'MySQL'
-fill hostname (localhost)
-fill database name
-fill user name and password
-choose access type as JDBC
check with 'test' button and close with 'ok'.

Still in popin, in OLAP Model part, for 'Mondrian schema file', open Mondrian schema previously used
Then select 'sales' cube in cubes list, and finally click on 'connect'.

Aggregation design perfoms so analysis of database and check that tables indexes are correctly set.
Then you access to main view.

Define aggregate table

Now click to 'add' button on the right, dimensions lists appears in 'aggregation level panel'.
We need to have correct response time for 'sales per continent and per product', so choose
'continent' level for customer dimension
'product type' for product dimension
'all' for time dimension

And click on apply. Then Aggregation designer informs you that this potential aggregate table would contains 28 lines.
Of course it will be faster to aggregate 28 lines only but It's a bit overkill. You know that time dimension is often used too, in order compare sales variations per month. change level for time dimension and choose 'month'. Click on 'apply' again. Target aggregate table would have 14784 lines, that's a good tradeoff.

Create aggregate table in database and update Mondrian schema

They are two actions to settle aggregate tables
-update mondrian schema xml file to specify usage of these new fact tables.
-create new fact table in database

Database actions 

Click on 'export' button in top toolbar.
On 'Create aggregates' section click on 'execute' to create new table into database.
On 'Populate aggregates section' click on execute to populate theses new tables.
And finally export new Mondrian schema into proper Saiku directory : override previous schema or create a new. If create a new one, don't forget to specify in Saiku configuration file (sales.txt), name of this new file.

Activate usage of aggregate tables usage on Saiku 

Open apache-tomcat-8.0.5/webapps/saiku/WEB-INF/classes/mondrian.properties and uncomment and modify line 609 :
mondrian.rolap.aggregates.Read=true

modify line 851 to have
mondrian.rolap.aggregates.Use=true

Use and monitor queries on aggregate tables

Restart Saiku. To be sure that previous queries are not returned from database cache, clear DB's cache (command 'reset query cache' on MySQL) and launch same query. Now you have substantial gains, 'sales per continent and product type' are displayed less than 1 seconds, vs 280 seconds before.

Check usage of aggregate tables on logs

To be sure that nothing is cheated, you can monitor SQL on logs, and be sure that new fact tables are used.
For that we have to activate SQL logs.

Open apache-tomcat-8.0.5/webapps/saiku/WEB-INF/classes/log4j.xml and uncomment line 129 to 142 in order to have
   <appender name="SQLLOG" class="org.apache.log4j.RollingFileAppender">
     <param name="File" value="${catalina.base}/logs/mondrian_sql.log"/>
     <param name="Append" value="false"/>
     <param name="MaxFileSize" value="500KB"/>
     <param name="MaxBackupIndex" value="1"/>
     <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
     </layout>
   </appender>


   <category name="mondrian.sql">
      <priority value="DEBUG"/>
      <appender-ref ref="SQLLOG"/>
   </category>
Restart Saiku, query again and you see in apache-tomcat-8.0.5/logs/mondrian_sql.log SQL queries executed. You can see that requests 'from' clauses points to aggregate table instead of fact_sales table.

Drawbacks

Aggregate tables usage is strongly facilitated with the designer, but their maintenance is more complicated.

First you have to integrate update of these new tables in your data store update process. Your ETL for instance have to be updated too, in order to maintain consistent data between intial fact table and aggregated tables.

You will face quickly to frequent users requests to accelerate some queries, so create new aggregates. And you to have maintain all these stuff. Because Mondrian need to match precisely levels displayed vs aggregated levels presents in aggregate table, it is possible to build many aggregations : combination of all levels of all dimensions, 60 possiblities in our shabby cube example.

All that work can become cumbersome, so we see in next article how to use Mondrian cache only, and populate this cache with users queries results at Saiku startup.

Saturday, May 10, 2014

Saiku Analytics : UI customization and deploy UI and webapp on different servers


Saiku Analytics, from Meteorite BI is built in a manner that is easy for developers to customize user interface. Its architecture has been designed for that : there is a backend J2EE webapp for Mondrian/OLAP/MDX and database queries stuff, and an UI webapp that uses the first one via REST/AJAX/JSON http queries.

If you still not have a working Saiku Server on your box, loose your necktie and follow the tutorial on this previous post.

I Assume for the following that you have approximatively the same installation than this described in my installation post : UI deployed as ROOT webapp and backend webapp as 'saiku' on Tomcat.

Saiku UI is based on backbone.js, underscore.js, and JQuery-template, but whitout diving into these frameworks we will see what we can do by few examples.

The html single page

Open file apache-tomcat-8.0.5/webapps/ROOT/index.html, and starting to line 144, there are scripts tag of type text/x-jquery-tmpl. Each of them is responsible for displaying one area of the UI.

These scripts block are jQuery-template code blocks, we will change them without thinking hard about
jQuery-template framework, finally this is just HTMl code with placeholders and conditionnal directives.

Top toolbar - modify and brand with your company logo.

the first JQuery-template script on line 141 displays the toolbar on top :

<script type="text/x-jquery-tmpl" id="template-toolbar">
    <ul>
            <li><a id='new_query' href='#new_query' title='New query' class='new_tab i18n sprite'></a></li>
            <li class='separator'>&nbsp;</li>
            <li><a id='open_query' href='#open_query' title='Open query' class='open_query i18n sprite'></a></li>
            <li class='separator'>&nbsp;</li>
            <li><a id='logout' href='#logout' title='Logout' class='logout i18n sprite'></a></li>
            <li><a id='about' href='#about' title='About' class='about i18n sprite'></a></li>
            <li class='separator'>&nbsp;</li>
            <li><a id='issue_tracker' href='#issue_tracker' title='Issue Tracker' class='bug i18n sprite'></a></li>
    </ul>
        <h1 id='logo'><a href='http://www.analytical-labs.com/' title='Saiku - Next Generation Open Source Analytics' class='sprite'>Saiku</a></h1>
</script>

Just comment/remove li tags to hide some buttons, and for changing the logo remove id=logo on h1 tag and class='sprite' on <a> tag, and replace it by your favorite company logo with an img tag.

Querying toolbar

In the same way query toolbar corresponds to jQuery-template script tag on line 298 with id id="template-workspace-toolbar".
Ok you have understood...

Internationalization - labels

All text labels are written in JSON files on apache-tomcat-8.0.5/webapps/ROOT/js/saiku/plugins/I18n/po. Find the one that corresponds to your language and fee free to modify with your jargon.
Language displayed in UI corresponds to one of the preferred languages set on your web browser.

Google analytics tracker modification

Still index.html, the block on line 626

<script type="text/javascript" src="js/ga.js"></script>
Calls ga.js that activate google analytics tracker of Meteorite BI. Change in ga.js account variable value with your GA account ID
Otherwise remove call to this script by commeting this line on index.html

Manage Authentication pop-in display, change default logins/passwords

Ok this subject merits a further consideration than a brief paragraph, but quickly two tips, before a forthcoming post about authentication/permissions in Saiku

Change default password

By default user are declared in a text file on Saiku backend webapp (not ui webapp)
To change default password, add user etc, open
apache-tomcat-8.0.5/webapps/saiku/WEB-INF/users.properties, it contains

#Username,password,role
bob=dylan,ROLE_USER
krishna=krish2341,ROLE_USER
smith=pravah@001,ROLE_USER
admin=admin,ROLE_USER,ROLE_ADMIN

So syntax is :
[login]=[password],[roles....]

Add lines, change password etc.

Remove authentication pop-in.

For demonstration, POC, tests, you should need to remove authentication popin, and be always and directly logged as admin.

Open apache-tomcat-8.0.5/webapps/ROOT/js/saiku/Settings.js
And you will see one lines 139 to 142

if (window.location.hostname && (window.location.hostname == "dev.analytical-labs.com" || window.location.hostname == "demo.analytical-labs.com" )) {
    Settings.USERNAME = "admin";
    Settings.PASSWORD = "admin";
}

Comment the "if" clause and its related closing bracket, reload the page, no authentication pop-in anymore.

Embed html page in an existing webapp, and deploy Saiku backend to another server - Run UI and webapp on different servers

A common use case, to take advantage of the two webapps architecture of Saiku, is to add the html page index.html and its related js/css stuff to an existing application/server (an enterprise portal, a php application etc), perform your personal make-up on it, and deploy Saiku backend saiku-webapp-2.5.war to another webapp. Let's see how to do this this stuff.

UI part

In apache-tomcat-8.0.5/webapps/ROOT/js/saiku/Settings.js
/**
 * Change settings here
 */
var Settings = {
    VERSION: "Saiku 2.5",
    BIPLUGIN: false,
    BASE_URL: "",
    TOMCAT_WEBAPP: "/saiku",
    REST_MOUNT_POINT: "/rest/saiku/",
    DIMENSION_PREFETCH: true,
    ERROR_LOGGING: false,
You see on line 23 :

BASE_URL:""

Complete this line to make point the whole UI to another URL, for instance http://localhost:8081
(For a quick test, just copy/paste  whole apache tomcat directory, change in apache-tomcat-8.0.5-copied/conf/server.xml, server shutdown port, line 22, to 8006, http connector port, line 68, to 8081 and ajp connector port, line 90, to 8010).

Ok It's good but your first test on a browser fails, and your browser debug console says that 'Cross origin requests are only supported for HTTP'.

It's about same origin policy

We will not dive into this subject here, and only avoid this snap.
A mandatory trick is to enable ajax calls from the UI to ship authentication cookie sent by webapp during authentication stage :

Open apache-tomcat-8.0.5/webapps/ROOT/js/saiku/adapters/SaikuServer.js

and add line 145 to ajax parameters this parameter :
xhrFields: {withCredentials: true}

In order to have
    params = {
      url:          url,
      type:         type,
      cache:        false,
      data:         data,
      contentType:  contentType,
      dataType:     dataType,
      success:      success,
      statusCode:   statuscode,
      error:        failure,
      async:        async,
      xhrFields: {
                withCredentials: true
          }

Saiku webapp part

Apache Tomcat API provides a CORS Filter that removes this limitation (CORS for Cross-Origin Resource Sharing, see http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#CORS_Filter for this filter, and http://manning.com/hossain/CORSinActionCH01.pdf, first chapter is free to read,see what is behind CORS).
So open (on the tomcat that hosts the webapp, not the UI) apache-tomcat-8.0.5-copied/webapps/saiku/WEB-INF/web.xml, and paste on line 15 following snippet :
  <filter>
     <filter-name>CorsFilter</filter-name>
     <filter-class>org.apache.catalina.filters.CorsFilter</filter-class>
  </filter>
  <filter-mapping>
       <filter-name>CorsFilter</filter-name>
       <url-pattern>/*</url-pattern>
  </filter-mapping>

Now this server can respond to ajax queries from another host (to sum-up).

Tuesday, May 6, 2014

Saiku Analytics installation-deployment on a Tomcat server

Saiku Analytics is an tremendous stand-alone tool for building and querying datamarts, build ad-hoc reports, graph, export reports to excel csv etc. Saiku is based on Mondrian OLAP engine, a ‘state of the art’ Java library for modelizing and requesting Multi dimensionnal cubes.

I wrote this tutorial for BI dudes that are not very familiar with J2EE development and Apache Tomcat, hope this permits you to accelerate utilization of Saiku.

Before starting with Saiku installation, this tutorial assumes you already have realized your datamart and modelized your Mondrian schema (please find at the end tutorials links for building Mondrian schemas).

Otherwise you can download a sample datamart and its related Mondrian schema there

Step 1 : downloads

Apache tomcat 7

Download apache-tomcat-8.0.5.zip on http://tomcat.apache.org/download-80.cgi

Java JRE 7

http://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html, choose a version that corresponds to your system.

Saiku Analytics webapps

For Saiku Analytics you need to download two war files on http://meteorite.bi/saiku/download

Notice that it’s not compulsory to fill the contact form ; click on ‘Download Saiku’ below the form to display download links

Download Saiku Backend WAR 2.5 : saiku-webapp-2.5.war
Download Saiku UI 2.5 : saiku-ui-2.5.war

Step 2 : JRE and Tomcat installation

JRE

Uncompress JRE tarball (or zip) and set environment variables JAVA_HOME and PATH properly. There are a bunch of web pages for this subject (see http://stackoverflow.com/questions/20822101/setting-the-java-home-and-path-on-macos, http://stackoverflow.com/questions/2619584/how-to-set-java-home-on-windows-7, http://stackoverflow.com/questions/9643964/java-home-incorrectly-set-how-to-reset-it)

Now if you type in a console ‘java -version’, you should have an output like

java version “1.7.0_55″
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode

Tomcat

Unzip apache-tomcat-8.0.5.zip file.
Go to apache-tomcat-8.0.5/webapps directory and remove folders docs, examples, host-manager, manager and ROOT. Those folders are examples or adminstration webapps that are not used by Saiku nor covered by the tutorial.

You are now able to launch tomcat server (ok it hosts nothing), by launching apache-tomcat-8.0.5/bin/startup.sh or startup.bat (for unix users, a little ‘chmod u+x apache-tomcat-8.0.5/bin/*.sh’ would be wise), and you should have in apache-tomcat-8.0.5/log/catalina.out no errors, and a line like :

05-May-2014 17:00:24.284 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 230 ms

Step 3 :  Saiku webapps deployment

Stop tomcat server (if launched)

-rename/copy saiku-ui-2.5.war to ROOT.war
-rename/copy saiku-webapp-2.5.war to saiku.war

Copy those two renamed files to apache-tomcat-8.0.5/webapp. Now you have in this folder only the two files like this :

apache-tomcat-8.0.5/webapp/ROOT.war
apache-tomcat-8.0.5/webapp/saiku.war

Launch apache tomcat, look in catalina.out, you should see these two lines

05-May-2014 17:14:30.045 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Deploying web application archive /home/gesig/saiku/1_tutoInstallSaiku/apache-tomcat-8.0.5/webapps/ROOT.war
05-May-2014 17:14:31.844 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Deploying web application archive /home/gesig/saiku/1_tutoInstallSaiku/apache-tomcat-8.0.5/webapps/saiku.war

And now open this url : http://localhost:8080, and you will see the login form of Saiku.
default login is admin, and password is admin

Good job dude, You can jerk on your seat.

Step 4 : configuring Saiku for using your cube schema and your Database

Wicked, you’ve got your Saiku webapp but you have no choice in cubes dropdown on  top left. No we will push our specific BI stuff.

Stop Tomcat server

first of all you have to find and download your database’s specific driver for Java, commonly call a JDBC driver or JDBC connector.

In my example, we need a MySQL jdbc connector, so download on http://dev.mysql.com/downloads/connector/j/ the ‘Platform independent’ one (you can skip account creation or registration by clicking on ‘No thanks, just start my download’)

Uncompress the downloaded file mysql-connector-java-5.1.30.zip, and you should find in it a file mysql-connector-java-5.1.30-bin.jar. Copy that file in ./apache-tomcat-8.0.5/lib

If your use another database, find the corresponding jdbc driver, which is a jar file (ie a java library), and copy it into ./apache-tomcat-8.0.5/lib

-For a Postgres server, you should find postgresql-9.3-1101.jdbc41.jar on http://jdbc.postgresql.org/download.html
-For a MS SQL backend , I advise you to use JTDS driver ; on http://sourceforge.net/projects/jtds/files/ you will find jtds-1.3.1-dist.zip that contains jtds-1.3.1.jar.

Now we have to declare to Saiku/mondrian our cube :
copy a Mondrian schema file (sales_mondrian_schema.xml) into apache-tomcat-8.0.5/webapps/saiku/WEB-INF/classes/saiku-datasources/

in apache-tomcat-8.0.5/webapps/saiku/WEB-INF/classes/saiku-datasources/ create a file named sales.txt ( file extension has no importance, choose everything you want), and write into it lines below :

# declaration of Sauce Dallas sales cube for Sakiu
# ———————————————
type=OLAP
name=Sauce Dallas
driver=mondrian.olap4j.MondrianOlap4jDriver
location=jdbc:mondrian:Jdbc=jdbc:mysql://localhost/1_tutorialSaiku;Catalog=res:saiku-datasources/sales_mondrian_schema.xml;JdbcDrivers=com.mysql.jdbc.Driver;
username=root
password=7777777

See explanation just after about datasource definition
Now start apache tomcat again, and you should have in saiku page an available cube in the dropdown list.
You can drag’n’drop measures, fields, and build crosstables and graphs as you need. All the stuff for analyze your activity and so overshoot our company crisis.

Explanation about datasource definition

the most tedious part in Saiku configuration is writing the datasource definitions for cubes. Let’s see in details sales.txt file :

type=OLAP specify that is an OLAP engine. I haven’t seen other value for this property
name : you specify a name for your datasource
driver: specify the Mondrian driver. I haven’t seen other value for this property
location : this property has several parts, separated by semicolons :

jdbc:mondrian:Jdbc=jdbc:mysql://localhost/1_tutorialSaiku :
specifies jdbc url of the database ; forget for now the part ‘jdbc:mondrian:Jdbc=jdbc:’ and just remember that mysql://localhost/1_tutorialSaiku is a JDBC convention for a mysql database. Here the host is localhost and database name is 1_tutorialSaiku

For a postgres database we can have jdbc:mondrian:Jdbc=jdbc:postgresql://localhost/1_tutorialSaiku

Catalog=res:saiku-datasources/sales_mondrian_schema.xml
indicates path of mondrian schema file. ‘res:’ (spring stuff) indicates that file is located in classpath, in our case, a path starting from classes directory of saiku webapp.

JdbcDrivers=com.mysql.jdbc.Driver
Indicate the java class file to use as driver for the database connection. You should see in mysql-connector-java-5.1.30-bin.jar that there is a classe file named Driver.class in folder com/mysql/jdbc
For Postgres write : JdbcDrivers=org.postgresql.Driver
For Oracle (thin driver) write : JdbcDrivers=oracle.jdbc.driver.OracleDriver
For MS SQL (jtds driver) write : net.sourceforge.jtds.jdbc.Driver

Tutorials for building Mondrian schemas 

Coming soon