SCOR’s NonLife Risk Modelling (NORMA) on Databricks Apache Spark

May 26, 2021 12:05 PM (PT)

The use case we want to present is the replatforming of SCOR’s Non-Life Risk Modelling Application (NORMA).

NORMA is a Monte Carlo based enterprise application developed fully in-house by SCOR; it is part of SCOR’s Internal Model for estimating the combined risks and the diversification of its P&C (Property & Casualty) portfolios. Monte Carlo methods are widely used in the financial industry by insurers, reinsurers, banks, asset managers, etc. They work by simulating the various sources of uncertainty affecting the value of portfolios (investments and other general financial instruments) and then analyzing the distribution of the resulting outcomes. The original NORMA application was a 10-year-old legacy application written in C# on the .NET Framework. As business demands, and with them the computational effort needed, increased over time, its monolithic architecture reached its limits. The new application is Python based and runs on Azure and Databricks cloud services. This enables horizontal scale-out, massive parallelization and efficient processing of big data. Parallel processing is increasingly a necessity for efficiency and speed in business applications. Here Databricks helps us as a unified analytics platform that provides on-demand, cloud-optimized clusters with Apache Spark, job scheduling with dynamic resource allocation, and Notebooks for deep data analysis. Thanks to this high-performance computing solution, our teams of experts can now simulate a very large number of scenarios for each and every piece of SCOR’s P&C portfolio in less than half a working day. Someone who requests a risk analysis for a new P&C business opportunity at 9am will receive the report before lunch! In general, reporting capabilities have improved both in scope and runtime, with runtime improvements of up to an order of magnitude. This means fast, accurate and detailed estimation of all the risks and their diversification, which ultimately generates advantages for our clients and partners: less time and fewer resources consumed, with deeper business insights and risk analysis.

In this session watch:
Sahand Razzaghi, Software Engineer, SCOR
Luca Valentino, Software Engineer, SCOR



Sahand Razzaghi: Welcome, everyone. Thanks for joining our session. My name is Sahand Razzaghi, and together with Luca Valentino I’m presenting today SCOR’s Non-Life risk modeling application running on Databricks. Just very, very quickly, who are we? We work for SCOR, which is a global reinsurance company. It has a few thousand employees and three dozen offices around the world. We are actually located in Switzerland. In case you’re not familiar with the reinsurance concept: a reinsurer is a company providing cover to other companies, mainly primary insurers. A primary insurer is a company where you usually go, as a normal person, to get your insurance for house and car and stuff like that. SCOR recently launched its latest strategic plan, called Quantum Leap, which this time has a very strong technological focus, and this project was realized under this strategic plan.
So what are we going to talk about today? We want to give you a brief introduction to Non-Life risk modeling. Then we go into a bit more detail on the replatforming. So what did we do? What was the reason? Then we give you a bit of the technical approach, so technical insights about how it was done, and finally a small acknowledgement to all the people involved in that project. So the Non-Life risk modeling: here it’s more an illustrative overview of certain risk contributions.
So the term Non-Life is also often referred to as P&C, or property and casualty, so you may just have heard that term here. This makes the term Non-Life a bit more concrete. Property means everything which is physical goods, like a car or a factory, and also very big things like ships or whole construction sites. Casualty here means casualties which are liability related. An example would be if you produce a certain medical device which has a design flaw and kills people: these casualties are human, so this is a liability-related casualty. Here in the upper left corner we have the natural catastrophes, or in short nat cat, as we call them. This is more or less the textbook example of reinsurance. You can think about a primary insurer: if you insure houses and property in Florida, you most certainly need reinsurance cover against very big nat cat events like hurricanes and earthquakes.
Then if you move from the nat cat down to motor or car insurance in this example, this is more a traditional line of business, I would say. You can think about it as being more or less the same everywhere: the cars are the same, the rules are the same, the issues are the same. But the risk universe is actually very, very dynamic, and it’s also expanding. If you go to the right, to the cyber risk, this is a risk which was introduced very, very recently; it’s a very new risk compared to motor and nat cat risks.
So here, we zoom a little bit out of the Non-Life view and look at the broader risk landscape which we model at SCOR, and which is simply called the internal model. We are at the reddish, pinkish box here, NORMA, which is simply Non-Life risk model aggregation. If you move two boxes up, you have Life, which is the complement of Non-Life. This deals with everything like life insurance, health insurance, disability, critical illness, stuff like this. Then there are other risks which we have in the internal model, the typical things that you find in any financial internal model, like credit risk, asset risk, op risk.
Now, I want to briefly talk about the feeding systems for the Non-Life risk model aggregation. At the bottom are the SCOR Business Solutions pricing and the P&C treaty pricing. These you can think about as two categories of products. Here we get the information on what risks we’re covering for the clients in the future. The reserving is all the risks which we covered for the clients in the past, which still have uncertainty and therefore risk. And the nat cat is also modeled outside. Actually, there is a whole industry providing tools for modeling nat cat events, so they already model… the physics, so it’s not strictly a financial model.
So if we now get a bit more specific and zoom into one of the Non-Life risk model processes, basically the aggregation, this is the step which gave the tool its name. So it’s a Non-Life risk model aggregation tool, and I want to briefly explain to you how it works. If we start at the very bottom, you see treaties; these are all the risks we covered for the clients, so it’s single contract level. They are grouped in certain baskets, and you can imagine that we have tons of these baskets. These baskets are located in different branches of a hierarchical tree, and the branches represent the dependencies which you would have between the baskets.
So as you move up here, you apply dependencies between portfolios. You can also think about baskets as small portfolios. As you move up the aggregation hierarchy, you apply dependencies between the portfolios, you gain diversification, and then you move up until the very top node, the Non-Life. There you are then finally in the internal model, and Non-Life risks get diversified with other risks. So conceptually, we do Monte Carlo simulation: all the portfolios, the baskets at the bottom, are independently simulated. Then you move up the hierarchy, you apply dependencies to gain diversification, and at the very top you have a fully diversified, scenario-consistent view.
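As an editorial illustration of the aggregation idea described above, here is a minimal NumPy sketch, not NORMA’s actual code. The basket names, the lognormal parameters and the 99.5% quantile used as risk measure are all assumptions made for the example, and the baskets are simply added as simulated, with no dependency model applied. It only shows that summing children scenario by scenario keeps the view scenario consistent, and that the aggregated tail figure is smaller than the sum of the standalone ones.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000  # illustrative; the talk mentions 1 million simulations

# Hypothetical baskets: one simulated annual loss per scenario.
baskets = {
    "motor": rng.lognormal(mean=2.0, sigma=0.5, size=n_sims),
    "property": rng.lognormal(mean=2.5, sigma=0.8, size=n_sims),
    "cyber": rng.lognormal(mean=1.5, sigma=1.2, size=n_sims),
}


def tail_quantile(losses, level=0.995):
    """Loss level exceeded in only (1 - level) of the scenarios (assumed risk measure)."""
    return float(np.quantile(losses, level))


# Parent node: add the children scenario by scenario (row i is the same scenario everywhere).
parent = np.sum(list(baskets.values()), axis=0)

standalone = sum(tail_quantile(x) for x in baskets.values())
aggregated = tail_quantile(parent)
diversification = standalone - aggregated

print(f"sum of standalone quantiles: {standalone:.1f}")
print(f"quantile of aggregated node: {aggregated:.1f}")
print(f"diversification benefit:     {diversification:.1f}")
```

In a real hierarchy the same step would be repeated at every node, with the dependency structure applied before the children are summed.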
Okay, now I want to talk about the replatforming project. So what did we do, and what was the reason to start this? We had an original application which was doing the aggregation we discussed quickly. It was also doing tons of other things: there was a lot of evaluation and tons of reporting on top of it. It was, I would call it, a typical business application, written in C# more than 10 years ago. It had grown to over 100,000 lines of code, and there was a lot of stuff added which made it run slower and slower, so we increased the machine. As we increased the model, we increased the machine, until we had a fairly big server, which was of course running 24/7. And here we reached this threshold that if a user needed an extensive model with some extensive reports, he had to wait a whole working day; basically he had to come back the next day. Even if you had long working hours, you had to come back the next working day.
During this run, all other users would also be blocked. So he would block the whole machine for more than one working day, which is very inconvenient as more users want to run more stuff. So we decided to completely replatform this application. We also had the typical problems due to legacy code: it was hard to maintain, it was easily breaking, and our test coverage was low. So the requirements for the replatforming of this application were: we wanted a fast and efficient backend calculation, obviously, but we also wanted scalability. Our different users should not be blocking each other, so there should be dedicated computing resources for different users. And also for one user, the system should scale dynamically, whether the user is running small things or very big things.
Then we had to recreate all the standard reporting, because the model had standard reporting for different stakeholders and it needed to be replicated one-to-one. And on top we wanted to add end-user computing capability. When the original application was designed, nobody envisioned that 10 years later business people would really want to interact dynamically with the data. That’s how we decided to use big data technology for replatforming this application. We chose Spark as the computational framework and Databricks as the service on top. Databricks offered a lot of things which we needed and were looking for. The Spark versioning was taken care of. We could easily deploy our application with a Docker container. The security was taken care of by Active Directory support, and the notebooks together with SQL Analytics were the perfect end-user computing tool, which we could give to the business users so they can interact with the data in a more dynamic way. So I want to hand over to my colleague, Luca Valentino, who will tell you now what came out and, well, how we did it. Thanks a lot.

Luca Valentino: Hello. I’m going to take over from Sahand and go into a bit more detail. First I’m going to tell you what we achieved, with a recap of the project. We had an old application which was based on .NET, and it was running on an on-premise server, which was like a 32-processor machine with 384 gigabytes of RAM. The code was optimized to do some parallelization at a low level, and was still quite optimal, I would say. We had a shared memory architecture. The data model was based on blobs: for the big data, we were saving data in an Oracle database as blobs, and for the input data and metadata, we were just using plain SQL tables. There was almost no scalability because of the architecture and many different problems related to it. Because of this, the processing time was quite long and the RAM consumption was very high: it was 300 gigabytes for each model run. So the users were basically blocked. They could not run anything in parallel; only one run per user.
The data storage was very inefficient because a large amount of data was saved, as I was mentioning, to SQL blobs, and the read and write time was too long. Therefore one single model run was taking several hours, with one user waiting for the model run to be finished, like after one day, and the day after they still had to run additional reports. So running the model was a long process, and we were also limited to 100,000 simulations. In addition to that, there was only one version of the application that could be deployed to QA and production. And the deployment was also a bit painful: there was no efficient way, and nothing to do with DevOps practices.
On top of all these problems, code changes were really hard to make due to legacy code. The application was really rigid and fragile. There were not many tests; as Sahand was explaining, our test coverage was really low, and it was really a pain to change the code. So we went from this paradigm to a new one: from on-premise, .NET and all this shared memory architecture to a Python, PySpark-based application, which is running on Azure Databricks now, with on-demand elastic clusters. So we can now achieve massive parallelization thanks to Spark. We can parallelize and achieve horizontal and vertical scaling. We can save everything in Delta Tables now, and that’s what we use as a storage model.
The Databricks cluster scaling is also dynamic, so we usually set up the cluster so that it goes from 5, 10 or more to 20, 30 or 40 nodes, and Databricks decides how to scale out the number of nodes. With this system, we can now implement more complex algorithms, and we already did implement very complex things which were impossible to implement with our old system. We have very efficient I/O with Delta Tables. We store about 25 gigabytes per model run. The model run is way faster: it’s more than 10 times faster than the previous one. The additional reports are not comparable at all. Overall they are about 10 times faster, but some of them are really extremely fast compared to the previous application.
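As an editorial illustration of the dynamic scaling just described, here is a hedged sketch of what an autoscaling cluster specification can look like, written as a Python dictionary. The field names `spark_version`, `node_type_id` and `autoscale` with `min_workers` and `max_workers` follow the Databricks cluster specification; the bounds of 5 and 40 workers are picked from the ranges mentioned in the talk purely for illustration, and the two placeholder values are left unfilled because the talk does not state them.

```python
import json

# Hypothetical cluster specification; not SCOR's actual configuration.
cluster_spec = {
    "spark_version": "<runtime-version>",  # placeholder, not given in the talk
    "node_type_id": "<azure-vm-type>",     # placeholder, not given in the talk
    "autoscale": {
        "min_workers": 5,   # lower bound the cluster starts from
        "max_workers": 40,  # upper bound Databricks may scale out to
    },
}

print(json.dumps(cluster_spec, indent=2))
```

With a specification of this shape, the platform chooses the number of workers between the two bounds depending on load, which matches the behaviour the speaker describes.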
So we have 1 million simulations, but we are not limited to 1 million. We could still leverage the scaling out and go even to 10 million if we want. So model runs are now possible in parallel. There is a multi-versioning system, which allows the users to test many different versions on QA, and also to run different versions at the same time in production. Thanks to Azure DevOps, we can deploy all the components in a few minutes. We have no legacy code, and the application is built in such a way that everything is modular, which makes it very easy to extend and very easy to test. For the future, what I want to mention is that we would also like to leverage Photon, at the bottom, the framework and the GPUs; we already started investigating this. We are at the moment waiting for the [Nexus] release because there are some limitations at the moment. We are actually looking forward to jumping on this new technology and leveraging it.
So on the next slide I would just like to focus quickly on how we organized this project and how we made it operational. We had a small development team, and we started in 2018. We officially finished the project last year, around the end of September, beginning of October. But since then we have continued developing a lot of new features. The project is a collaboration between IT and SCOR P&C. We adopted an [inaudible 00:16:56] and agile approach using Scrumban, which is a mix of Scrum and Kanban. On top of this, we added a lot of extreme programming engineering practices, like test-driven development, refactoring, pair programming, trying to keep the design simple, having coding standards, continuous integration and small releases.
So the whole framework is really a software engineering framework which we adopted for the development. On the software engineering side, we have a system which is a mix of object-oriented and functional programming, with object orientation used mainly within the [older] classes and objects. We adopt clean code, and we use code as documentation. We had decided that since it’s an interim project, we’re not [inaudible] condition to focus on the code. We also implemented design patterns to facilitate the implementation, we try to be SOLID, and we use architectural patterns: the application is domain-centric, and we use encapsulation and continuous design.
On the DevOps side, we use Azure DevOps to do basically all the development and operations: we have repos, pull requests, all the backlog management support and all the CI/CD pipelines. Something very interesting, which is a bit particular to our project: the CI pipelines for the Spark tests, unit tests and [inaudible] tests spin up Databricks clusters, so that we can run the tests on the cluster and get the results back directly from the cluster.
So next is a quick overview of the system architecture. I will go quickly through that. Basically, the user now has the possibility to go through the user interface, set up his model, go through the input data, do a data validation and store his data, which is stored in an Oracle database. Once the model is ready, they can basically click a button and run the model. The system communicates with the backend via an interface, which is a web application. And the backend basically grabs the package which we deploy to Azure Container Registry; I mean, all the artifacts and the Docker image are pushed there. So we grab them, and there is Azure Data Factory as job orchestration for all the pipelines, which then basically spins up clusters for each pipeline.
Databricks does all the computations, the ETLs, the simulations, and it stores everything in the Azure Data Lake. Then from the data lake, we can pick up all the results and do additional reporting. The additional reporting is done through Notebooks and the reporting tools; we have a reporting tool. So the user can also go back to Databricks, get into the Notebooks, prep the data, and do additional analysis, which is basically what Sahand was saying at the beginning: there is an end-user computing functionality which facilitates the reporting for the users.
So we jump to the next slide, which is the technical approach we adopted. This is a very simplified version of, let’s say, what we do in the application. It’s extremely simplified, but it’s just to give you an idea of what we do in terms of steps. If you look at the top of the slide, you have the model steps. In the middle, there is a graphical representation of them. And at the bottom, in the technical approach, I’m trying to describe what technology we use and what approach we use for implementing this.
So in the first model step, we have all the input data, which are basically probability distributions entered by the users. These distributions are used to do Monte Carlo simulations. This Monte Carlo simulation is a 1 million simulation run for all the segments modeled by the users. Each segment has three probability distributions, for premiums, losses and expenses. What we adopted is actually an approach based on NumPy and RDDs. By using [inaudible] random number generators and NumPy, we could actually perform the parallelization, and the RDDs were helping us basically to transform, let’s say, to go from the input data, which were not Spark-based or data frames, to the data frame.
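As an editorial illustration of this first step, here is a minimal sketch of a per-partition simulation function written with NumPy only, so it can be run without a cluster. It is not the speakers’ code: the three distributions and their parameters, the seed, the partition count and the function name `simulate_partition` are all assumptions for the example. The one technique it is meant to show is giving every partition its own independent random stream through `numpy.random.SeedSequence` with a `spawn_key`, so that partitions can be simulated in parallel and still be reproducible.

```python
import numpy as np

BASE_SEED = 2021            # hypothetical root seed
N_PARTITIONS = 8            # illustrative; a cluster would use many more
SIMS_PER_PARTITION = 1_000  # illustrative; the talk mentions 1 million in total


def simulate_partition(index, _rows):
    """Yield (simulation_id, premium, loss, expense) rows for one partition."""
    # The spawn_key makes the stream depend on the partition index only.
    seed = np.random.SeedSequence(BASE_SEED, spawn_key=(index,))
    rng = np.random.default_rng(seed)
    # Assumed distributions for one hypothetical segment.
    premium = rng.normal(100.0, 5.0, SIMS_PER_PARTITION)
    loss = rng.lognormal(4.0, 0.6, SIMS_PER_PARTITION)
    expense = rng.gamma(2.0, 5.0, SIMS_PER_PARTITION)
    first_id = index * SIMS_PER_PARTITION
    for i in range(SIMS_PER_PARTITION):
        yield (first_id + i, float(premium[i]), float(loss[i]), float(expense[i]))


# Local stand-in for the cluster: run every partition in turn.
rows = [row for p in range(N_PARTITIONS) for row in simulate_partition(p, iter(()))]
print(len(rows), rows[0])

# On Spark, the same function could be handed to an RDD, for example:
#   rdd = sc.parallelize(range(N_PARTITIONS), N_PARTITIONS)
#   df = rdd.mapPartitionsWithIndex(simulate_partition).toDF(
#       ["simulation_id", "premium", "loss", "expense"])
```

The function takes the partition index and an iterator because that is the signature `mapPartitionsWithIndex` expects, which is one way to go from non-Spark input data to a data frame as described in the talk.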
After this, we have everything in Delta Tables. We can load the Delta Tables and run, for instance, the copula-based dependency model, which is nothing else than taking these distributions and creating dependencies between the distributions. Here we started using only Spark data frames and SQL operations, Pandas UDFs, as well as plain UDFs and window functions. Everything gets saved again into the Delta Tables, and for the evaluation process, as well as for the reporting, we only use data frames, Pandas UDFs, UDFs, window functions, and Delta Tables. This is more or less the picture. Of course, there are tons of other things; this is extremely simplified, but this is just to say that we start from an RDD approach just because our data doesn’t allow us to jump directly into Delta Tables. And then from that first step, we move to a more efficient way of using Spark, which is based on Spark data frames, UDFs, Pandas UDFs and window functions.
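As an editorial illustration of a copula-based dependency step, here is a minimal NumPy sketch of rank reordering under a Gaussian copula. The talk does not say which copula family or algorithm NORMA uses, so the Gaussian copula, the correlation of 0.6, the two marginal distributions and the helper name `reorder_to_ranks` are all assumptions. The idea shown is that independently simulated samples are rearranged so that their ranks follow those of correlated normal draws: each marginal distribution stays exactly the same, and only the pairing across scenarios changes.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sims = 50_000
rho = 0.6  # hypothetical correlation parameter of the Gaussian copula

# Independently simulated marginals for two hypothetical segments.
loss_a = rng.lognormal(3.0, 0.5, n_sims)
loss_b = rng.gamma(2.0, 10.0, n_sims)

# Correlated standard normals supply the target rank structure.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=n_sims)


def reorder_to_ranks(samples, driver):
    """Rearrange samples so that their ranks match the ranks of driver."""
    ranks = np.argsort(np.argsort(driver))  # rank of each driver value, 0-based
    return np.sort(samples)[ranks]          # k-th smallest sample goes where the rank is k


dep_a = reorder_to_ranks(loss_a, z[:, 0])
dep_b = reorder_to_ranks(loss_b, z[:, 1])

print("correlation before:", np.corrcoef(loss_a, loss_b)[0, 1])
print("correlation after: ", np.corrcoef(dep_a, dep_b)[0, 1])
```

On Spark data frames, the ranks used here are the kind of quantity a window function such as `row_number` over an ordered window would provide, which fits the data frame and window function approach mentioned in the talk.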
So I’m actually at the end of the presentation. As the last slide, I would like to thank the core team. We have many people, many contributors I would also like to thank. They were jumping in and out of the project, so there were many people contributing. Our core team is now what you see in the central part. And thank you also to all the product owners; they had to wait quite a lot of time to get the project done. Thanks for trusting us. And a special thanks goes to Martin Studer from Mirai Solutions; he was and still is our Spark guru, who introduced us to the Spark world and pushed the project to its current status. So thank you for listening. I hope that was interesting for you, and I wish you all a great day. Bye.

Sahand Razzaghi

Sahand is a Quantitative Developer, with a focus on Python and Spark based parallel computing for financial applications. He started his career as a Trainee at SCOR. After spending a few years in the ...

Luca Valentino

Luca is a Quant Engineer with over 15 years of experience in quantitative/actuarial software development in finance, expertise in data strategy, data engineering and architecture. He is currently w...