03_vessel_markov(Python)

Vessel Tracking - Markov

The benefits of Environmental, Social and Governance (ESG) are well understood across the financial services industry, but they go beyond sustainable investments. Recent experience has taught us that high social values and good governance emerged as key indicators of resilience throughout the COVID-19 pandemic. Large retailers that already use ESG to monitor the performance of their supply chains have been able to leverage this information to better navigate the challenges of global lockdowns, ensuring a constant flow of goods and products to communities. As reported in an article from the Harvard Law School Forum on Corporate Governance, "[...] companies that invest in [ESG] also benefit from competitive advantages, faster recovery from disruptions".


  • STAGE1: Download vessel AIS tracking data
  • STAGE2: Sessionize data points into trips
  • STAGE3: Journey optimization
  • STAGE4: Predicting vessel destination

antoine.amend@databricks.com

Context

In this notebook, we want to leverage the information we learned earlier by looking at common maritime routes. Specifically, we want to model these routes using historical data in order to simulate maritime traffic across the US. Using big data analytics, port authorities can better regulate inbound traffic and reduce long queues at anchorage, a major source of safety and environmental issues, resulting in cost benefits for industry stakeholders and a major reduction in carbon emissions. Another application could help sea carriers become more agile and improve their operational resilience by better optimizing routes against economic value. As reported in the Financial Times, carriers taught themselves a valuable lesson during the COVID-19 pandemic, parking up ships, sending vessels on longer journeys and cancelling hundreds of sailings. Using this framework, a cargo operator can find the shortest path from its actual location to a given destination whilst maximizing its business value.



Markov chains have prolific usage in mathematics. They are widely employed in economics, game theory, communication theory, genetics and finance. They arise broadly in statistics, especially Bayesian statistics, and in information-theoretical contexts. When it comes to real-world problems, they are used to postulate solutions to study cruise control systems in motor vehicles, queues of customers arriving at an airport, exchange rates of currencies, etc. In this notebook, we introduce the use of Markov chains to model maritime traffic between US ports as a steady traffic flow. This will set the foundations for more extensive modelling in the next notebook.
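As a minimal illustration of the formalism, before we build it from real AIS data: a Markov chain is defined by a row-stochastic transition matrix, where each row is a state (an origin port) and each entry the probability of the next transition. The ports and probabilities below are made up for the example, not derived from the dataset.

```python
import numpy as np

# Toy 3-port transition matrix (hypothetical values)
ports = ["Miami", "New York", "Boston"]
P = np.array([
    [0.1, 0.6, 0.3],   # from Miami
    [0.5, 0.2, 0.3],   # from New York
    [0.4, 0.4, 0.2],   # from Boston
])

# A well-formed transition matrix is row-stochastic: each row sums to 1
assert np.allclose(P.sum(axis=1), 1.0)

# Starting with certainty in Miami, the distribution after one trip
# is simply the Miami row of P
state = np.array([1.0, 0.0, 0.0])
after_one_trip = state @ P
print(dict(zip(ports, after_one_trip)))
```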

Dependencies

As shown in the cell below, we use multiple third-party libraries that must be made available across the Spark cluster. Assuming you are running this notebook on a Databricks cluster that does not use the ML runtime, you can use the dbutils.library.installPyPI() utility to install Python libraries in that specific notebook context. For Java-based libraries, or if you are using an ML runtime, please follow these alternative steps to load libraries into your environment.

%python
dbutils.library.installPyPI('networkx')
dbutils.library.installPyPI("keplergl")
dbutils.library.restartPython()
%r
install.packages('circlize')

Since we aggregated data points into trips, it becomes easy to extract the maritime traffic between two distinct ports. In a Markov context, a port is defined as a state, and the transition between two states is characterized by a trip, answering questions like "What is the probability of sailing to New York City when originating from Miami?".

%sql
-- We create uniquely identifiable names as we realised port names are not unique (e.g. Portland, Maine and Portland, Oregon)
CREATE OR REPLACE TEMPORARY VIEW routes AS
SELECT 
  CONCAT(orgPortName, ' [', orgPortId, ']') AS src,
  CONCAT(dstPortName, ' [', dstPortId, ']') AS dst,
  COUNT(1) AS total
FROM esg.cargos_trips 
GROUP BY 
  src, 
  dst;
  
CACHE TABLE routes;
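The route aggregation above can be sketched in plain pandas on a toy trips table. The column names follow the `esg.cargos_trips` schema used in the SQL; the rows themselves are made-up values for illustration.

```python
import pandas as pd

# Toy trips table mimicking esg.cargos_trips (hypothetical values)
trips = pd.DataFrame({
    "orgPortName": ["Miami", "Miami", "New York"],
    "orgPortId":   [101, 101, 202],
    "dstPortName": ["New York", "New York", "Boston"],
    "dstPortId":   [202, 202, 303],
})

# Build uniquely identifiable src/dst names, then count trips per route,
# mirroring the CONCAT / GROUP BY logic of the SQL cell
routes = (
    trips
    .assign(src=trips.orgPortName + " [" + trips.orgPortId.astype(str) + "]",
            dst=trips.dstPortName + " [" + trips.dstPortId.astype(str) + "]")
    .groupby(["src", "dst"]).size().reset_index(name="total")
)
print(routes)
```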

We can appreciate how global our network is since each transition may lead to further branches and ramifications, although we may expect at least 3 disconnected areas (the west coast, east coast and Great Lakes areas). What is the probability of a ship being in Portland, Oregon after 2-3 trips within the west coast area? We use a circos visualization to better understand the 2nd, 3rd, etc. levels of connections.

%r
library(SparkR)
library(circlize)
 
df <- collect(sql("SELECT src, dst, total FROM routes"))
 
from = df[[1]]
to = df[[2]]
values = df[[3]]
 
mat = matrix(0, nrow = length(unique(from)), ncol = length(unique(to)))
rownames(mat) = unique(from)
colnames(mat) = unique(to)
for(i in seq_along(from)) mat[from[i], to[i]] = values[i]
 
grid.col <- setNames(rainbow(length(unlist(dimnames(mat)))), union(rownames(mat), colnames(mat)))
par(mar = c(0, 0, 0, 0), mfrow = c(1, 1))
 
chordDiagram(mat, annotationTrack = "grid", preAllocateTracks = 1, grid.col = grid.col)
circos.trackPlotRegion(track.index = 1, panel.fun = function(x, y) {
  xlim = get.cell.meta.data("xlim")
  ylim = get.cell.meta.data("ylim")
  sector.name = get.cell.meta.data("sector.index")
  circos.text(mean(xlim), ylim[1] + .1, sector.name, facing = "clockwise", niceFacing = TRUE, adj = c(0, 0.5), cex = 0.3)
}, bg.border = NA)
 
circos.clear()
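The "after 2-3 trips" question above has a direct answer in the Markov formalism: the (i, j) entry of the n-th power of the transition matrix is the probability of being in state j after exactly n transitions when starting from state i. A minimal sketch on a toy west-coast matrix (the three ports and the probabilities are hypothetical):

```python
import numpy as np

# Toy west-coast transition matrix (hypothetical values):
# states are Seattle, Portland (Oregon), Oakland
P = np.array([
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.5, 0.2, 0.3],
])

# P^n gives the n-step transition probabilities
P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)

# e.g. probability of being in Portland after 2 trips starting from Seattle
print(P2[0, 1])  # 0.2*0.5 + 0.5*0.3 + 0.3*0.2 = 0.31
```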
%python
from pyspark.sql import functions as F
import pandas as pd
 
# We ensure we do not have a "terminus port" that would translate into a state probability of 0
tied_loose_ends = spark.read.table("routes").select(F.col("dst").alias("src")).distinct() \
  .join(spark.read.table("routes"), ["src"], "left_outer") \
  .withColumn("dst", F.when(F.col("dst").isNull(), F.col("src")).otherwise(F.col("dst"))) \
  .withColumn("total", F.when(F.col("total").isNull(), F.lit(1)).otherwise(F.col("total")))
 
# Our state is the port at trip 0; the transition is the probability of reaching any other port at trip 1
markov_df = tied_loose_ends.toPandas().pivot(index='src', columns='dst', values='total').fillna(0)
 
# Ensure matrix is nxn
index = markov_df.index.union(markov_df.columns)
markov_df = markov_df.reindex(index=index, columns=index, fill_value=0)
 
# normalize to get transition state probability
markov_df = markov_df.div(markov_df.sum(axis=1), axis=0)
transition_matrix = markov_df.to_numpy()
markov_df
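With the normalized matrix in hand, two quick sanity checks are that every row sums to 1 (which the div(...) normalization above guarantees) and that we can simulate a random walk over ports. The sketch below uses a hypothetical 3-port matrix as a stand-in for the real `transition_matrix`; port names and probabilities are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3-port matrix standing in for the computed transition_matrix
ports = ["Miami [101]", "New York [202]", "Boston [303]"]
transition_matrix = np.array([
    [0.0, 0.8, 0.2],
    [0.6, 0.1, 0.3],
    [0.5, 0.5, 0.0],
])

# Every row must be a valid probability distribution
assert np.allclose(transition_matrix.sum(axis=1), 1.0)

# Simulate a 5-trip random walk starting from Miami
state = 0
journey = [ports[state]]
for _ in range(5):
    state = rng.choice(len(ports), p=transition_matrix[state])
    journey.append(ports[state])
print(" -> ".join(journey))
```

The same loop, pointed at the real matrix and the `markov_df` index, simulates plausible vessel journeys, which is exactly what the next notebook builds on.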