%python
dbutils.library.installPyPI('networkx')
dbutils.library.installPyPI("keplergl")
dbutils.library.restartPython()
%sql
-- We build uniquely identifiable names because port names are not unique (e.g. Portland, Maine and Portland, Oregon)
CREATE OR REPLACE TEMPORARY VIEW routes AS
SELECT
CONCAT(orgPortName, ' [', orgPortId, ']') AS src,
CONCAT(dstPortName, ' [', dstPortId, ']') AS dst,
COUNT(1) AS total
FROM esg.cargos_trips
GROUP BY
src,
dst;
CACHE TABLE routes;
%r
library(SparkR)
library(circlize)
df <- collect(sql("SELECT src, dst, total FROM routes"))
from = df[[1]]
to = df[[2]]
values = df[[3]]
mat = matrix(0, nrow = length(unique(from)), ncol = length(unique(to)))
rownames(mat) = unique(from)
colnames(mat) = unique(to)
for(i in seq_along(from)) mat[from[i], to[i]] = values[i]
grid.col <- setNames(rainbow(length(unlist(dimnames(mat)))), union(rownames(mat), colnames(mat)))
par(mar = c(0, 0, 0, 0), mfrow = c(1, 1))
chordDiagram(mat, annotationTrack = "grid", preAllocateTracks = 1, grid.col = grid.col)
circos.trackPlotRegion(track.index = 1, panel.fun = function(x, y) {
  xlim = get.cell.meta.data("xlim")
  ylim = get.cell.meta.data("ylim")
  sector.name = get.cell.meta.data("sector.index")
  circos.text(mean(xlim), ylim[1] + .1, sector.name, facing = "clockwise", niceFacing = TRUE, adj = c(0, 0.5), cex = 0.3)
}, bg.border = NA)
circos.clear()
%python
from pyspark.sql import functions as F
import pandas as pd
# Give every "terminus port" (a destination with no outgoing trips) a self-loop with a count of 1,
# so that its row in the transition matrix does not sum to 0
tied_loose_ends = spark.read.table("routes").select(F.col("dst").alias("src")).distinct() \
    .join(spark.read.table("routes"), ["src"], "left_outer") \
    .withColumn("dst", F.when(F.col("dst").isNull(), F.col("src")).otherwise(F.col("dst"))) \
    .withColumn("total", F.when(F.col("total").isNull(), F.lit(1)).otherwise(F.col("total")))
# Each state is the port at trip 0; each transition is the probability of reaching a given port at trip 1
markov_df = tied_loose_ends.toPandas().pivot(index='src', columns='dst', values='total').fillna(0)
# Ensure matrix is nxn
index = markov_df.index.union(markov_df.columns)
markov_df = markov_df.reindex(index=index, columns=index, fill_value=0)
# Normalize each row so that it sums to 1, giving the transition probabilities
markov_df = markov_df.div(markov_df.sum(axis=1), axis=0)
transition_matrix = markov_df.to_numpy()
markov_df
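The cell above can be illustrated on a small scale. The following is a minimal sketch on hypothetical toy data (ports `A`, `B`, `C` and their counts are invented for illustration and are not taken from `esg.cargos_trips`). It assumes the same pivot, reindex and row-normalization steps as above, but ties loose ends after pivoting rather than through a Spark join, and it keeps origin-only ports, which is a simplification. It then shows one way the resulting matrix could be used: propagating a starting distribution over several trips with a matrix power.

```python
import numpy as np
import pandas as pd

# Hypothetical toy route counts (invented for illustration only).
routes = pd.DataFrame(
    {
        "src": ["A", "A", "B"],
        "dst": ["B", "C", "C"],
        "total": [3, 1, 2],
    }
)

# Pivot to a count matrix and make it square over all ports.
counts = routes.pivot(index="src", columns="dst", values="total").fillna(0)
ports = counts.index.union(counts.columns)
counts = counts.reindex(index=ports, columns=ports, fill_value=0)

# Tie loose ends: a port with no outgoing trips gets a self-loop,
# so that its row can be normalized without dividing by zero.
for port in ports:
    if counts.loc[port].sum() == 0:
        counts.loc[port, port] = 1

# Row-normalize so that each row is a probability distribution.
toy_matrix = counts.div(counts.sum(axis=1), axis=0).to_numpy()

# Start at port A with certainty and advance two trips.
start = np.zeros(len(ports))
start[list(ports).index("A")] = 1.0
after_two_trips = start @ np.linalg.matrix_power(toy_matrix, 2)
```

In this toy setup every path from `A` reaches the terminus port `C` within two trips, so `after_two_trips` places all probability mass on `C`. The same `start @ np.linalg.matrix_power(...)` pattern would apply to `transition_matrix`, assuming its rows follow the order of `markov_df.index`.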
Vessel Tracking - Markov
The benefits of Environmental, Social and Governance (ESG) are well understood across the financial services industry, but they go beyond sustainable investments. Recent experience has taught us that high social values and good governance emerged as key indicators of resilience throughout the COVID-19 pandemic. Large retailers that already use ESG to monitor the performance of their supply chain have been able to leverage this information to better navigate the challenges of global lockdowns, ensuring a constant flow of goods and products to communities. As reported in an article from the Harvard Law School Forum on Corporate Governance, [...] companies that invest in [ESG] also benefit from competitive advantages, faster recovery from disruptions.
antoine.amend@databricks.com