June 25, 2020 05:00 PM PT
Learning to hash has been widely adopted as a solution to approximate nearest neighbor search for large-scale data retrieval in many applications. Applying deep architectures to learning to hash has recently gained increasing attention because of its computational efficiency and retrieval quality. However, existing deep architectures are not well suited to handling 'sequential behavior data', a type of data observed in many application scenarios related to user modeling. We believe that in order to learn binary hashing for sequential behavior data, it is important to capture the user's evolving preferences or to exploit the user's activity patterns at different time scales. In this work, we propose a deep learning-based architecture to learn binary hashing for sequential behavior data. The proposed framework uses the Spark platform for large-scale data preprocessing, modeling, and inference. We also describe how the distributed inference job is performed on Databricks with Pandas UDF.
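
To make the retrieval side of binary hashing concrete, here is a minimal sketch of how hash codes could be compared once a model has produced them. It is not the architecture proposed in this talk: it assumes each user already has a real-valued embedding, binarizes it by sign thresholding (an assumption made only for illustration), and ranks users by Hamming distance. The function names, user ids, and embedding values are all hypothetical.

```python
from typing import Dict, List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the sign pattern of a real-valued vector into an integer hash code.

    Sign thresholding is an illustrative assumption, not the talk's method.
    """
    code = 0
    for value in embedding:
        # Shift in one bit per dimension: 1 if positive, else 0.
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions on which two hash codes differ."""
    return bin(a ^ b).count("1")


def nearest_users(query: int, codes: Dict[str, int], k: int = 2) -> List[str]:
    """Return the k user ids whose codes are closest to the query code.

    Ties are broken by user id so the ordering is deterministic.
    """
    ranked = sorted(
        codes, key=lambda user: (hamming_distance(query, codes[user]), user)
    )
    return ranked[:k]


# Hypothetical embeddings (made-up numbers, not results from the talk).
embeddings = {
    "user_a": [0.9, -0.2, 0.4, -0.7],
    "user_b": [0.8, -0.1, 0.3, 0.6],
    "user_c": [-0.5, 0.7, -0.9, 0.2],
}
codes = {user: binarize(vec) for user, vec in embeddings.items()}
query_code = binarize([0.7, -0.3, 0.5, -0.4])

print(nearest_users(query_code, codes))
```

This sketch does a brute-force scan over the codes. The appeal of binary codes is that each comparison reduces to an XOR and a bit count, which is what keeps the search cheap on large datasets.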