Estimating Policy Functions in Payments Systems Using Reinforcement Learning

Castro, Pablo S.; Desai, Ajit; Du, Han; Garratt, Rodney J.; Rivadeneyra, Francisco

doi:10.34989/swp-2021-7

Staff Working Paper 2021-7 (English)

February 2021

High-value payments systems (HVPSs) are used to settle transactions between large financial institutions and are considered core national financial infrastructure. In this paper, we use machine learning techniques to understand the behaviour of banks participating in the Canadian HVPS. This understanding could help regulators design policies to ensure the safety and efficiency of these systems.

In particular, we want to understand a key decision that participating banks make in the HVPS: how much liquidity to provide at the beginning of the day. Initial liquidity is necessary to process payments but is costly to participants. Yet posting too little risks delaying those payments, which is also costly to the bank. The chosen amount of initial liquidity is a strategic decision because the bank can use incoming payments from other participants to send its own payments; however, the timing of those incoming payments depends on the amount of liquidity other participants post.
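To make this trade-off concrete, here is a minimal sketch of a hypothetical one-period toy model; it is not the model studied in the paper. Bank A must send a payment `pay_out`. It posts initial liquidity `x` at a cost of `r` per unit, and any part of its payment it cannot fund on time is delayed at a cost of `d` per unit. Bank B owes A `pay_in`, and, as a simplifying assumption, B can fund that payment only from its own initial liquidity `y`, so A receives `min(y, pay_in)`. All names and values (`delayed_amount`, `cost`, `best_response`, `r=0.1`, `d=0.2`) are illustrative assumptions. A grid search then traces out A's best response to B's choice of `y`.

```python
def delayed_amount(x, y, pay_out=1.0, pay_in=1.0):
    """Part of bank A's payment that cannot be settled on time (toy model).

    Hypothetical assumption: bank B funds its payment to A only from its own
    initial liquidity y, so A receives min(y, pay_in).
    """
    incoming = min(y, pay_in)
    return max(0.0, pay_out - x - incoming)


def cost(x, y, pay_out=1.0, pay_in=1.0, r=0.1, d=0.2):
    """Bank A's total cost: r per unit of initial liquidity plus d per unit delayed."""
    return r * x + d * delayed_amount(x, y, pay_out, pay_in)


def best_response(y, pay_out=1.0, pay_in=1.0, r=0.1, d=0.2, grid=101):
    """Grid search for A's cost-minimizing initial liquidity x in [0, pay_out], given B's y."""
    candidates = [pay_out * i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda x: cost(x, y, pay_out, pay_in, r, d))


# Trace A's best response for a few choices of B's initial liquidity.
for y in (0.0, 0.5, 1.0):
    print(f"B posts {y:.1f} -> A's best response: {best_response(y):.2f}")
```

In this toy, when B posts nothing, A funds its whole payment itself (`best_response(0.0)` is 1.0); when B posts enough to cover its payment to A, A posts nothing (`best_response(1.0)` is 0.0); and if liquidity is costlier than delay (`r > d`), A never posts liquidity. Because A's optimum moves with `y`, the decision is strategic.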

Because this problem is analytically complex, we use reinforcement learning (RL) to estimate the best-response function. Using RL, we avoid modelling the bank's strategies; instead, the RL algorithm learns a strategy through interaction with the payments system environment. In a simplified setting for which we know the optimal behaviour, we demonstrate that RL techniques can replicate the expected behaviour of participating banks. In more elaborate settings, where liquidity decisions are too complex to solve analytically, the RL agents learned to reduce the total cost of processing payments despite having only partial knowledge of the environment or the payments flow. Our results show that RL techniques are helpful in understanding the behaviour of participants in payments systems. Future work will explore the possibility of using the estimated RL policies to design more efficient payments systems.
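The idea of learning a strategy purely from interaction, without modelling the other participants, can be illustrated with a technique much simpler than the RL used in the paper: an epsilon-greedy multi-armed bandit, which can be viewed as one-step RL. This sketch is not presented as the paper's algorithm. The environment `simulated_cost` is a hypothetical assumption: with probability `p_other_posts` the other bank pays early and fully funds A's (normalized) payment; otherwise A relies on its own liquidity `x`, and any shortfall is delayed. The agent observes only the realized cost of each choice, mirroring the partial knowledge described above.

```python
import random


def simulated_cost(x, rng, p_other_posts=0.5, r=0.1, d=0.5):
    """One sampled cost for bank A in a hypothetical toy environment.

    A's outgoing payment is normalized to 1. With probability p_other_posts the
    other bank pays A early, which fully funds A's payment; otherwise A relies on
    its own initial liquidity x and any shortfall is delayed.
    """
    incoming = 1.0 if rng.random() < p_other_posts else 0.0
    delayed = max(0.0, 1.0 - x - incoming)
    return r * x + d * delayed  # liquidity cost + delay cost


def learn_liquidity(levels, steps=50_000, epsilon=0.1, seed=0, **env):
    """Epsilon-greedy bandit over candidate liquidity levels.

    The decision rule never uses p_other_posts, r or d directly; it only sees
    realized costs and keeps a running average cost for each level.
    """
    rng = random.Random(seed)
    counts = [0] * len(levels)
    avg_cost = [0.0] * len(levels)
    for _ in range(steps):
        if rng.random() < epsilon:
            i = rng.randrange(len(levels))  # explore: pick a random level
        else:
            # exploit: pick the level with the lowest average cost so far
            i = min(range(len(levels)), key=lambda k: avg_cost[k])
        c = simulated_cost(levels[i], rng, **env)
        counts[i] += 1
        avg_cost[i] += (c - avg_cost[i]) / counts[i]  # incremental sample mean
    best = min(range(len(levels)), key=lambda k: avg_cost[k])
    return levels[best]


levels = [0.0, 0.5, 1.0]
print(learn_liquidity(levels, p_other_posts=0.5))
print(learn_liquidity(levels, p_other_posts=0.9))
```

With `levels = [0.0, 0.5, 1.0]` and the default costs (`r=0.1`, `d=0.5`), the expected costs are 0.25, 0.175 and 0.1 when `p_other_posts=0.5`, so the agent should settle on posting 1.0; when `p_other_posts=0.9` they are 0.05, 0.075 and 0.1, so it should settle on 0.0. The learned choice tracks the other bank's behaviour, which is the sense in which sampled costs alone can recover a point on a best-response function.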

Content Type(s): Staff research, Staff working papers

Research Topic(s): Digital currencies and fintech, Financial institutions, Financial system regulation and policies, Payment clearing and settlement systems

JEL Code(s): A, A1, A12, C, C7, D, D8, D83, E, E4, E42, E5, E58
