Forecasting arrivals from Ukraine

If the past is a foreign country, the future must be an alternative dimension. Humans have obsessed about predicting what may come to pass, probably for as long as we’ve been human.

Accurate predictions are hard to make, and they’re especially hard in complex situations like the Ukraine conflict, where uncertainties abound. But without a sense of how things might play out, it’s tricky to plan humanitarian responses such as the cash assistance, psychosocial support, and other practical and emotional help we’re providing for people arriving in the UK from Ukraine.

The only attempt at forecasting arrivals that I had seen was a fairly simple projection produced by the Scottish Government, looking at their ‘super sponsor’ scheme. But the British Red Cross needs to anticipate how many people might arrive — up to six months in the future — across all UK nations and for all of the visa schemes. A simple projection wouldn’t cut the mustard.

Instead, we’ve developed a sophisticated and novel approach that simulates the entire visa process, from the initial application through to arriving in the UK. And our simulation is accurate: we predicted a total of 119,478 people would have arrived as of 23rd August; 119,119 did. This is a difference of only 359 people.

This post explains how we did it.

We’ve also made our predictions — as well as the code for generating them — publicly available.

Simulating the visa process

The British Government is running three concurrent routes to the UK for people fleeing Ukraine:

  1. Ukraine Family Scheme
  2. Homes for Ukraine (individual sponsorship scheme)
  3. Government ‘super sponsor’ schemes in Wales and Scotland

Regardless of the scheme, people go through a similar process before arriving in the UK: they apply for a visa; the application gets processed, most often resulting in a visa being issued; then the visa-holder travels to the UK.

The diagram below shows how we’ve modelled the flow from applications to arrivals. It revolves around two backlogs: one for applications yet to be processed, and one for issued visas whose holders have yet to arrive.

Visa application backlog

Each week, the application backlog shrinks as visas are issued and applications are refused or withdrawn, and it grows as new applications are submitted. From the number of visas issued each week and the size of the application backlog, we can calculate the weekly proportion of the backlog that results in a visa being issued.
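A minimal Python sketch of this calculation, assuming the backlog is tracked week by week and that each week's issued visas are divided by the backlog at the end of the previous week (the post doesn't say exactly which week's backlog is used). The function names are hypothetical and the numbers are made up for illustration; they are not DLUHC figures.

```python
# Illustrative sketch only: the numbers below are made up, not DLUHC data.

def backlog_series(initial_backlog, new_apps, issued, refused_or_withdrawn):
    """Application backlog at the end of each week, starting from initial_backlog."""
    backlog = [initial_backlog]
    for added, granted, dropped in zip(new_apps, issued, refused_or_withdrawn):
        backlog.append(backlog[-1] + added - granted - dropped)
    return backlog


def weekly_issue_rates(backlog, issued):
    """Visas issued in week t divided by the backlog at the end of week t - 1."""
    return [granted / previous for granted, previous in zip(issued, backlog[:-1])]


issued = [1_500, 1_700, 1_600]
backlog = backlog_series(10_000, new_apps=[2_000, 1_800, 1_600],
                         issued=issued, refused_or_withdrawn=[100, 90, 80])
rates = weekly_issue_rates(backlog, issued)
mean_rate = sum(rates) / len(rates)  # average weekly conversion rate
```

Averaging the weekly rates, as `mean_rate` does, is one simple way to arrive at a single figure like the 14.4% quoted below.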

Application processing rates

At the time of writing, the Family and Sponsorship schemes have exactly the same average rate of converting applications into visas each week: 14.4%.

The Department for Levelling Up, Housing and Communities (DLUHC) hasn’t published much historical data for this part of the process, so for simplicity we combine the two schemes into a single weekly rate of visas issued from the application backlog.

Refusal and withdrawal rates have remained low: around 5% of applications to the Family Scheme and 2% of applications to the Sponsorship Scheme. We assume these rates remain constant in the simulation.
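As a sketch, assuming the constant rates are applied to each week's new applications before they join the backlog (which is how the simulation steps further down describe it), this amounts to a single multiplication. The dictionary keys and the function name are illustrative, not taken from the published code.

```python
# Approximate constant rates quoted above; keys and names are illustrative.
REFUSAL_WITHDRAWAL_RATES = {"family": 0.05, "sponsorship": 0.02}


def applications_entering_backlog(new_applications, scheme):
    """New applications remaining after removing expected refusals/withdrawals."""
    return new_applications * (1 - REFUSAL_WITHDRAWAL_RATES[scheme])


family_kept = applications_entering_backlog(1_000, "family")
sponsorship_kept = applications_entering_backlog(1_000, "sponsorship")
```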

Backlog of issued visas

There is also a backlog of visas that have been issued but whose holders are yet to arrive in the UK. We calculate the weekly arrival rate from this backlog as the number of additional arrivals in a given week divided by the previous week’s number of issued visas whose holders hadn’t yet arrived. This assumes people generally arrive in the UK around a week after their visa is issued, which may not always be the case but is good enough for our purposes.
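A minimal sketch of the arrival-rate calculation, assuming we have cumulative counts of issued visas and arrivals for one scheme, so that last week's backlog is simply issued minus arrived. The function name and the counts are hypothetical.

```python
# Illustrative sketch with made-up cumulative counts for a single scheme.

def weekly_arrival_rates(cumulative_issued, cumulative_arrived):
    """Arrivals in week t divided by the issued-but-not-arrived backlog at week t - 1."""
    rates = []
    for t in range(1, len(cumulative_arrived)):
        new_arrivals = cumulative_arrived[t] - cumulative_arrived[t - 1]
        waiting_last_week = cumulative_issued[t - 1] - cumulative_arrived[t - 1]
        rates.append(new_arrivals / waiting_last_week)
    return rates


arrival_rates = weekly_arrival_rates(cumulative_issued=[5_000, 7_000, 9_000],
                                     cumulative_arrived=[2_000, 3_500, 5_000])
```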

Arrival rates

The weekly proportion of people arriving in the UK from the issued-visas backlog fluctuates within and between schemes. We fit linear regressions (one per visa scheme) to predict arrival rates and use these predicted rates for future, simulated weeks.

The trend line for the ‘super sponsor’ schemes is a bit less certain (and closely tracks the Family Scheme’s), so we simply use the Ukraine Family Scheme’s predicted arrival rates for the Government schemes.
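A minimal sketch of the per-scheme trend fitting, using NumPy's `polyfit` to fit an ordinary least-squares straight line. The rates below are invented so the fit is easy to check, and clipping predictions to the range 0 to 1 is our own assumption rather than something the post describes.

```python
import numpy as np


def fit_and_extrapolate(weeks, rates, future_weeks):
    """Least-squares straight line through (week, rate), evaluated at future weeks."""
    slope, intercept = np.polyfit(weeks, rates, deg=1)
    predicted = intercept + slope * np.asarray(future_weeks, dtype=float)
    # Assumption (not from the post): keep predicted proportions within [0, 1].
    return np.clip(predicted, 0.0, 1.0)


observed_weeks = np.arange(20, 26)
future_weeks = np.arange(26, 30)
observed_rates = {  # invented weekly arrival rates, one series per scheme
    "family": 0.50 - 0.01 * (observed_weeks - 20),
    "sponsorship": 0.40 - 0.005 * (observed_weeks - 20),
}

predicted_rates = {scheme: fit_and_extrapolate(observed_weeks, rates, future_weeks)
                   for scheme, rates in observed_rates.items()}
# The 'super sponsor' schemes reuse the Family Scheme's predicted rates.
predicted_rates["government"] = predicted_rates["family"]
```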

New visa applications

Although the number of new visa applications submitted each week fluctuates, the overall trend has been downwards for the Family and Sponsorship schemes. For the baseline scenario, we fit linear regressions (one for each scheme) and use the out-of-sample predicted values for the future, simulated weeks.

Wales and Scotland paused new applications to their ‘super sponsor’ schemes on 10 June and 13 July, respectively, so we assume there will be no new applicants through that route.
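As with arrival rates, new applications can be projected with a straight-line fit. In this sketch (hypothetical names, invented counts), a paused scheme gets zero new applications, and a declining trend is floored at zero. The floor is an assumption added here to stop a straight line going negative further into the future.

```python
import numpy as np


def forecast_new_applications(weeks, counts, future_weeks, paused=False):
    """Project weekly new applications with a straight-line trend.

    Paused schemes get zero new applications. Flooring the trend at zero is an
    assumption added in this sketch, not something the post describes.
    """
    future_weeks = np.asarray(future_weeks, dtype=float)
    if paused:
        return np.zeros(len(future_weeks))
    slope, intercept = np.polyfit(weeks, counts, deg=1)
    return np.clip(intercept + slope * future_weeks, 0.0, None)


observed_weeks = np.arange(20, 26)
future_weeks = np.arange(26, 30)
family_counts = 3_000 - 400 * (observed_weeks - 20)  # invented, steadily falling

family_forecast = forecast_new_applications(observed_weeks, family_counts, future_weeks)
government_forecast = forecast_new_applications(observed_weeks, None, future_weeks,
                                                paused=True)
```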

Simulating arrivals

Now we have all the core components:

  • The backlog of issued visas, where the visa-holder hasn’t yet arrived in the UK
  • The weekly rate of arrivals from the issued visas backlog
  • The backlog of visa applications that need to be processed
  • The weekly rate of visas issued from the application backlog
  • The rate of new applications submitted

From these five components, we can simulate the number of new arrivals each week. We seed the simulation with the latest backlogs for each visa scheme taken from the most recently available data. The simulation starts at the most recent week, and we can run it for as many weeks as we want.

The algorithm for the simulation looks like this:

Loop over each week of the simulation:
├── Calculate the number of new weekly arrivals under each scheme from last week's backlog of issued visas
├── Remove these new arrivals from the issued-visa backlog
├── Calculate the number of new applications to the family and individual sponsorship schemes, based on historical trends
├── Remove the (small) proportion of applications that will be refused or withdrawn
├── Add the remainder to the backlog of applications
├── Convert a proportion of the previous week's applications into issued visas
├── Add these new visas to the issued-visas backlog
└── Remove newly issued visas from the backlog of applications
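The loop above translates fairly directly into code. What follows is a Python sketch under stated assumptions, not the British Red Cross's published implementation. `SchemeState` and `simulate` are hypothetical names. Arrival rates and new-application counts are supplied per scheme and per simulated week. A single issue rate is shared across schemes, since the post combines them (for example, the 14.4% weekly rate). Schemes without a stated refusal rate, such as the Government schemes, default to zero refusals.

```python
from dataclasses import dataclass


@dataclass
class SchemeState:
    """Backlogs for one visa scheme (a hypothetical structure for illustration)."""
    applications: float  # applications waiting to be processed
    issued_visas: float  # visas issued whose holders have not yet arrived


def simulate(states, arrival_rates, new_applications, issue_rate,
             refusal_rates, n_weeks):
    """Simulate weekly arrivals; mutates `states` and returns one dict per week.

    arrival_rates[scheme][week] and new_applications[scheme][week] hold the
    predicted arrival rate and number of new applications for each simulated week.
    """
    weekly_arrivals = []
    for week in range(n_weeks):
        arrivals_this_week = {}
        for scheme, state in states.items():
            # New arrivals come from last week's backlog of issued visas.
            arrivals = arrival_rates[scheme][week] * state.issued_visas
            state.issued_visas -= arrivals
            arrivals_this_week[scheme] = arrivals

            # Applications already waiting at the start of this week.
            previous_applications = state.applications

            # New applications, minus the share expected to be refused or withdrawn
            # (assumption: schemes with no stated rate default to zero refusals).
            refusal_rate = refusal_rates.get(scheme, 0.0)
            state.applications += new_applications[scheme][week] * (1 - refusal_rate)

            # Convert a proportion of the previous week's applications into visas,
            # moving them into the issued-visa backlog and out of the application backlog.
            issued = issue_rate * previous_applications
            state.issued_visas += issued
            state.applications -= issued
        weekly_arrivals.append(arrivals_this_week)
    return weekly_arrivals
```

Because newly issued visas join the backlog after that week's arrivals are calculated, a visa issued in one week can only contribute to arrivals from the following week onwards. This matches the roughly one-week lag assumed when calculating arrival rates.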

Does it work?

We tested the simulation in two ways:

  1. Back-testing: Checking how well its predictions match previously observed numbers of arrivals.
  2. Forward-testing: Seeing how accurately it predicts future arrivals.

Back-testing

We set the simulation to start at week 20 (16th May 2022) and predicted the number of arrivals over a three-month period, up to 15th August 2022. As you can see in the graph below, our simulation predicts the observed arrivals well. The total number of actual arrivals is towards the upper end of the predicted range, most likely because the simulation uses rates of applications and arrivals that decline faster than the rates actually seen three months ago.
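One way to summarise a back-test like this is to check, week by week, whether the observed cumulative arrivals fall inside the predicted range and where they sit within it. This is a generic sketch with invented numbers, not necessarily how the graph was produced. A position near 1.0 corresponds to observations at the upper end of the range.

```python
# Generic back-test summary; the numbers below are invented for illustration.

def backtest_summary(observed, low, high):
    """Share of weeks inside the predicted range, and each week's position in it.

    A position of 0.0 is the bottom of the range and 1.0 the top; values outside
    [0, 1] mean the observation fell outside the range.
    """
    inside = [lo <= obs <= hi for obs, lo, hi in zip(observed, low, high)]
    position = [(obs - lo) / (hi - lo) for obs, lo, hi in zip(observed, low, high)]
    return sum(inside) / len(inside), position


coverage, position = backtest_summary(observed=[105, 210, 330],
                                      low=[90, 180, 270],
                                      high=[110, 220, 320])
```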

Forward-testing

We also generated predictions for the total number of arrivals that would be reported a week later — and we did this for two weeks in a row.

In advance of the official statistics published by DLUHC, we predicted 116,486 people would have arrived in the UK by 15th August; 116,541 did. This is a difference of only 55 people — an astonishing level of accuracy.

The week after, we predicted a total of 119,478 people would have arrived (as of 23rd August); 119,119 did. This is a difference of only 359 people.
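The forward-test figures above can be checked with a few lines of arithmetic, expressing each error as a percentage of the actual total. The numbers come straight from the text; only the function name is ours.

```python
def forecast_error(predicted, actual):
    """Signed error (predicted minus actual) and absolute error as % of actual."""
    error = predicted - actual
    return error, 100 * abs(error) / actual


# Forward-test figures reported in this post: (predicted, actual) total arrivals.
FORWARD_TESTS = {
    "15th August": (116_486, 116_541),
    "23rd August": (119_478, 119_119),
}

for date, (predicted, actual) in FORWARD_TESTS.items():
    error, percent = forecast_error(predicted, actual)
    print(f"{date}: predicted {predicted:,}, actual {actual:,}, "
          f"error {error:+,} ({percent:.2f}% of actual)")
```

Both errors are well under 1% of the reported totals: about 0.05% and 0.30% respectively.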

Matthew Gwynfryn Thomas
Insight and Improvement at British Red Cross

Anthropologist, analyst, writer. Humans confuse me; I study them with science and stories.