Halfway through the GDS Data Science Accelerator programme

Back in March, Liz Harrison wrote about our successful application to the GDS Data Science Accelerator Programme and what we were hoping to achieve. I am now over halfway through the 12-week programme, and next week I’ll be giving a show and tell to the rest of my cohort. So now seems a good time for an update.

As a quick reminder, I am developing a unified view of the Hackney property market, focusing on rental properties. I am using the Python programming language for the majority of the analysis. The results will be delivered via an interactive interface, giving easy access to those within Hackney Council who need them. We hope that this will help answer questions about houses in multiple occupation (HMOs), rental prices and property condition.

Have I kept to the project brief?

The project brief was extensive, and I was prepared for some aspects of it to fail. So what has failed? I was interested in using Google Street View imagery to see whether property features (age, style, etc.) could be extracted through image processing techniques. It quickly became clear that the data was not as freely available as it had been, and that the work involved in building a model could probably be an Accelerator project in its own right. This is fine; we can piece together this information from other data sources.

Linking data at the property level

We already hold data that is linked at the property level using the Unique Property Reference Number (UPRN). Through this work we have started to enrich what we know about a property by linking in new datasets, for example Energy Performance Certificate data, which can tell us a lot about property condition and size. We will continue to develop this property dataset beyond this project, as other areas of work will benefit, such as improving our understanding of our housing estates.
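To illustrate the idea of linking on the UPRN, here is a minimal Python sketch. It is not the code used in the project: the records, field names (`energy_rating`, `floor_area_m2`, `has_extra`) and the helper `enrich_by_uprn` are all made up for the example. It keeps every property in the base dataset and adds fields from a second dataset wherever the UPRN matches.

```python
# Hypothetical records; UPRNs, addresses and field names are illustrative only.
properties = [
    {"uprn": "100000000001", "address": "1 Example Street"},
    {"uprn": "100000000002", "address": "2 Example Street"},
    {"uprn": "100000000003", "address": "3 Example Street"},
]
epc_records = [
    {"uprn": "100000000001", "energy_rating": "D", "floor_area_m2": 62.0},
    {"uprn": "100000000003", "energy_rating": "B", "floor_area_m2": 85.5},
]


def enrich_by_uprn(base_rows, extra_rows, key="uprn"):
    """Keep every base row and add fields from extra_rows where the key matches."""
    # Index the extra dataset by UPRN. If a UPRN appears more than once,
    # the last record wins in this sketch.
    lookup = {row[key]: row for row in extra_rows}
    enriched = []
    for row in base_rows:
        merged = dict(row)  # copy so the input rows are not modified
        match = lookup.get(row[key])
        if match is not None:
            for field, value in match.items():
                if field != key:
                    merged[field] = value
        # Flag whether the property was found in the extra dataset.
        merged["has_extra"] = match is not None
        enriched.append(merged)
    return enriched


linked = enrich_by_uprn(properties, epc_records)
for row in linked:
    print(row)
```

Keeping unmatched properties in the output, with a flag, makes it easy to see how much of the property dataset each new source actually covers.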

Identifying unlicensed HMOs

Through parallel work with our Private Sector Housing colleagues, it was always clear that identifying unlicensed HMOs would be a valuable output from this work. Building a predictive model has therefore remained a priority, and I have received excellent guidance from the mentor group.

Topic modelling of free text data

I wanted to undertake text analysis of reports made about housing, which can vary from pest control incidents to noise complaints. We hold a lot of free text data within our various council databases, and it felt like there was an opportunity to understand what residents tell us about most. I’m hopeful that we will be able to visualise this spatially too.
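As a rough illustration of a first step with free text, the Python sketch below counts the most frequent words across a handful of reports. This is a much simpler technique than topic modelling, and it is not the project's code: the report texts, the stopword list and the helper `top_terms` are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical report texts, invented for illustration.
reports = [
    "Mice seen in the kitchen again, pest control requested",
    "Loud music from the flat above every night",
    "Damp and mould in the bedroom, mice in the kitchen",
]

# A tiny hand-written stopword list; a real analysis would need a fuller one.
STOPWORDS = {"the", "in", "and", "from", "every", "again", "a", "of", "seen", "above"}


def top_terms(texts, n=3):
    """Return the n most common non-stopword terms across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


print(top_terms(reports, n=2))
```

Simple counts like these can be a useful sanity check on the text before moving on to a proper topic model.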

Rental prices

Gaining access to property-level rental pricing data has been challenging, and the legality of web scraping rental listing sites can be unclear. Aggregated data is available, however, so I’ll be making use of this within the model.

A recent piece of work on short-term lets in Edinburgh, published by the Scottish Parliament Information Centre, made use of Airbnb data. This has made us think about how entire properties being let out for the majority of the year might affect the Hackney rental market.

Developing an interactive interface

We knew from the outset that we would need a way to communicate outputs to users, and that a data visualisation would therefore need to be developed. To be useful, it would need at a minimum to be intuitive, interactive and able to present data spatially. My mentor has been helping me build such a tool in JavaScript.

Working with others from government

For me, the real strength of the Accelerator programme is the opportunity to work with and receive advice from data scientists from across government, as well as collaborate with peers in similar roles in local government. We all seem to be facing the same challenges.

In addition to help from my colleagues within the Council, I have also received support from the Office for National Statistics and GeoPlace, who have provided access to invaluable data and expertise.

This Accelerator cohort graduates on the 4th July.