Why (and how) we collaborate with academia

2.18.21 / 7 min read

Signal AI constantly pushes innovation to provide the best value we can to our clients and to strengthen our differentiation and defensibility. One of the ways we improve both is by exploring novel methods and ideas about how data science can be used to solve our clients’ challenges. Since the creation of the company, we have believed that collaborations with academic partners are fundamental to supporting these efforts.

Signal AI has a legacy of university collaborations. As David Benigson (Co-founder and CEO) was starting to shape the idea that would form the company, a Knowledge Transfer Partnership (KTP) was initiated with Professor Udo Kruschwitz from the University of Essex. This allowed Miguel Martinez (soon to be co-founder of Signal AI) to work on the project while finishing his PhD at Queen Mary. The KTP is a UK government initiative that aims to bridge the gap between industrial applications and academic knowledge. At Signal AI, we have a similar philosophy. We believe that, for data science to have the biggest impact, we need to understand academic knowledge and translate it effectively into a commercial setting. This has been one of the pillars of the company since its conception, and our collaborations with universities and researchers are one of the ways we achieve this goal.

Before exploring why we collaborate with universities, it’s important to understand how we are currently structured internally. All the data scientists at Signal AI are embedded in cross-functional teams. These teams comprise a small number of people (usually between 4 and 8) with a shared goal and the combination of skills needed to achieve it (e.g., engineering, UX or data science). We also want all data scientists to share their learnings and to coordinate any activities that are common to, or affect, the data science function as a whole. Such activities include reading groups, broader discussions on our research approaches to problems and common infrastructure challenges, as well as planning and coordinating collaborations with universities.

The benefits of university collaborations

There are several reasons why collaborations with universities, if handled effectively, are a powerful tool for any innovative business. For Signal AI, the major drivers are access to specialised knowledge, increased employee happiness and company brand. Over the last seven years we have hosted more than 30 visiting researchers and master’s students exploring research lines with Signal AI, all of whom were able to deliver impact and enhance their career development. As a researcher, you focus on very specific problems, and a common frustration at that stage is that this research might not generate any impact or solve any real problems. By collaborating with a company like Signal AI, researchers see the impact of their work on real users. In addition to impact, early-stage researchers such as PhD students are very keen to join a company doing applied research in order to understand this professional path and develop important skills, such as communication and collaboration with non-technical people.

Specialised knowledge

Signal AI is solving a very complex problem where, in order to provide our users with the best possible service, we need to address a large number of research problems within the areas of Natural Language Processing (NLP) and Information Retrieval (IR). These challenges include, but are not limited to, deduplication, entity linking, topic classification, sentiment analysis and quotation detection, and we address them all with a relatively small team of in-house data scientists. Even if most of our research relates to NLP, it is impossible to have all the specialised and up-to-date knowledge for all of these tasks. In general, even for companies working in less complex environments (e.g., focusing on a very specific task), collaborations with universities can play an important role by allowing exploratory research on the most cutting-edge approaches to solving a problem. This also indirectly allows the company to encourage the community to explore and debate specific challenges in the space (e.g., refocusing on the importance of news data or pushing for reproducibility and replicability) and to influence the debate about future challenges.

Employee acquisition and retention

Keeping in contact with academia is very important for many of the data scientists at Signal AI, not only from a professional point of view, but also from a personal one. Giving back to the community, in one way or another, is a powerful driver for many of us. This tends to be especially important for people who either come from a research background or want to grow their reputation as researchers. In both cases, maintaining the connection to the academic community over time is critical in order to stay up to date with, and continue to contribute to, recent developments in their fields. In addition, being involved in conference organisation and having a consistent publication record can be essential for data scientists who want to retain the option of going back to academia after working in industry. By encouraging academic collaborations, not only is it easier to attract researchers, but employee happiness across the research function as a whole also increases.

Company brand

Many companies position themselves as AI-first. However, research by MMC (note: MMC is one of Signal AI’s investors) shows that in 2019, 40% of the startups in Europe claiming to use AI showed no sign of it. One of Signal AI’s main sources of differentiation and defensibility is how we apply AI to solve the challenges faced by our clients. We have a well-established reputation, at least in the NLP space, and one of the ways we achieved this was by being involved in both the academic and practitioner communities. By doing so, our reputation preceded us, either because people were aware of the conferences we were taking part in or because of the network effect in the community.

How Signal AI collaborates with academia

Signal AI started its Visiting Research program shortly after being funded. In this program, early-career researchers (usually PhD students) join the company to work on a specific research line for 3-12 months. Each case has specific goals, but they always serve at least one of these two:

  • Improve a component in the Signal AI ecosystem (e.g., our sentiment analysis)
  • Reduce the uncertainty and study the feasibility of a new method or idea

In both cases, we encourage the visiting researcher to publish any research findings that we can share publicly with the community. In fact, we have published at least one paper a year since 2015, and all of these were the result of academic collaborations.

In addition to the visiting research program, the data scientists at Signal AI are active in the research community (especially in the field of Information Retrieval) and several of us are involved in the organisation of some of the main conferences in the field. We have also sponsored some conferences and initiatives. For instance, we created the Signal AI Industry Impact award in the ECIR community to recognise the accepted papers at the conference that have the highest potential from an industrial point of view. Another important way to interact with the community is to release realistic datasets. This is a critical aspect because the academic community is eager to find new data, while industry can benefit from the community working on its data. For this reason we released our 1M Signal dataset, which has now been downloaded more than 6,000 times. Both the Signal AI award and the release of datasets are examples of giving back to the community while influencing it and increasing our brand visibility. The beauty of this model is that everyone benefits from it.

Learnings and Reflection

Collaborating with universities has been very useful for Signal AI, but it has brought challenges at different stages of growth, usually related to two high-level aspects: Priority and Impact.

Priority

One of the main challenges when using collaborations to improve current lines of work is the difference in time horizons. It usually takes a few months from the moment a research collaboration is defined until we have a visiting researcher in the company. This might not seem like a long time in academia, but it is a long period for a start-up, and even for a scale-up. During this time, the priorities of the company might have changed enough to make the project less interesting or even obsolete. To address this challenge, we now explicitly scope each collaboration either as an improvement to an existing component or as an exploration where the main goal is to reduce uncertainty (e.g., How well can we summarise events based on our content?).

Impact

A common challenge for data scientists is that we tend to focus on academic metrics. Unfortunately, pure quality metrics (e.g., F1 for classification) might not correlate well with the business metric that they are targeting. To minimise this risk, projects should be very explicit about their assumptions, hypotheses and expected impact on user behaviour or value. The best approach is to start with the high-level impact that we want to achieve from the user’s point of view (e.g., spending more time in the app) and then cascade down to all the potential solutions that we believe will help with that goal (e.g., better results in a search engine). One aspect that is especially difficult for some researchers to grasp is that small improvements could have zero impact on the high-level goal, while potentially requiring significant effort. There are exceptions to the rule, usually related to problems that are clearly monetizable (e.g., a 0.05% better fraud detection system in a company scoring millions of transactions a day), but in many other cases improvements that are not clearly noticeable by users might have a limited impact on changing their behaviour.
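To make the contrast in the previous paragraph concrete, here is a back-of-the-envelope sketch in Python. Only the 0.05% figure comes from the example above. The daily volumes are illustrative assumptions rather than Signal AI figures, the function name `extra_correct_per_day` is hypothetical, and the improvement is read as an absolute gain in the share of items handled correctly.

```python
def extra_correct_per_day(daily_volume: int, gain_percent: float) -> int:
    """Extra items handled correctly per day, rounded to the nearest whole item,
    for an absolute accuracy gain expressed in percent."""
    return round(daily_volume * gain_percent / 100)


# Assumed volume: 5 million transactions scored per day (illustrative only).
fraud_extra = extra_correct_per_day(5_000_000, 0.05)

# Assumed volume: 200 articles shown to a single user per day (illustrative only).
feed_extra = extra_correct_per_day(200, 0.05)

print(fraud_extra)  # 2500
print(feed_extra)   # 0
```

Under these assumptions, the same 0.05% gain changes the outcome of thousands of transactions a day in the fraud case, each with a monetary value attached, but alters roughly one article every ten days in a single user’s feed, which is unlikely to be noticed.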

It is also important to ensure we select candidates who can operate in an industrial setting. This ranges from communication skills to development skills, and includes an appreciation that efficiency, simplicity and feasibility of integration given the current infrastructure are also important when considering potential solutions. In turn, Signal AI always tries to provide as much infrastructure support, data and supervision as possible so that researchers can be as impactful as possible.

Conclusions

Some of the best research talent is still at universities, where some individuals are wondering about the impact of their research. Companies, on the other hand, are closer to their customers’ problems. We truly believe that bridging the gap between academia and industry as much as possible is a win-win situation for all the parties involved. It is not an easy journey, and there are several challenges from the planning and organisation point of view, but it is a fantastic opportunity for companies, especially SMEs, to improve their offerings in innovative ways.