The organisational implications of using employee messaging for data science

This article is a fairly technical take on the organisational challenges of analysing employee messaging such as email, Slack, Microsoft Teams, Skype for Business and Facebook for Work. It is aimed at CTOs, DPOs, People Analytics professionals and vendors, and discusses the operational and compliance challenges of corporate message analysis. The context is primarily EU data protection and employment law.

Massive amounts of implicit and explicit corporate knowledge are contained within employee emails and digital messages, and companies are starting to wake up to the benefit of analysing this data. The metadata of emails (who is sending emails to whom, how quickly they respond, which emails are forwarded etc) can give insight into the hidden networks within the business, identifying key staff and teams (sometimes known as Organisational Network Analysis or ONA). The content of emails, when parsed, can contain information about projects, morale, employee satisfaction and even threats to the business through data leakage. There is substantial value to the company in understanding this rich source of data.
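As a rough illustration of the metadata side, the sketch below builds a small communication network from sender and recipient pairs and ranks people by how many distinct colleagues they exchange messages with. It is a minimal example, not a full ONA method: the addresses, the `build_network` and `degree_centrality` helpers and the choice of degree centrality as the measure are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records: (sender, recipient) pairs taken from headers only.
messages = [
    ("alice@example.com", "bob@example.com"),
    ("alice@example.com", "carol@example.com"),
    ("bob@example.com", "alice@example.com"),
    ("carol@example.com", "alice@example.com"),
    ("dave@example.com", "alice@example.com"),
]


def build_network(pairs):
    """Count messages per directed edge and collect each person's distinct contacts."""
    edge_weights = Counter(pairs)
    contacts = defaultdict(set)
    for sender, recipient in pairs:
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    return edge_weights, contacts


def degree_centrality(contacts):
    """Fraction of the other people in the network each person is directly linked to."""
    n = len(contacts)
    if n < 2:
        return {person: 0.0 for person in contacts}
    return {person: len(peers) / (n - 1) for person, peers in contacts.items()}


edge_weights, contacts = build_network(messages)
centrality = degree_centrality(contacts)
most_connected = max(centrality, key=centrality.get)
print(most_connected, round(centrality[most_connected], 2))
```

Note that this uses no message content at all, which is part of the appeal of metadata analysis from a privacy point of view, although the addresses themselves remain personal data.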

Employees, especially those in the EU who are subject to GDPR, have a right to expect that their emails are treated with respect and privacy. There are balancing laws to protect both employer and employee; employees (in the EU) can expect to use their work email for moderate personal communication, and the employer should respect this. The employer, on the other hand, is entitled to monitor worker email for security and other business reasons.

Who is doing this?

A number of companies are already looking at email for valuable content. One of the first email datasets widely used by data scientists is a corpus of emails from Enron Corporation, made public by the Federal Energy Regulatory Commission during its investigation into the company. It is now a rich training dataset for data scientists looking to understand the schema and value of email data.

Unsurprisingly, the CEO of Enron, Kenneth Lay, mentioned the word ‘meeting’ over 1700 times, and ‘bankruptcy’ around 2300 times.
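Counts like these come from parsing each raw message into headers and body and then tallying terms. The sketch below shows the idea using Python's standard `email` module on a single made-up message. The sample message, the `count_terms` helper and the decision to count only plain-text bodies are assumptions for illustration, and this is not the method used to produce the figures above.

```python
import re
from collections import Counter
from email import message_from_string

# A made-up message in the usual "headers, blank line, body" layout.
RAW_MESSAGE = """\
From: sender@example.com
To: recipient@example.com
Subject: Weekly meeting

Can we move the meeting to Thursday? The meeting room is booked on Friday.
"""


def count_terms(raw_messages, terms):
    """Count case-insensitive whole-word occurrences of each term in message bodies."""
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for raw in raw_messages:
        message = message_from_string(raw)
        body = message.get_payload()
        if not isinstance(body, str):  # multipart messages are skipped in this sketch
            continue
        for word in re.findall(r"[a-z']+", body.lower()):
            if word in wanted:
                counts[word] += 1
    return counts


counts = count_terms([RAW_MESSAGE], ["meeting", "bankruptcy"])
print(counts["meeting"], counts["bankruptcy"])
```

Only the body is counted here, so the word "meeting" in the subject line is ignored. A real pipeline would also need to handle quoted replies, signatures and attachments.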

A widely published case study from Genpact (formerly a business unit of GE), working in collaboration with researchers from MIT, analysed the company's own corpus of emails for signals which could correlate with corporate performance. The authors claim to show statistically that certain types of communication correlate directly with overall business performance. Further, Genpact claim that they can predict “Rockstar” performers within their business with 74% accuracy. Heady stuff, if repeatable outside of their organisation, and indicative of the type of value companies claim from this type of analysis.

In addition to companies looking into their own email, a number of startups have entered the messaging analytics market. KeenCorp has built its business specifically around analysing emails, with a product which runs both on-premises and in the cloud. Vibe has a product which integrates directly with Slack to provide a gauge of team morale. Even Microsoft’s MyAnalytics (included with Office 365) gives employees recommendations on ways to improve their productivity.

Arguably, the companies best placed to operationalise this type of technology are those with embedded data science teams (such as JPMorgan), or existing service providers, such as Microsoft. In these cases, permissions that have already been sought from and granted by employees may extend to this new analysis.

For companies seeking to offer this type of analysis as a service (such as KeenCorp and Vibe), the post-Cambridge Analytica and GDPR environment makes it commercially and operationally challenging.

Advice for companies considering accessing email data

Regardless of whether you will be working with your own data or allowing a third party vendor to access your data, it is imperative that you communicate clearly with employees about the nature and value of any analysis of their emails. The UK Information Commissioner’s Office emphasises that employees have an expectation of privacy at work even when they have been informed that workplace monitoring may take place. Respect for employees should be of paramount concern, even over company profitability.

Before allowing any analysis of, or access to, employee emails in any form, you should conduct a Data Protection Impact Assessment.

Analysing your own data

If you are seeking to unlock the value of your own data, ensure that your employee policies are clear and up to date. Where no third party is involved, your use is most likely covered by your right to monitor emails. However, you should ensure that employees are informed of any monitoring before it takes place. ACAS has clear and simple guidelines for communication to employees:

Employers should have written policies and procedures in place regarding monitoring at work.
Monitoring shouldn’t be excessive and should be justified.
Staff should be told what information will be recorded and how long it will be kept.
If employers monitor workers by collecting or using information the Data Protection Act will apply.
Information collected through monitoring should be kept secure.

You should ensure that the security controls enforced around your email or messaging infrastructure (eg Exchange, Office 365 or G Suite) extend to the email dataset. Data scientists with access to unredacted/non-anonymised content should be considered privileged actors inside the organisation, and access to the data carefully controlled and monitored.
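One way to reduce how many people ever see unredacted data is to pseudonymise addresses and drop message bodies before the dataset reaches the analysis team. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The record layout, the `pseudonymise` and `redact_headers` helpers and the placeholder key are assumptions for illustration, not a recommended production design.

```python
import hashlib
import hmac

# Placeholder only: in practice the key would live in a secrets store, not in code.
SECRET_KEY = b"replace-with-a-key-from-your-secrets-store"


def pseudonymise(address: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for an email address."""
    normalised = address.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return "user-" + digest[:16]


def redact_headers(message: dict, key: bytes) -> dict:
    """Copy a message record, replacing addresses with pseudonyms and dropping the body."""
    return {
        "from": pseudonymise(message["from"], key),
        "to": [pseudonymise(address, key) for address in message["to"]],
        "sent_at": message["sent_at"],
    }


record = {
    "from": "Alice@Example.com",
    "to": ["bob@example.com"],
    "sent_at": "2018-05-25T09:00:00Z",
    "body": "Confidential text that analysts should not see.",
}
print(redact_headers(record, SECRET_KEY))
```

Because the same address always maps to the same pseudonym, network-style analysis still works on the redacted records. Anyone holding the key can re-identify people, so pseudonymised data is still personal data under GDPR and the key holder remains a privileged actor.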

Allowing access to a third party

If you are considering working with a third party vendor, things become considerably more complex. Email data is commercially sensitive, personally identifiable and private to the employee. Before allowing any access to emails by third parties, CTOs and data protection officers should give considerable thought to the real value being offered by the service.

When I asked this question of a number of other CTOs, I met with a mixed, but always guarded, response. The most positive responses demanded evidence of the value being provided and of the third party’s data security credentials. Some CTOs refused point blank to allow access, concerned over the compliance and operational overhead of dealing with this level of access.

In any situation where the data is to be processed by a third party, a third party data processing agreement must be in place.

Advice for companies developing sensitive message analytics products

For most vendors of analytics services, the first challenge will be dealing with compliance and regulatory concerns on behalf of the client. This will often be a greater challenge than the actual analysis of the corpus. As my conversations with other CTOs demonstrated, it’s still easier to refuse access than to deal with the thorny problem of justifying access to email data.

Once access has been granted in principle, the next challenge will be to operationalise access to emails. A single, one-time analysis will be the simplest to manage, with a one-time export of data (or access to APIs) and a single dataset to secure, manage and remove.

Where possible, this analysis should be managed within the client’s own infrastructure. Ensuring that no additional, external services are added to the scope of the existing processing will give the client the most confidence that compliance will be maintained. If possible, providing a tool to existing client technology teams is the preferable route, with the vendor receiving only the output of analysis and not the raw data itself.
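The "outputs only" arrangement described above could look something like the sketch below: a tool that runs inside the client's infrastructure and returns team-level totals, so that only the summary leaves the perimeter. The record layout, the `team_summary` helper and the minimum group size used to suppress very small teams are assumptions for illustration; the threshold in particular is a hypothetical safeguard against singling out individuals, not something prescribed by regulation.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # assumed threshold for illustration only


def team_summary(records, min_group_size=MIN_GROUP_SIZE):
    """Aggregate per-person message counts into per-team totals, dropping small teams."""
    members = {}
    totals = Counter()
    for team, person, message_count in records:
        members.setdefault(team, set()).add(person)
        totals[team] += message_count
    return {
        team: {"people": len(people), "messages": totals[team]}
        for team, people in members.items()
        if len(people) >= min_group_size
    }


# Hypothetical per-person counts produced inside the client's own systems.
records = [
    ("sales", "p1", 120),
    ("sales", "p2", 80),
    ("sales", "p3", 95),
    ("legal", "p4", 40),
]
print(team_summary(records, min_group_size=3))
```

With a minimum group size of 3, the single-person "legal" team is left out of the output, so the vendor never receives a figure that maps back to one employee.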

Should the analysis service be off-site (potentially using cloud computing services), it’s critical that vendors exceed the compliance and security requirements of the client organisation. In addition, a clear chain of custody for data should be established. Data transmission, receipt and processing should be clearly audited, repeatable and monitored for compliance.
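One way to make a chain of custody auditable is an append-only log in which every entry records a hash of the dataset and a hash of the previous entry, so that later tampering is detectable. The sketch below is a minimal standard-library illustration. The field names, the `append_event` and `verify_log` helpers and the sample events are assumptions, and a real deployment would also need secure storage and trusted timestamps.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def append_event(log, actor, action, dataset_sha256, timestamp):
    """Append a custody event whose hash also covers the previous entry's hash."""
    previous_hash = log[-1]["entry_hash"] if log else GENESIS_HASH
    entry = {
        "actor": actor,
        "action": action,
        "dataset_sha256": dataset_sha256,
        "timestamp": timestamp,
        "previous_hash": previous_hash,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_log(log):
    """Recompute every hash and check that each entry points at its predecessor."""
    previous_hash = GENESIS_HASH
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if body["previous_hash"] != previous_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        previous_hash = entry["entry_hash"]
    return True


export = b"hypothetical exported mailbox archive"
digest = hashlib.sha256(export).hexdigest()

custody_log = []
append_event(custody_log, "client-it", "exported", digest, "2018-06-01T10:00:00Z")
append_event(custody_log, "vendor-ingest", "received", digest, "2018-06-01T10:05:00Z")
append_event(custody_log, "vendor-ingest", "deleted", digest, "2018-06-08T10:00:00Z")
print(verify_log(custody_log))
```

Recording the same dataset hash at export, receipt and deletion lets both parties confirm they are talking about the same file, and editing any earlier entry causes `verify_log` to fail.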

From a legal point of view, the service provider must be bound by a data processing agreement, and it is likely that certain liabilities will be passed to the processor. In addition, the data processor (the vendor) is required to inform the controller of any sub-processors who may be involved in processing the data (for instance, if an additional commercial service is used within the processing pipeline).

Tips for potential vendors

If you are considering offering a service to analyse corporate messaging, here’s a summary of tips that you may find useful, especially when dealing with EU or global clients bound by GDPR.

Exceed your client’s security and compliance policies — this will typically mean that you need information security certifications such as ISO27001
Wherever possible ensure that processing is one time, rather than continuous
Wherever possible ensure that processing takes place within the perimeter of the client organisation — avoid moving data away from company systems (which may require a more consultative, rather than service based, approach)
If possible, provide analytics tools to the organisation for its own use. Receive only the output of the analysis from those tools, rather than the raw data.
Develop a chain of custody for client data
If machine learning is used for analysis, ensure that there is a way to show the client how decisions in the analysis were made. This is especially important for non-rule-based approaches (such as neural networks).