2015 Symposium on Impediments to Data Sharing

Nov 22, 2015 12:02am

A.     Overview

The APWG held a Symposium on Impediments to Data Sharing on November 5th, 2015 in Reston, VA, USA. Following are the original announcement, the slides presented, discussion notes, and planned follow-on activities.

B.     The Original Meeting Announcement

The APWG is continually expanding its efforts to collect and share more cybercrime attack and intelligence data and, in that pursuit, we encounter all manner of impedance to systematic data exchange in law, regulation, conventions and, too often, over-reaching assumptions about their scope. This fall, for example, APWG hopes to conclude some work in this regard with our international partners regarding definitions of personally identifiable information (PII) and to clarify the degree to which machine event data (i.e., data automatically generated by security devices) can be distinguished conclusively from PII.

Clearly, many cybercrime management stakeholders with useful event data do not participate in exchange programs due to some common, non-technical reasons. Our continued frustration with these rationales has compelled us to host a one-day workshop to confront the known reasons for non-participation, identify new ones, and seek their resolution. The day will be formally structured to reduce the likelihood of rat-hole discussions; will have an identified scribe to capture useful information, which will be summarized and published; and discussions will be held under the Chatham House Rule.

The APWG Symposium on Policy Impediments to Cybercrime Data Exchange is scheduled for November 5, 2015 at the Holiday Inn Washington-Dulles Intl Airport, in Sterling, Virginia, near Washington, DC.


B.1. The Lack of Clear Definitions

 One of the points of frustration has been the lack of common definitions when discussing malicious actor or e-crime data. For this workshop we are specifically addressing data and artifacts generated by security systems that do not normally process personal data, e.g., the output of anti-virus, intrusion detection, firewall, anti-fraud, or flow correlation systems.


B.2. Currently Identified Issues Impacting e-crime Data Sharing:

 Although most every organization will TAKE data from the APWG or other data clearinghouses, many will not SHARE e-crime or malicious actor data with us for some common reasons, usually a perceived risk or potential liability. The reasons we have heard include:

1. Proprietary Data

The data may include some proprietary or internal enterprise information. The concern may be reduced if definitions of specific data items to be shared could be developed, reducing the likelihood of extra data inclusion in the shared data.

2. PII

The data may include personally identifiable information (PII), or some data that could meet the definition of PII. The overriding concern is that sharing IP addresses may violate some privacy regulation. Could the concern be reduced, much like #1 above, with concise operational and process definitions?

3. Liability

The speed of malicious activity detection leads to some number of false positives being shared. Although we are not aware of any case law that supports it, some are concerned that falsely identifying a party, i.e., reporting a false positive, will incur litigation from the (mis)identified party. We hope that a fast, robust mechanism to remove false positives will reduce or conclusively mitigate the litigation risk.


4. FOIA

A number of people have expressed concern about sharing data that ends up at a government entity and is then publicly released via a FOIA request. In our case, the data would have come from the APWG or another data clearinghouse and would not have the actual source labeled. And FOIA requests to APWG won't go far. Short of getting a valid exception for e-crime data submissions, maybe this is just a great reason to use a data clearinghouse?

5. Lack of tools and assistance

This seems to be the catchall excuse. No one wants to have their people writing code or tools to send data. Although the number of useful tools is increasing, we expect this excuse to live on. Since the intent of the workshop is to address policy matters and there are other technical venues for this discussion, we don't expect to spend much time discussing this topic unless we get really ahead of schedule.


C.     Notes from the Symposium

The following are discussion notes from our November 5th Impediments to Data Sharing Symposium. The notes do not represent any person's or organization's official position, as the meeting was held under the Chatham House Rule. I do not believe the slides or these notes are sensitive by themselves. If limited sharing of them with your associates would further our goals, feel free to do so.

There were about twenty people in attendance representing the US government, universities, corporate entities, data sharers, and APWG members. A handful were not from the US.

The presented slides in pptx format are here:

A PDF version is here: https://www.coopercain.com/?wpfb_dl=4


C.1. Comments on the Presented Slides

1. An additional reason to share data was introduced. We started with "community protection" (i.e., help my friends) and "to influence some action" (i.e., arrest, shame). Another reason "to drive policy or business decisions" was identified. Crime statistics may be a good example of this new reason.

2. There was agreement that we need to define what "data" means in "data sharing" to allow for more crisp responses to the impediments. The suggested definition of "e-crime event data" may need words to define e-crime and malicious intent.

There was a worry that innocuous, but important, activity such as "I saw you scan me" is not traditionally thought of as malicious. We should make sure that we don't constrain the definition so much that it excludes useful observations.

Later discussion suggested that "e-crime" could be defined as a synonym for electronic crime, which is commonly defined as "any criminal activity involving the use of computers, as the illegal transfer of funds from one account to another or the stealing, changing, or erasing of data in an electronic data bank" (from dictionary.reference.com), although "malicious activity" will always be open to (mis-) interpretation.

 * We will work to craft a better definition.

3. During the "how we share data" panel it was pointed out that we have failed to provide tools for test driving data sharing. Some parties would like to "try and share" before they jump in whole hog but - as a community - we have not made those types of things possible. For example, one could use a test tool to see what one would share by participating in one of the clearinghouses, and use the result to convince the boss that it’s good.

4. In the impediments discussion, the FOIA concern was determined to be mostly a red herring. No attendee thought it was worthwhile to dwell on solving this issue.

5. Most of the other impediments did not generate significant discussion. It was pointed out that to convince lawyers that the liability impediment can be reduced may require a short paper with help from a legal professional.

6. A new impediment was raised, generally called privacy and trust concerns.

It was noted that most of the groups that share data - and trust - are person-to-person. An organization may get data from others due to personal connections. If data sharing is going to be efficient and continuing, the trust must go from one institution to another institution, so that when the "person" leaves the organization the data sharing continues. The "trust" may be conveyed through agreements or contracts, but we need to find a way to get organizations to trust each other.

7. Another new impediment was introduced just before the meeting and then discussed: once data is shared, no one knows where it will go. Will data markings also restrict where data will end up, particularly in government circles?

 * We added these to the impediment list and will investigate further.

8. There was some discussion on dealing with suspected privacy issues. In some areas one can share "PII"-ish data with consent of the data owner.

Should the data submitter be required to agree that "we have the right to share data", or maybe give a general consent to share before sharing commences?

This devolved into a general talk about data ownership and derivative rights to use the data. One concern is that members "leave" the data sharing community - either through bankruptcy, merger, acquisition, or anger - and what then happens to the data that they may "own" that has been shared? Maybe we need a "data sharers agreement" besides the agreement that covers people who "take" data. It could include a perpetual-use license.
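
As a rough illustration of the "try and share" idea raised in item 3 above, the following Python sketch shows what a dry-run preview might look like: it reports which fields of an event would be shared and which would be withheld, and sends nothing anywhere. The function name, the field names, and the allow-list are all hypothetical and chosen only for illustration; they are not an APWG or clearinghouse schema.

```python
# Hypothetical dry-run preview: reports what WOULD be shared, sends nothing.
# ALLOWED_FIELDS and every field name below are illustrative assumptions,
# not a schema defined by the APWG or any clearinghouse.

ALLOWED_FIELDS = {"timestamp", "event_type", "source_ip", "url"}


def preview_share(event, allowed=ALLOWED_FIELDS):
    """Split one event into the fields that would be shared and those withheld."""
    shared = {key: value for key, value in event.items() if key in allowed}
    withheld = sorted(key for key in event if key not in allowed)
    return shared, withheld


if __name__ == "__main__":
    # A made-up event such as a security device might emit.
    sample_event = {
        "timestamp": "2015-11-05T10:00:00Z",
        "event_type": "phishing_url",
        "source_ip": "192.0.2.10",
        "url": "http://phish.example/login",
        "internal_host": "mail01.corp.example",
        "analyst_email": "analyst@corp.example",
    }
    shared, withheld = preview_share(sample_event)
    print("Would share:", shared)
    print("Would withhold:", withheld)
```

A preview of this kind could be run against real logs and the output shown to management or counsel before any agreement is signed. Keeping the allow-list explicit also speaks to the proprietary-data and PII concerns, since only named data items ever leave the organization.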


C.2. Post Presentation Discussions

9. As the symposium concluded, a question was asked: What are the most attractive data sets we don't share? A quick response was BEC (business email compromise). Pat suggested the questioner talk to NCFTA as they may be doing something in the area.

10. One recurring comment was the "1-5-94 rule" for data sharing: 1% send and take data, 5% only take data, and 94% don't know what's going on and probably wouldn't understand the data anyway if they took it.

11. Discussion on the next steps did not result in a single great action to be taken. The idea of a few white papers for adversary education was proposed. Additional follow-on ideas are still being sought and encouraged.




C.3. The follow-up actions:

A. APWG and friends to craft a better definition of e-crime event data.

B. We have been told that there is keen interest in the APWG cybercrime taxonomy and we should continue work on that, too.

C. It may make sense to craft a short executive-level paper on why data sharing is good, perhaps broken out by how different communities - universities, ISACs, governments, corporations - actually share data, and noting the lack of any great calamities over the past ten or so years. The legals seem to worry about great calamities and nobody knows of one in recent data sharing history. There has been some informal work on an op-ed piece, The Case for Structured Cybercrime Event Data Exchange, which is essentially a myth-buster that crisply defuses the claims of data exchange's liability burdens. If we do the latter soon, we could use Obama's directive and CISA as pegs to talk about why these efforts over-reach: they address problems we may not exactly have, and they miss the fact that much of what they try to do can be satisfied by what is already happening, for reasons legislators may not appreciate.

D. The APWG (actually Pat) took an action to lead an effort to write a few white papers examining the identified impediments and why they are not catastrophic. Specific topics or volunteer contributors are welcome.

 * Assistance is always appreciated.


C.4. Other Observations:

1. A number of people suggested that the format of a smallish group trying to discuss - and solve - issues was good and we should try and do more of them.

2. Many participants really enjoyed the "how do we share data" (now) and "how did we get people to share data" (history) part of the morning as many people never get to hear the history and early ideas of some of the successful sharing cooperatives.

3. After the symposium some conversation ensued that led to an idea that we may need a "data submitter's agreement" to cover privacy and ownership issues when data is given to a clearinghouse. Most clearinghouses have a data "takers" agreement to cover what can be done with data received from the clearinghouse; maybe the other direction needs an agreement, too.

4. The APWG has thought of creating a lower-volume mailing list to continue the discussions. If you feel strongly one way or another, please let us know.


I again thank all the attendees for their time and ideas.


Pat Cain
Resident Research Fellow, APWG

Symposium Manager