Thursday, 26 May 2016

The content v metadata contest at the heart of the Investigatory Powers Bill

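After 30 hours of Commons Committee debate and around 1,000 Opposition amendments, the Investigatory Powers Bill is heading into Report stage. Now is the time to revisit one of the most fundamental points in the Bill: the dividing line between content and metadata.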

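This is all the more timely given reports that David Anderson Q.C. is to conduct an independent review of the operational case for the bulk powers. As will become apparent, the dividing line is not the same for each power. This is particularly relevant to bulk interception and equipment interference.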



The sensitivity of content and metadata

Why does the distinction between content and metadata matter?

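The government's position, which has support in human rights law, is that intercepting, acquiring, processing and examining the content of a communication is more intrusive than doing the same with the 'who, when, where, how' contextual data wrapped around it.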

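Others argue that, thanks to mobile internet and smartphone apps, metadata has become ever richer and more revealing, so that the difference in intrusiveness has become less pronounced. We know from the Parliamentary Intelligence and Security Committee report of March 2015 that the intelligence agencies value metadata as much as, if not more than, content.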

Even so, under the Bill fewer safeguards and constraints apply to the selection and examination of metadata than to content.


Separating content and metadata

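Where to draw the line between content and metadata is not necessarily obvious. When the inevitable human rights challenge comes, there is no guarantee that a court applying Articles 8 and 10 of the European Convention on Human Rights, or the EU Charter, will draw the line in the same place as the domestic legislation.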

In fact the Bill creates different dividing lines between content and metadata for different purposes: one version for mandatory retention and acquisition of communications data from service providers and another for communications interception and equipment interference. The latter designates more information as metadata and less as content.

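This is perhaps not entirely surprising, since the Anderson Review (10.28) was sympathetic to the usefulness of content-derived metadata. Whether the extent of the changes that the Bill would make has been generally appreciated is another matter.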


Consequences of the dividing line

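The dividing line between content and metadata has significant practical consequences. The Snowden disclosures revealed that GCHQ has intercepted and stored metadata in bulk, by the tens of billions of records. Even though such 'related communications data' (the term used in the Regulation of Investigatory Powers Act (RIPA)) is collected as a by-product of bulk interception aimed overseas, the agency is able to search the resulting pool of metadata for information about people known to be in the British Islands.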

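Under RIPA it cannot do that for content without special authorisation from a Minister. Under the Bill a targeted examination warrant will be required. In Committee the government rejected an amendment that would have extended the targeted examination warrant requirement to cover metadata as well as content.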

Does Parliament have enough information to know where the line falls?

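Both the Commons Science and Technology Committee and a Joint Parliamentary Committee scrutinised the draft Bill. Neither seemed convinced that it (or anyone else) understood where the legislation drew the line between content and metadata.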

The Joint Committee identified the definitions of communications data and content as one of the most common concerns among witnesses. Its Recommendation 1 said:

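“Parliament will need to look again at this issue when the Bill is introduced. We urge the Government to undertake further consultation with communications service providers, oversight bodies and others to determine whether the definitions are sufficiently clear to those who will have to use them.”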

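On bulk interception, the Committee noted witnesses' concerns about the distinction between 'related communications data' and content. It recorded my own suggestion: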

“The Home Office could usefully produce a comprehensive list of datatype examples, where appropriate with explanations of context, categorised as to whether the Home Office believes that each would be entity data, events data, contents of a communication, data capable of being related communications data when extracted from the contents of a communication and so on.”

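The Science and Technology Committee had previously observed that the government, in seeking to future-proof the legislation, had produced definitions that had led to significant confusion among communications service providers and others. It said that terms such as 'communications content' urgently needed clarification.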

The closest that the Home Office has come to producing a systematic analysis is in Annex A to its evidence submitted to the Joint Committee, categorising a selection of datatypes. This came too late to be considered by most witnesses and was light on analysis of why particular items fell on one side of the line or the other.

Since then, the Bill as introduced into Parliament in March 2016 has revised some of the definitions. Most significantly it replaces 'related communications data' with 'secondary data'. The government explained that this was:

“[to make clear] that it is broader than communications data. This clarifies the distinction between this type of data and the narrower class of data available under a communications data authorisation.” (emphasis added)

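The government published draft Codes of Practice alongside the Bill. In principle the copious explanations in these and other sources (Explanatory Notes, Home Office evidence, factsheets, operational cases, Ministerial statements in Committee, Home Office letters to the Committee and so on) should help us understand where the dividing lines lie.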

How does the Bill draw the line?

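Any attempt to draw a line between content and metadata has to avoid circularity: “Why is this information not content?” “Because it is less sensitive.” “Why is this information less sensitive?” “Because it is not content.”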

The Bill's new definition of content (there is no existing definition in RIPA) turns on whether data reveals anything of what might reasonably be considered to be the meaning (if any) of a communication. The Joint Committee commented on the draft Bill:



The impression of having to perform metaphysical gymnastics is bolstered when we are introduced to the concept of 'inferred meaning'. Paragraph 2.14 of the draft Interception Code of Practice says:

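“There are two exceptions to the definition of content set out in section 223(6). The first is to address inferred meaning. When a communication is sent, the simple fact of the communication conveys some meaning, e.g. it can provide a link between persons or between a person and a service. This exception makes clear that any communications data associated with the communication remains communications data, and the fact that some meaning can be inferred from it does not make it content.”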

If anything, this tends to confirm Paul Bernal's concern that since meaning can be derived from almost any data, a dividing line based on the existence of meaning is problematic.

What is the practical result of the Bill's definitions?

Since the Bill draws the line in different places for different purposes, the practical result depends on which set of definitions is used. One set applies to interception and equipment interference, the other to retention and acquisition of communications data.

For interception the various kinds of metadata are 'secondary data'. For equipment interference the equivalent is 'equipment data'. Both consist of 'systems data' or 'identifying data'. Systems data is a critical definition, since S.223(6) lays down that if something is systems data it cannot be content.

The overriding nature of the systems data definition relieves an intercepting or interfering agency of the need to grapple with the 'meaning' of the communication. The draft Interception Code of Practice notes that in practice the agency will only have to decide whether information fits within the definition of systems data. If so, it cannot be content even if it reveals some of the meaning of the communication.

The Bill would also allow 'identifying data' to be extracted from the contents of a communication and treated as secondary data. Under RIPA, information such as an e-mail address embedded in a web page is treated as content. Under the Bill, intercepting and interfering agencies would be able to scrape such data from the body of a communication and treat it as metadata.

For retention and acquisition of communications data the metadata is 'entity data' or 'events data'. Here the position is reversed: content takes precedence. If information reveals anything of the meaning of the communication (beyond the mere fact or transmission of the communication) then for these purposes it is content, even if for interception or equipment interference purposes it would be systems data. The 'identifying data' extraction exception does not apply.

The result is that some types of information may be treated as metadata for the purposes of interception and equipment interference, but as content for the purposes of communications data retention and acquisition.
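The two orders of precedence can be put as a rough Python sketch. It illustrates the reading set out above and is not a statement of the law: the `Item` type, its three boolean fields and both function names are invented for the example, and whether a given piece of information actually has any of those properties is exactly the hard question that the definitions leave open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A piece of information carried by a communication (hypothetical model)."""
    is_systems_data: bool               # fits the systems data definition
    extractable_identifying_data: bool  # identifying data that can be separated out
    reveals_meaning: bool               # reveals something of the communication's meaning


def is_content_for_interception(item: Item) -> bool:
    """Interception and equipment interference: systems data overrides content."""
    if item.is_systems_data:
        return False  # cannot be content, even if it reveals some meaning
    if item.extractable_identifying_data:
        return False  # may be extracted and treated as secondary/equipment data
    return item.reveals_meaning


def is_content_for_retention(item: Item) -> bool:
    """Retention and acquisition of communications data: content takes precedence."""
    # No systems data override and no identifying data extraction exception.
    return item.reveals_meaning


# Systems data that nevertheless reveals some of the meaning of the communication.
example = Item(is_systems_data=True,
               extractable_identifying_data=False,
               reveals_meaning=True)
print(is_content_for_interception(example))  # False
print(is_content_for_retention(example))     # True
```

The same item comes out as metadata under one set of definitions and as content under the other, which is the overlap described above.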

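This overlap of content and metadata is not merely theoretical. The draft Communications Data Code of Practice suggests that some communications may consist entirely of systems data (and so are treated as containing no content). The draft Equipment Interference Code of Practice gives the example of machine to machine messages between items of network infrastructure that enable the system to manage the flow of communications.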

Testing the content/metadata dividing line

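The most comprehensive way to test the dividing line between content and metadata would be to take a large number of examples of different types of information and assess on which side of the line each would fall.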

I have taken a different approach: to take a short e-mail and assess which of its components could be treated as content and which as metadata.

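For this exercise I have used the definitions that divide content from 'secondary data'. These apply to targeted, thematic and bulk interception warrants. 'Secondary data' replaces 'related communications data' under RIPA. As we have seen, 'secondary data' is generally broader than the 'communications data' definitions used for mandatory retention and acquisition.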

Here is my sample e-mail.



An initial impression is probably that the From/To and Sent fields are metadata and everything else is content. Indeed that is the current position under RIPA. When we turn to the Bill, however, things seem to be rather different. It appears that most of the e-mail may be either systems data, or identifying data that can be extracted and treated as metadata.

Of course only the visible parts of the e-mail are shown. More datatypes will be lurking in the header. Depending on exactly what they contain, those are likely to be secondary data.

To understand how what looks like e-mail content can become metadata, we need to delve more deeply into the definition of 'secondary data'.

What is secondary data?

S.120 of the Bill provides that secondary data, in relation to any communication transmitted by means of a telecommunication system, means any data falling within either of two subsections:

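Subsection (4) covers systems data that is comprised in, included as part of, attached to or logically associated with the communication (whether by the sender or otherwise). Broadly, systems data is data that enables or facilitates the functioning of a telecommunications system or service, a system holding communications, or a service provided by such a system. It is not restricted to the system by which the communication in question is being transmitted. For a graphical representation of the full definition of systems data, see here.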

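Subsection (5) concerns identifying data. Like systems data it has to be comprised in, included as part of, attached to or logically associated with the communication (whether by the sender or otherwise). Unlike systems data it must also be capable of being logically separated from the remainder of the communication. And, if it were so separated, it must not “reveal anything of what might reasonably be considered to be the meaning (if any) of the communication, disregarding any meaning arising from the fact of the communication or from any data relating to the transmission of the communication.”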

This last condition mirrors the Bill's general definition of content. It raises the perplexing question of what (and how much) information can be extracted from the content of a communication without revealing anything of the meaning of the communication. Examples given in the Explanatory Notes include:

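  • the location of a meeting in a calendar appointment;
  • photograph information, such as the time/date and location at which it was taken; and
  • contact 'mailto' addresses within a webpage.

The first two of these examples reveal a possibly surprising feature of identifying data. The data can, it seems, relate to matters such as a real world meeting or the taking of a photograph that are not an aspect of a communication.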

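This conclusion follows from the definition of 'identifying data', which includes data that may be used to identify, or assist in identifying, any person, apparatus, system or service, any event, or the location of any person, event or thing. An event is, apparently, not limited to events forming part of the use of a communications system. Data may relate to the fact of the event, the type, method or pattern of event, or the time or duration of the event. For a graphical representation of the full definition of identifying data, see here.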
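The two limbs of the secondary data definition described above can be modelled in a few lines of Python. This is only a sketch of my reading of S.120(4) and (5): the `Datum` type, its field names and `is_secondary_data` are invented for illustration, and each boolean stands in for a judgment that the Bill leaves to be made case by case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One piece of data tested against S.120 (hypothetical model)."""
    associated_with_communication: bool  # comprised in, part of, attached to or logically associated
    is_systems_data: bool                # fits the systems data definition
    is_identifying_data: bool            # fits the identifying data definition
    logically_separable: bool            # can be separated from the rest of the communication
    reveals_meaning_if_separated: bool   # would reveal meaning once separated


def is_secondary_data(d: Datum) -> bool:
    """True if the datum falls within subsection (4) or subsection (5)."""
    if not d.associated_with_communication:
        return False
    subsection_4 = d.is_systems_data
    subsection_5 = (d.is_identifying_data
                    and d.logically_separable
                    and not d.reveals_meaning_if_separated)
    return subsection_4 or subsection_5


# The Explanatory Notes example of a meeting location in a calendar appointment,
# on the assumption that it reveals no meaning once separated.
meeting_location = Datum(associated_with_communication=True,
                         is_systems_data=False,
                         is_identifying_data=True,
                         logically_separable=True,
                         reveals_meaning_if_separated=False)
print(is_secondary_data(meeting_location))  # True
```

Note that the separability and meaning conditions apply only to the identifying data limb; systems data qualifies without either.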

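The Home Office said in its evidence to the Joint Committee: “It is also possible to extract certain structured data types from the content of communications”. In the Bill neither the systems data nor the identifying data definitions appear to be restricted to structured data (and the definition of 'data' is certainly not limited in that way).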

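Identifying data must be capable of being logically separated from the content of the communication. Does that imply some element of structure in the extractable data? It may mean only that physical separation is not required. The Minister said in Bill Committee on 12 April 2016: “For example, if there are email addresses embedded in a webpage, those could be extracted as identifying data.”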

Another conundrum is whether each item of identifying data has to be evaluated separately in determining whether it reveals anything of the meaning of the communication, or whether extracted items of identifying data should be considered cumulatively.

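For the purpose of analysing my sample e-mail I have assumed that, for the purposes of the Bill (whether it is technically feasible is another matter), unstructured information can be “logically separated” from the rest of the communication; and that extracted elements of identifying data are not considered cumulatively. These are points on which further elucidation would be desirable.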

Analysis of the sample e-mail

Below is a marked up version of the e-mail. All the highlighted text could, it seems, be either systems data (yellow) or identifying data (orange).



The “From”, “To” and “Sent” fields fit the definition of systems data, as data facilitating the functioning of a telecommunications service. This is unsurprising and corresponds to the existing position under RIPA.

An e-mail 'Subject' line is content. However, as the draft Equipment Interference Code of Practice explains in relation to equipment data, elements of the subject line may be capable of being extracted and treated as metadata: “the text in the subject line would not be equipment data (unless separated as identifying data).”

So consider “last night's call”. 'Call' appears to be identifying data, since it identifies both the fact and type of an event (S.225(2)(b), (3)(a) and (b)). “Last night's” relates to the time of the event (S.225(3)(c)).

“Bill” and “Graham” both identify, or may assist in identifying, a person (S.225(2)(a)).

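“Meet”, “Wednesday” and “Red Lion” all appear to be identifying data. “Meet” relates to the type of event (S.225(2)(b), (3)(b)), “Wednesday” to its time (S.225(3)(c)) and “Red Lion” to the location of the event (S.225(2)(c)). The fact that this is a real world event rather than a communications event does not seem to prevent it from being identifying data. The Explanatory Notes give the example of the location of a meeting in a calendar appointment. It would be odd if information sent in a calendar appointment were treated differently from the same information sent in an e-mail.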

“DM”. It is possible that this is systems data, describing something connected with enabling or facilitating the functioning of a telecommunications service. If not, it appears to be identifying data as assisting in identifying a service (S.225(2)(a)).

“@cyberleagle” is probably systems data (there is no apparent requirement that the data should relate to the means used to send the intercepted communication itself). If not, this is identifying data.

If this tentative analysis is correct, the secondary data (and equipment data) provisions of the Bill would represent a significant change to the existing content/metadata boundary under RIPA.

Despite all the supporting Bill materials, these provisions are still a challenge to understand. If Parliament is to have a properly informed debate on these matters, a fully detailed and reasoned Home Office explanation of what data falls within each category, and why, would be helpful.

