Inside the OpenMIND: open source social media data mining and predictive policing

By Beau Hodai
November 11, 2013

Records recently obtained by DBA Press and the Center for Media and Democracy (DBA/CMD) shed light on a technology, OpenMIND, utilized by law enforcement/counter-terrorism fusion center personnel in gathering and analyzing mass amounts of ‘open source intelligence’ derived from the online lives of Americans.

According to records obtained by DBA/CMD, this technology, employed by state/regional ‘homeland security’ ‘fusion centers,’ is able to access password protected sites and is deliberately designed to hide both the presence of inquiring analysts and their subjects of interest.

Furthermore, these technologies serve as vacuums that amass, store and collate vast amounts of data in order to monitor shifts in public opinion and predict the possible future actions of those being monitored.

OpenMIND and Open Source Intelligence

Traditionally, intelligence personnel have referred to information gleaned through news publications, academic journals, governmental or other openly-available sources, as “open source intelligence.” Commensurate with the increasing practice of uploading the details of an individual’s personal life to the World Wide Web, “open source intelligence” has come to serve largely as a euphemism (in the world of ‘homeland security’ practitioners, at least) for information trolled from citizens’ Facebook, Twitter, or other online social media presence.

One tool utilized by fusion center personnel in gathering and analyzing such ‘intelligence’ en masse is the OpenMIND “open source intelligence harvesting” system produced by Swiss intelligence software corporation, 3i-MIND Technologies GmbH (3i-MIND).

As initially reported by DBA/CMD (“Dissent or Terror: How the Nation’s Counter Terrorism Apparatus, in Partnership With Corporate America, Turned on Occupy Wall Street;” May 20, 2013), in April of 2011 counter terrorism personnel engaged in the Arizona Counter Terrorism Information Center (ACTIC, commonly known as the ‘Arizona fusion center’) purchased an OpenMIND system with $116,500 in U.S. Department of Homeland Security (U.S. DHS) Urban Areas Security Initiative (UASI) grant funding for an “open source intelligence data mining program.” The lead agency in this particular program was the Tucson Police Department (TPD). According to records, TPD initiated this program and purchased the OpenMIND system (to be operated by TPD Office of Emergency Management and Homeland Security Regional Intelligence Analyst/ACTIC Terrorism Liaison Officer Carmen Rios) in order to facilitate open source intelligence gathering, analysis and sharing between agencies active in ACTIC. Such agencies include TPD, the Pima County Sheriff’s Department, the Phoenix Police Department, the Maricopa County Sheriff’s Office, the Arizona Department of Public Safety, the Federal Bureau of Investigation (FBI), the Transportation Security Administration (TSA), the U.S. DHS Office of Infrastructure Protection and the U.S. DHS Office of Intelligence and Analysis.

As previously reported by DBA/CMD, the OpenMIND system was likely in use as part of the “open source intelligence data mining program” during 2011 and 2012– a time when TPD Regional Intelligence Analyst/ACTIC Terrorism Liaison Officer Rios aided Phoenix Police Department Homeland Defense Bureau/ACTIC “Terrorism Liaison All-Hazards Analyst” Brenda Dowhan in gathering intelligence on Arizona activists engaged in the Occupy Wall Street movement. As has been previously reported by DBA/CMD, this fusion center intelligence gathering relating to activists nationwide was often conducted with absolutely no predicate of suspected criminal activity.

Tucson Police Department destroys records, denies existence of records

On October 9, 2013, DBA/CMD submitted public records requests to both TPD and the Pima County Sheriff’s Department (PCSD), for further records pertaining to the use of the OpenMIND system (including any and all work products created through use of this system), as well as other records under the potentially broader banner of the “open source intelligence data mining program.”

In response to this public records request, TPD Public Information Officer (PIO) Sgt. Maria Hawke stated that TPD had discontinued use of the OpenMIND system at an undisclosed time in 2012. Furthermore, Hawke stated that TPD only retains email records for a period of 30 days before they are destroyed (though it should be restated here that the October 9 public records request did not solely seek email records pertaining to the OpenMIND system, but also sought all work products– of any kind– created using the system). As such, Hawke stated that there are no TPD records in existence of any OpenMIND work product (regardless of the fact that available records do state that a major component of the OpenMIND system is a database used to archive and index materials gathered by the system), or any email communications pertaining to OpenMIND or work performed using the system.

The October 9 records request submitted by DBA/CMD to TPD also sought other email records in possession of Regional Intelligence Analyst Rios pertaining to a number of other “open source intelligence”-related criteria (records pertaining to the mining of Facebook or other social media, for example). Unfortunately, as stated by TPD PIO Hawke, Rios went on leave exactly 30 days prior to the submission of the October 9 public records request– therefore, per TPD email records retention policy, no records of any kind, including emails sent or received, by TPD Regional Analyst Carmen Rios, responsive to the October 9 public records request, exist.

As for PCSD, PCSD Custodian of Records Frank Gonzales stated: “although our department has the capability to utilize this product, it is not conducive to our investigative techniques and we have never utilized this product for investigative purposes. It is recommended that you forward your request to the Tucson Police Department.”

So, while DBA Press would like nothing more than to examine a few instances of actual “open source intelligence data mining” work (let alone examination of individuals/groups targeted by this program) performed by ACTIC personnel using this OpenMIND system, we are unable to do so, as TPD claims to have destroyed all work products (along with all communications pertaining to such work products, or any related subject) created by this system. Nevertheless, records returned to DBA/CMD by TPD per the October 9 public records request do allow for a modicum of insight into the workings of OpenMIND.

The Harvester

According to records obtained by DBA/CMD from TPD, OpenMIND consists of four primary components: 1.) the human intelligence analysts who man the system; 2.) the database (and database management system); 3.) the OpenMIND “Investigator,” and; 4.) the OpenMIND “Harvester.”

According to a PowerPoint presentation (an introduction to the OpenMIND Harvester, apparently used during a four-day OpenMIND training course for Tucson analysts), the Harvester, acting on analyst-defined search criteria relating to target “entities,” dispatches a number of “crawlers,” identified in records as six distinct crawlers, each assigned a certain set of data/data source/medium: the “site crawler,” the “video crawler,” the “RSS crawler,” the “microblogs crawler,” the “forum crawler,” and the “deep web crawler.” According to records, these crawlers seek out and retrieve “media,” “textual content” and metadata associated with the creation/posting of media, from internet sources. Data/media gathered by the OpenMIND crawlers is then stored in the database for further analysis.

According to records, OpenMIND essentially divides the internet into two seas of data: the “surface web,” and the “deep web.” As such, internet sources targeted by the crawlers in the “surface web” category may include some websites, social media sites/forums (such as Facebook), video sites (such as YouTube), some blogs, etc.– in short, a review of available records indicates that “surface web” material is basically the same material as you may locate through a standard search engine (e.g. Google) and access without a password.

It is OpenMIND’s purported ability to troll the second category– the “deep web”– which is perhaps the system’s most intriguing and salable feature.

According to records obtained by DBA/CMD, OpenMIND purports to be able to crawl, and retrieve data (most likely in all three categories of media, textual content and metadata) from, both password protected sites and “blocked URLs” (according to training documents, TPD analysts were trained in matters relating to OpenMIND Harvester queries of “blocked URLs”). Both of these sources of “open source intelligence” that OpenMIND purports to crawl and mine are internet resources that are believed to be protected from the general public by their administrators. In large part, it is these seemingly protected sources of data/intelligence that are referred to in the parlance of OpenMIND as “deep web.”

Available records pertaining to OpenMIND delivered to DBA/CMD per the October 9 TPD public records request promote the “deep web” data acquisition abilities of the OpenMIND crawlers heavily, with multiple documents stating that “password protected sites” are a source of “open source intelligence” “crawled” by OpenMIND.

Neither 3i-MIND nor TPD personnel responded to multiple requests for additional information following the delivery of records (on November 7) responsive to the October 9 public records request. As such, it is not clear how OpenMIND gains access to password protected, or otherwise blocked, web sites– though, as stated in available records, 3i-MIND claims that OpenMIND data collection methods are legal.

According to records, another interesting feature of the OpenMIND Harvester is the system’s ability to conceal the identity of the agency/analyst on whose behalf its crawlers troll the web. Furthermore, records indicate the system’s purported ability to conceal the terms of its searches (i.e. subject names, organizational names, etc.) from internet sources queried (such as administrators of websites trolled by the “crawlers”). As such, the system claims to grant its operators far more privacy than it allows its targets.

The Investigator

According to records obtained by DBA/CMD, intelligence materials retrieved from both “surface” and “deep web” open sources are archived and indexed in the OpenMIND database for analysis by the OpenMIND Investigator, which produces “operational intelligence” products for human intelligence analysts.

According to records, the OpenMIND Investigator “creates a large database of information (documents, posts, tweets, etc.),” which it then structures in order to “identify objects of interest as entities.” [According to records, OpenMIND identifies “entities” as any “specific type of information that represents real-life objects, such as a person, organization, company, weapon, country, telephone number, email address, etc.”] The OpenMIND Investigator then “analyze[s] entities’ appearance and connections between them.” [OpenMIND records state that such relationships/connections are determined “based on the proximity of both entities in a single item”– so there you have it; in OpenMIND speak, we may well be referring to Facebook “friends,” comment “likes,” or Twitter “followers,” though the exact meaning of this relationship mapping jargon is unclear. Records do suggest that textual analysis of material found on the Internet, likely using analyst-defined keywords (or other input, such as vehicle registration or banking information), may also be used in this relationship mapping.]
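
For readers wondering what relationship mapping “based on the proximity of both entities in a single item” could mean in practice, the following is a minimal sketch of one common reading: counting how often two entity names turn up in the same item. It is an illustration only, not 3i-MIND’s method, which the available records do not describe. The function name, the sample posts and the entity names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(items, entities):
    """Count how often each pair of entity names appears in the same item.

    Hypothetical illustration: an "item" is one harvested text (a post,
    a tweet, a document) and an "entity" is an analyst-supplied name.
    """
    links = Counter()
    for text in items:
        lowered = text.lower()
        # Entities mentioned anywhere in this item (naive substring match).
        present = sorted(e for e in entities if e.lower() in lowered)
        # Every pair mentioned together counts as one "connection".
        for pair in combinations(present, 2):
            links[pair] += 1
    return links


# Invented sample data; none of this comes from the records.
items = [
    "Alice Example will speak at the Main Street rally",
    "Main Street rally moved to noon, says Bob Sample",
    "Alice Example and Bob Sample share a ride to the Main Street rally",
]
entities = ["Alice Example", "Bob Sample", "Main Street rally"]
links = cooccurrence_links(items, entities)
```

Under this reading, two people become “connected” simply because their names appear in the same post, whether or not they have ever met.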

As such, according to records obtained by DBA/CMD, the OpenMIND Investigator performs four primary tasks based on human analyst-defined criteria: 1.) “identify topics of interest (i.e. people, organizations, etc.);” 2.) “monitor relevant topics and sites constantly;” 3.) “identify relationships between people, events, etc.” (according to records, this function appears to be accomplished through mass collation of web sites, contact information– such as telephone numbers and email addresses culled from web sources– social media resources, as well as textual analysis of documents extracted from sites crawled); 4.) “follow trends and conduct statistical investigation.”

It is this last item, the identification of trends and the execution of ‘statistical investigations’ by the OpenMIND Investigator, utilizing the masses of data delivered by the OpenMIND Harvester, that is perhaps most interesting. According to records, this analysis results in the monitoring of trends, or shifts, in public (or group) opinion/attitudes– as well as the delivery of analyst alerts pertaining to such trends/shifts in public opinion/attitudes.
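
The records do not say how such alerts are computed. As a rough sketch of what a trend alert of this kind could look like, assuming nothing more than daily counts of an analyst-defined keyword, the following flags any day on which mentions jump well above the recent average. The function name, the window and the threshold are hypothetical choices made for illustration, not details drawn from the OpenMIND documents.

```python
from statistics import mean


def trend_alerts(daily_counts, window=3, factor=2.0):
    """Return indexes of days whose count exceeds `factor` times the
    mean of the previous `window` days.

    Hypothetical illustration; the window and factor are arbitrary.
    """
    alerts = []
    for day in range(window, len(daily_counts)):
        baseline = mean(daily_counts[day - window:day])
        if baseline > 0 and daily_counts[day] > factor * baseline:
            alerts.append(day)
    return alerts


# Invented daily mention counts for one keyword on one monitored forum.
counts = [4, 5, 3, 4, 12, 13]
alerts = trend_alerts(counts)
```

A rule this simple would fire on any burst of chatter, which is one reason the absence of actual work product in the TPD records matters: there is no way to see what the real alerts were triggered by.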

Predictive Policing

Unfortunately, available records produced by TPD carry scant clues as to what such trend/public opinion-based operational intelligence products might look like, or what such statistical investigations might consist of when shat out by OpenMIND on the desk of some Facebook-obsessed ‘intelligence analyst.’ However, for a somewhat deeper view into this, we do have some ‘open source intelligence’ of our own, trolled from 3i-MIND’s own website (http://www.3i-MIND.com/om_use_scenarios).

The following is a scenario of hypothetical law enforcement agency use of OpenMIND in the realm of “predictive” law enforcement– a strategy, championed increasingly in the world of law enforcement/counter terrorism, that relies heavily on analytic intelligence products such as OpenMIND (and, for what it’s worth– as a further piece of supportive open source intelligence– 3i-MIND’s Twitter account bills the corporation as “a leader in predictive, integrated intelligence”):

“Perhaps you are tracking an upcoming political rally. Once you set up the OpenMIND™ system to profile and monitor the rally, it will search the web for the event on web pages, social networking sites, blogs, forums and so forth, looking for information about the nature of the rally (e.g. peaceful, violent, participant demographics), try to identify both online and physical world activist leaders and collect information about them, monitor the event in real-time and alert you on user-defined critical developments.

“Several days prior to the event, you start to receive alerts indicating increasing in traffic and recurrence of indicative words in several of the forums that you are monitoring.

“Different visualizations of the rally including trend analysis of topics, individuals, locations, etc.; related geographical locations that highlight location-based patterns; discussions; leaders’ relationship map, etc. reveal that the forum users are planning to turn the political rally into a violent protest.

“Your insight is distributed to the local police force warning them that the political rally may turn violent and potentially thwarting the violence before it occurs.”
