OSINT's origin story

OSINT (Open Source Intelligence) has been in use for decades. It consists of collecting publicly available data (radio and TV broadcasts, written publications, etc.) and correlating it to draw specific conclusions about a person or organization. The practice started in World War II and grew during the Cold War years of the 1980s.

But it wasn’t until after 9/11, already in the digital age, that the US government decided to create a dedicated agency for OSINT tasks, in order to stay up to date on events or shifts in sentiment that could be relevant to national security. Since 2004, the OSINT sector has grown steadily, and more and more private companies are taking advantage of the fact that, in the information age, a great deal of data can be accessed immediately and publicly via the Internet.

Today, OSINT systems are a valuable tool for intelligence services and law enforcement, as they can provide a lot of information both at the granular level of a single person and at the level of a whole collective.

But it is not only governments that make use of this technology. Large corporations also employ the services of OSINT companies to conduct market research before launching their products, to gauge how receptive the market is to a new service, to measure public sentiment and to assess the state of the competition.


Where does OSINT get its data from?

The data sources from which OSINT extracts the required information are very varied and mostly public, as we have already seen. Among the most important are the following:

  • Newspaper articles or digital magazines
  • Media agency or sociological/statistical research reports
  • Books and academic publications
  • Social networks
  • Census data
  • Telephone directories
  • Public financial information
  • Public surveys (both on social networks and elsewhere)
  • Information made public through leaks from cyber-attacks
  • Information on software vulnerabilities
  • Registration and domain data

Some valuable content is not indexed by search engines simply because it sits behind a login. Forums and chat communities such as Discord are one example: it is enough to create an account, and this “problem” disappears. The same applies to information sources that require a paid subscription (newspapers, specialized magazines, opinion pollsters, analysts, etc.). All you have to do is pay, and you receive quality information prepared by experts, which can be added to everything else you may already have.

And then there are social networks of every kind: places where users voluntarily provide their personal information, post their likes and dislikes, and generally show how they relate to others, who they know, and so on.

But the real data source is the so-called Darknet. Although specific browsers such as Tor are required to access this “dark” part of the Internet, it has its own search engines and is a tremendous source of information for police forces and the justice system. The same is true for companies that want to know what is really moving in the shadows of the Internet and to open new markets based on the knowledge gained.

“OSINT is intelligence derived from public information and other types of unclassified information with limited public distribution or access.”

Most of the sources listed above are freely accessible on the Internet and, with a little patience, a browser and a good knowledge of Google, can be found without any problems. But there is more. Much more. By a commonly cited estimate, approximately 96% of all content on the Internet is not indexed by Google, and Google itself indicates that it has hundreds of billions of indexed pages. In other words, if those indexed pages are only about 4% of the total, there are trillions of pages that Google does not reach.
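To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The 96% share is the estimate quoted above; the figure of 400 billion indexed pages is only an assumed stand-in for “hundreds of billions”, not a number published by Google, and the function name is made up for this illustration.

```python
def unindexed_pages(indexed_pages: int, unindexed_percent: int) -> int:
    """Estimate the number of unindexed pages.

    indexed_pages is taken to be (100 - unindexed_percent)% of the whole,
    so the unindexed part is indexed_pages * unindexed / indexed.
    """
    if not 0 <= unindexed_percent < 100:
        raise ValueError("unindexed_percent must be between 0 and 99")
    indexed_percent = 100 - unindexed_percent
    # Integer arithmetic keeps the result exact for large page counts.
    return indexed_pages * unindexed_percent // indexed_percent


# Assumption for illustration only: 400 billion indexed pages.
ASSUMED_INDEXED_PAGES = 400_000_000_000

estimate = unindexed_pages(ASSUMED_INDEXED_PAGES, 96)
print(f"{estimate:,}")  # 9,600,000,000,000
```

Under these assumptions the unindexed part is 24 times the size of the indexed part, which is where the “trillions” come from.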

OSINT & Big Data

OSINT, as we know it today, owes its existence to Big Data. After all, an OSINT system is essentially a huge database capable of linking information, cross-referencing data, drawing conclusions and finding answers to the questions it is asked. All of that information comes predominantly from open sources, and the sheer amount of data available requires some form of Big Data processing to make sense of it in a reasonable timeframe.
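As a minimal sketch of what “linking” and “cross-referencing” can mean in practice, the snippet below groups records from two sources under a normalized name and keeps only the entities that appear in more than one source. The source names, record fields and sample data are all invented for this illustration; real systems rely on much more robust matching than a lowercase name comparison.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so a name matches across sources."""
    return " ".join(name.lower().split())


def link_records(*sources):
    """Group records from several (source_name, records) pairs by name."""
    linked = defaultdict(list)
    for source_name, records in sources:
        for record in records:
            linked[normalize(record["name"])].append((source_name, record))
    return dict(linked)


def cross_referenced(linked):
    """Keep only entities that appear in more than one distinct source."""
    return {
        key: hits
        for key, hits in linked.items()
        if len({source for source, _ in hits}) > 1
    }


# Invented sample data standing in for two open sources.
directory = [
    {"name": "Jane  Doe", "phone": "555-0100"},
    {"name": "John Roe", "phone": "555-0101"},
]
domains = [{"name": "jane doe", "domain": "example.org"}]

linked = link_records(("directory", directory), ("domains", domains))
matches = cross_referenced(linked)
print(sorted(matches))  # ['jane doe']
```

Here only “jane doe” is found in both sources, so the phone number and the domain registration end up linked to the same entity.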

One of the advantages of these systems is that they avoid the kind of biased answers that can result when a survey is commissioned. In the end, the pollster always feels obliged to give an answer that the client is more or less satisfied with, and this can introduce bias. In the case of OSINT, however, the algorithms analyze the data dispassionately and give a clear answer either for or against the initial premise, regardless of what the client hopes to hear.


Current OSINT uses

As with all Big Data technologies, the current uses of OSINT are still in their infancy. As it is used in more and more areas, new applications will emerge. Right now, however, OSINT is used in cybersecurity, law enforcement and intelligence investigations, legal work, insurance, anti-fraud, threat discovery (physical, cyber or commercial) and finance.

Passive & Active Data Retrieval

  • OSINT (passive): this is the most common method. The researcher adds the data they already have to the OSINT system, and the system returns the requested intelligence by combining that data with the information already available in its database and drawing conclusions or relationships.
  • Focused Crawler (active): in this case the researcher employs a more targeted method to obtain information that, at first glance, may not be available. The information obtained is added to the system so that the corresponding processing can be performed.
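The focused crawler idea can be sketched as a best-first crawl, in which links found on pages that match the topic are visited before links found on unrelated pages. This is only one possible approach, shown under stated assumptions: the pages live in an invented in-memory dictionary so that the example runs offline, relevance is a naive keyword count, and every name here (`PAGES`, `fetch_from_memory`, `relevance`, `focused_crawl`) is hypothetical.

```python
import heapq

# Hypothetical in-memory "web" so the sketch runs offline:
# URL -> (page text, outgoing links). All pages are invented.
PAGES = {
    "site/home": ("welcome to our forum about gardening", ["site/a", "site/b"]),
    "site/a": ("data breach report: leaked credentials found", ["site/c"]),
    "site/b": ("tomato growing tips", []),
    "site/c": ("breach timeline and leaked data samples", []),
}


def fetch_from_memory(url):
    """Stand-in for an HTTP fetch; unknown URLs yield an empty page."""
    return PAGES.get(url, ("", []))


def relevance(text, keywords):
    """Count the words on a page that match a topic keyword."""
    words = (word.strip(".,:;!?") for word in text.lower().split())
    return sum(1 for word in words if word in keywords)


def focused_crawl(seed, keywords, fetch, max_pages=10):
    """Best-first crawl that returns (url, score) for every relevant page."""
    frontier = [(0, seed)]  # min-heap of (negated parent score, url)
    seen = {seed}
    results = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        score = relevance(text, keywords)
        if score > 0:
            results.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on,
                # so links from on-topic pages are popped first.
                heapq.heappush(frontier, (-score, link))
    return results


hits = focused_crawl("site/home", {"breach", "leaked"}, fetch_from_memory)
print(hits)  # [('site/a', 2), ('site/c', 2)]
```

In this toy run the crawler follows the link from the on-topic page "site/a" to "site/c" before it gets around to the unrelated "site/b". The pages it finds would then be added to the system in the same way as any other data.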

Is only public information used?

Not always. Although the “OS” in OSINT stands for Open Source, not all OSINT systems use exclusively public information. In many cases, the data set is supplemented with more specific data from other fields, for example a company's business history or a police biometrics database. All combinations are possible and, in the end, the data set used will be the one needed to arrive at the conclusions that are asked of the system.