The SocialArks Data Breach Uncovered the Open Source Paradox

February 5, 2021 | Dotan Nahum | 8 min read

You’re either secure or you’re vulnerable, either under attack or safe. But is it really that simple? The recent SocialArks breach quickly made it clear that it isn’t.

Recently, a cloud misconfiguration by SocialArks exposed 318 million records (more than 400GB of public and private profile data) belonging to 214 million social media users from around the world. The breach originated from a misconfigured ElasticSearch database owned by SocialArks, a Chinese social media management company. The server was exposed to the internet with no username or password protecting the data it stored.

In this article we go over the inherent risks of adopting open source products like Elastic and similar tools, a growing concern in a continuously evolving space that becomes more complex with time. We highlight “shift left” security and the integration of safety nets as the main ways to start a security-by-design approach to these new challenges.

Regardless of the specifics of the SocialArks case, it’s important to understand that the digital data landscape is under constant attack. Data breaches and the vulnerabilities that cause them are increasing every year. According to MarketWatch, data breaches grew by 17% year over year in 2018-19, and IBM’s 2019 Cost of a Data Breach Report put the average loss for U.S. businesses at nearly $8.2M.

However, the SocialArks breach was not just another data breach, in either its sophistication or its impact. The breach was initiated through a vulnerability that mistakenly exposed data stored in ElasticSearch instances. If you’re a savvy cybersecurity expert, this should ring a bell. ElasticSearch vulnerabilities have been the source of several cyber attacks in recent years, as recently as March 2020, when a UK-based security firm had its own Elastic instance exposed and its data breached in the same way.

If you’re not familiar with ElasticSearch, it’s open source software that indexes various kinds of data and makes them searchable. Elastic is so effective at searching and scaling that it can quickly become the focal point for unstructured data storage and management. The data stored in ElasticSearch can be anything from indexed documents to personal data, sensitive customer information and more. Naturally, this makes ElasticSearch a valuable and sought-after target for hackers.

Elastic, the company behind ElasticSearch, was recently valued at around $3 billion. While the company does everything it can to promote secure usage of ElasticSearch, security ultimately comes down to the individual adopter or team: it is up to the user to manage their Elastic instance securely. In the fast-paced environments common to R&D teams, it is not uncommon for security to be overlooked, or for teams to be misled by default configurations in open source software such as ElasticSearch.

The Paradox of Open Source Software

We cannot blame open source products for security breaches because, well, they’re open source projects. They’re someone’s brainchild or hobby and they’re free: use them at your own risk. If you misuse or misconfigure them, the responsibility is yours alone.

In other words, there is a fundamental issue in how developers build, perceive, and consume open source software.

Most open source software is built with one major community goal (other than providing value): “adopt me!”.

Most open source developers will focus on building open source software using technologies that are relevant, recent, and mature. They provide great documentation, including instructions and examples ready for adoption.

This focus on frictionless adoption often means that security falls between the cracks. A security-aware organization should have a security model in place prior to adoption. However, even though cyber attacks like the SocialArks data breach target open source vulnerabilities, many open source projects are still more concerned with fast and effortless adoption than with cyber security. We can therefore expect more attacks to originate from open source weaknesses such as missing authentication and exposed keys and secrets.

For example, one simple yet critical vulnerability, known as binding to world, occurs at the moment the software is brought up. The software exposes its interface to the world and accepts connections from anyone who knows how to locate it on the web (tools like Shodan scan the entire internet for such services). It’s very likely that the Elastic instance at SocialArks was discovered this way, which allowed the attackers to find out they could access the data without any security restrictions.
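To make the idea of binding to world concrete, here is a minimal sketch in Python. It is not taken from ElasticSearch or any other product: the function name `classify_bind_address` and its labels are assumptions made for illustration, and the check relies only on the standard `ipaddress` module.

```python
import ipaddress


def classify_bind_address(bind_address: str) -> str:
    """Classify the address a service listens on.

    Hypothetical helper for illustration; the labels are invented here.
    """
    try:
        address = ipaddress.ip_address(bind_address)
    except ValueError:
        # Hostnames and product-specific placeholders must be resolved first.
        return "unknown"
    if address.is_unspecified:
        # 0.0.0.0 (IPv4) or :: (IPv6) means "every interface on this machine".
        return "all-interfaces"
    if address.is_loopback:
        # Only processes on the same machine can connect.
        return "loopback-only"
    return "single-interface"


for candidate in ("127.0.0.1", "0.0.0.0", "::", "10.0.0.5"):
    print(candidate, "->", classify_bind_address(candidate))
```

A service bound to all interfaces is reachable from the internet only if the host also has a public address and no firewall stands in the way, so the result is a prompt for a closer look rather than a verdict.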

The same kind of vulnerability can be found in a different open source product, Airflow, a popular tool for creating and managing the data pipelines used in machine learning and data processing. Airflow, like Elastic, can presently be seen “binding to world” by default.

Technology is everywhere and risk growth is skyrocketing

Another reason for the recent growth in data breaches is the inability to gain a clear picture of all live assets and their associated risk across the entire organization. The move to microservices architectures is, in large part, responsible. Microservices contribute to productivity and scale for large enterprises. However, a microservice architecture distributes and fragments services and resources, which in turn creates a fragmented infrastructure.

When employing this type of infrastructure, organizations need to maintain several storage services. Often, every microservice gets its own Elastic or Postgres instance, and as the architecture evolves, so do the use of these products and their placement around the cloud. In a microservice infrastructure, where you need to maintain several services simultaneously, it’s easy to lose track of the security of each one, especially as different services and storage systems, such as Elastic, often require independent teams with different development skill sets and security “know-how” to manage them.

Lastly, a side effect of the growing popularity of the “gig economy” and of hiring external vendors to manage R&D processes is that technology silos are out of sight for the CISO and her team. The pandemic and WFH have exacerbated this phenomenon: just as human resources are increasingly dispersed, so is the technology they build and maintain.

As software grows, so do hacking tools

Tools like Shodan, though created for security purposes, are now very accessible and cheap for anyone to use. What was previously a part of a hacker group’s secret sauce is now simply a service you can buy.

This availability, combined with the motive of grabbing huge silos of valuable data exposed to the Internet, creates a mass of hacking activities focused on finding and exploiting misconfigured software, including, but not limited to, ElasticSearch.

For example, to find an exposed ElasticSearch instance, you only need to run a Shodan search for port 9200 or 9300 and try your luck. Hackers also use strong encryption to cover their tracks, making it even simpler to get away with the “perfect crime”. That is all the motivation a hacker needs.
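Defenders can run the same kind of check against hosts they own before anyone else does. The sketch below, in Python with only the standard library, sends one anonymous HTTP request and interprets the status code. The function names, labels and target URL are assumptions made for this example, and the interpretation is a heuristic: a 200 response to a request without credentials suggests the HTTP interface is open, while 401 or 403 suggests access control is in place. Only probe systems you are authorized to test.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def probe_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Send one anonymous GET and return the HTTP status, or None if unreachable."""
    # An empty ProxyHandler makes the request go directly to the host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 401.
        return error.code
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a non-HTTP reply.
        return None


def interpret_status(status: Optional[int]) -> str:
    """Turn a probe result into a rough, heuristic label."""
    if status is None:
        return "unreachable"
    if status == 200:
        return "answers-without-credentials"
    if status in (401, 403):
        return "credentials-required"
    return "inconclusive"


if __name__ == "__main__":
    # Placeholder target: replace with a host you are authorized to test.
    target = "http://127.0.0.1:9200/"
    print(target, "->", interpret_status(probe_status(target)))
```

Keeping the network call and the interpretation in separate functions makes it easy to feed the second one with results from another source, such as an existing scan report.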

Mitigating the problem: secure by design

Software as a profession is still evolving. We know how to build quality software and scalable software, and how to be great at reusing software. Security, however, is still not considered part of software design but rather part of the software “approval” process (for example, QA).

To be able to build secure products, we need to move from validating software after the fact to security by design: specifically, developers and architects being mindful of security and including security modeling in every other aspect of software modeling.

This includes shift-left automated security testing of code, using specialized misconfiguration scanners, reviewing and getting up to speed with the specific software configuration and assets, and modeling risks and attack vectors early.
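As one illustration of a misconfiguration check that can run early, the Python sketch below inspects the text of an `elasticsearch.yml` file. It is a simplified stand-in for a real scanner, not a description of any particular product. It assumes settings are written in the flat `key: value` form, looks only at `network.host`, `http.host` and `xpack.security.enabled`, and treats a missing `xpack.security.enabled` as disabled, which may not match the defaults of the version you run. The function names and the wording of the findings are invented for this example.

```python
from typing import Dict, List

# Addresses that mean "listen on every interface". This list is an assumption
# and does not cover every value the product accepts.
WILDCARD_HOSTS = {"0.0.0.0", "::"}


def parse_flat_settings(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines, skipping blank and comment lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def find_misconfigurations(text: str) -> List[str]:
    """Flag wildcard bind addresses when security is not explicitly enabled."""
    settings = parse_flat_settings(text)
    security_on = settings.get("xpack.security.enabled", "").lower() == "true"
    findings = []
    for key in ("network.host", "http.host"):
        if settings.get(key) in WILDCARD_HOSTS and not security_on:
            findings.append(
                f"{key} listens on all interfaces but xpack.security.enabled is not true"
            )
    return findings


# Two made-up configurations for demonstration.
risky = """
cluster.name: demo
network.host: 0.0.0.0
"""
safer = """
cluster.name: demo
network.host: 0.0.0.0
xpack.security.enabled: true
"""
print(find_misconfigurations(risky))
print(find_misconfigurations(safer))
```

A check like this could run in a pre-commit hook or a CI job, so a risky configuration is caught before it is deployed rather than after it is found by someone else.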

Mitigating the problem: safety nets

There’s a common exercise: you build the best security you can for your organization, and then imagine waking up one day to find it has all disappeared. In this situation, how do you still make sure your organization is safe?

It’s wise to build a set of safety nets, and it starts with understanding your security posture and how hackers think. Combining these two understandings, here is a non-exhaustive list of some of the activities you can do:

Prepare a mapping of your world-facing services, DNS entries and external services, and validate the security controls on everything that is public
Use purpose-built monitoring services that look for “openings” in your public network
Get to know the tools hackers use, like Shodan, and use them yourself on your very own network
Employ continuous monitoring and assessment: use tools like Shodan externally, and also continuously scan the open source products you use for vulnerabilities, as close to the development stage as possible
Build your own toolbox for reconnaissance and OSINT/threat intelligence with free security tools like recon-ng and Spiderfoot
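The first item on the list, mapping what is supposed to be public and validating it, can start as a small script. The Python sketch below compares endpoints observed as open (for example, from your own external scan) with an allowlist of endpoints that are meant to be public. The host names, ports and function names are hypothetical, and `is_port_open` is a plain TCP connect check rather than a substitute for a dedicated scanner.

```python
import socket
from typing import Iterable, Set, Tuple

Endpoint = Tuple[str, int]


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan(hosts: Iterable[str], ports: Iterable[int]) -> Set[Endpoint]:
    """Collect the (host, port) pairs that accept a TCP connection."""
    port_list = list(ports)
    return {(h, p) for h in hosts for p in port_list if is_port_open(h, p)}


def unexpected_exposures(
    observed: Iterable[Endpoint], allowed: Iterable[Endpoint]
) -> Set[Endpoint]:
    """Endpoints that are open but missing from the allowlist."""
    return set(observed) - set(allowed)


# Hypothetical inventory; in practice `observed_open` would come from scan()
# run from outside your network against hosts you own.
allowed_public = {("www.example.com", 443), ("api.example.com", 443)}
observed_open = {
    ("www.example.com", 443),
    ("api.example.com", 443),
    ("search.example.com", 9200),
}

for host, port in sorted(unexpected_exposures(observed_open, allowed_public)):
    print(f"unexpected public endpoint: {host}:{port}")
```

Anything the script reports is either a gap in the allowlist or a service that should not be public, and both are worth knowing about.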

Developers are the key

Like many other things in software, developers are the key. Our hope is that, as an industry, software engineering can advance to a stage where security is part of the core engineering work. The current standards, best practices, policies and regulations are great, but technology moves too quickly for them to be effectively enforced and applied. The key is to create a secure development process, shifting more and more practices, and above all security tools and platforms, “left” to the early development stages.

Dotan Nahum

Dotan Nahum is the CEO and founder of Spectral, a developer-first cybersecurity company. Dotan is also an experienced software developer and has been an open source committer since 1999, with over 100 open source projects, some of which are used by Fortune 500 companies like Accenture as well as public companies such as Wix and others.

His vast experience as CTO at HiredScore and CTO at Como (Conduit Mobile), and later at fintech unicorn Klarna, allowed him to promote engineering culture and effectiveness in Israel by scaling products and companies through building resilient distributed systems, leading cybersecurity strategies, big data analytics, and open source. Dotan is also a four-time tech author (Packt, Leanpub, among others) and the host of two podcasts (Reversim and RTCZ.io) dealing with developers and cybersecurity.