If you leave data unsecured, don’t be surprised if people steal it

Taylor Armerding
5 min read · Aug 10, 2020


Photo by Franki Chamaki on Unsplash

Talk about having a target on your back, your front and every available side. “Data enrichment” companies are red meat for cybercriminals.

As the label implies, data enrichment businesses don’t simply store data. They merge data held by their client companies with third-party data to help those clients target existing and potential customers more effectively.

So, in most cases, the data include much more than an email address, home address, phone number and what a customer bought last month. They also cover the number of children in a household, the value and location of the house, political donations, memberships, social media profiles, income, buying habits, medical conditions, what kinds of cars people drive, etc.

Those are the kinds of data that can enrich organizations looking to improve their targeted marketing techniques. They also offer the potential of major enrichment to cybercriminals looking to steal identities.

But sometimes those criminals don’t even have to break a sweat pounding on a keyboard to hack into a database. The data are served up free. Which is what happened last fall when security researchers Bob Diachenko and Vinny Troia discovered an open Elasticsearch server containing more than 4TB of data on more than 1.2 billion people.

Elasticsearch is an open-source search engine that allows users to index and search unstructured data. Troia, chief of threat intelligence at Data Viper, told Infosecurity magazine that the server “was unprotected and accessible via web browser at [...]. No password or authentication of any kind was needed to access or download all of the data.”

Then at the end of December, Diachenko, a security researcher at Comparitech and owner of the SecurityDiscovery research blog, led a team that discovered five Elasticsearch servers containing Microsoft customer service and support records on 250 million customers, going back 14 years, that were easily accessible to anyone with a web browser.

While some of the information in those records, like payment data, was redacted, there were also plain-text data that included email addresses, IP addresses, customer locations, and descriptions of customer service and support claims and cases.

The exposed information also included the email addresses of Microsoft support agents, case numbers, resolutions and remarks, and even confidential internal notes.

Diachenko’s team notified Microsoft, and the servers were secured within 24 hours.

Making it easy for criminals

But the value of that kind of information to scammers is obvious. Thieves could pose as Microsoft tech support agents using authentic emails plus credible information on their potential victims’ customer support records.

There’s more. This past May, Diachenko set up a honeypot server (an unsecured, public internet-facing database filled with fake user data) and got his first hit within eight and a half hours. Over the next five days the honeypot was attacked 36 times, and the attacks spiked after the database was indexed by Shodan.io, an Internet of Things (IoT) search engine. Diachenko reported 22 attacks in the next 24 hours and a total of 175 before he shut it down after 11 days.

But while Elasticsearch shows up in multiple instances of these massive exposures, the company says the problem is not with security vulnerabilities in its product but in the way customers are configuring, or misconfiguring, it.

Indeed, Mike Paquette, security product director at Elastic, the company behind Elasticsearch, told Infosecurity that its default setting binds Elasticsearch only to local addresses. That means if users want to communicate outside the local machine, they have to change the settings. The exposures, he said, “usually involve instances where individuals or organizations have actively configured their installations to allow unauthorized and unauthenticated users to access their data over the internet.”
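To make the point about local binding concrete, here is a minimal Python sketch that classifies a bind address as local-only or reachable from other machines. It assumes the address comes from Elasticsearch's `network.host` setting; the helper name `is_local_only` and the sample addresses are made up for illustration and are not part of any Elasticsearch tooling.

```python
import ipaddress

# Values treated as "this machine only". "_local_" is the special value
# Elasticsearch uses for loopback addresses; "localhost" is included on the
# assumption that it resolves to a loopback address, as it normally does.
LOCAL_ONLY_ALIASES = {"_local_", "localhost"}


def is_local_only(network_host: str) -> bool:
    """Return True if the bind address is reachable only from the same machine."""
    value = network_host.strip().lower()
    if value in LOCAL_ONLY_ALIASES:
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        # Hostnames and other special values (for example _site_ or _global_)
        # are treated as potentially reachable from other machines.
        return False


for host in ("127.0.0.1", "_local_", "0.0.0.0", "203.0.113.10"):
    label = "local only" if is_local_only(host) else "reachable from other machines"
    print(f"network.host={host}: {label}")
```

Any value the check flags as reachable from other machines is a signal that authentication and network restrictions need to be in place before the setting goes live.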

It sounds a bit like leaving the door to your house open or unlocked for some friends, but then never closing it, allowing anyone who sees the entrance to walk right in. “The crude analogy is leaving the keys to your house under the welcome mat with a sign pointing to it,” said Ameesh Divatia, CEO of advanced data protection company Baffle.

There is a wrinkle that may confuse users, however. As is the case with numerous online products, there is a free and a paid version of Elasticsearch. The paid version has what is called “X-Pack” security features. The free version includes X-Pack only during a trial period.

Be a responsible user

That doesn’t mean users of the free version get nothing or are helpless to implement security measures. Amazon provides detailed instructions on IAM [Identity and Access Management] for those using its Elasticsearch Service. “(Y)ou can configure your Amazon ES domains so that only trusted users and applications can access them,” says a blog post on the topic. But the key word there, of course, is “you.” Not Amazon. You, the user.

Elasticsearch also offers a measure of security in its free version. The company announced a year ago in June that its newest versions of Elastic Stack would provide “core security features” for free.

They include:

  • TLS [Transport Layer Security] for encrypted communications.
  • File and native realm for creating and managing users.
  • Role-based access control for controlling user access to cluster APIs and indexes.
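The idea behind role-based access control in the last item can be sketched in a few lines of Python. This is a toy in-memory model, not Elasticsearch's security API; the role names, index names and the `is_allowed` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    # Maps an index name to the set of actions this role may perform on it.
    permissions: dict = field(default_factory=dict)


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)


def is_allowed(user: User, index: str, action: str) -> bool:
    """A user may act on an index only if one of their roles grants that action."""
    return any(action in role.permissions.get(index, set()) for role in user.roles)


# Hypothetical roles: analysts can only read; admins can read and write.
analyst = Role("analyst", {"support-tickets": {"read"}})
admin = Role("admin", {"support-tickets": {"read", "write"}})

alice = User("alice", [analyst])
bob = User("bob", [admin])
guest = User("guest")  # no roles, so no access

print(is_allowed(alice, "support-tickets", "read"))
print(is_allowed(alice, "support-tickets", "write"))
print(is_allowed(guest, "support-tickets", "read"))
```

The design choice worth noting is that access is denied by default: a user with no roles, or a role with no entry for an index, gets nothing.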

However, advanced security features, “from single sign-on and Active Directory/LDAP authentication to field- and document-level security,” still come only with the paid, or “Gold,” subscription.

While it’s arguable that good security should be “built in” to every product that will be connected to the internet, the reality is that rigorous security features cost time and money. In the auto industry, you get “core” safety features in any new car, but advanced features cost extra. In the digital realm, there are free and paid versions of anti-virus products — the paid version containing more robust protections.

“Database vendors tend to provide security capabilities for a premium because it does involve some operational overhead for them,” Divatia said. “Some examples are key management support, logging, and access control infrastructure.”

Elasticsearch does offer configuration instructions for its free subscription, covering TLS encryption, authentication, authorization, putting the Stack behind a VPN or firewall, restricted scripts and isolation.
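One rough way to confirm that those settings took effect is to send a request with no credentials and see what comes back. The Python sketch below does that with the standard library only. The function names and the three labels are illustrative, and the mapping from status codes to labels is a simplifying assumption: a 401 or 403 suggests authentication is enforced, while a 2xx response to an anonymous request suggests the endpoint is open. It should be run only against systems you own.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an anonymous request to a rough exposure label."""
    if status in (401, 403):
        return "protected"  # the server asked for credentials or refused access
    if 200 <= status < 300:
        return "open"  # the server answered without any credentials
    return "unknown"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one request with no credentials and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Example, assuming a test instance of your own on the default HTTP port 9200:
# print(probe("http://localhost:9200"))
```

An "unreachable" result from outside your network alongside a "protected" result from inside it is the combination you would hope to see for a database behind a VPN or firewall.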

To that, Divatia adds several suggestions:

  • Identify sensitive data in your cloud database environment based on regulatory requirements.
  • Create policies on how that data is protected based on its utility downstream. For example, if that data will never be accessed, it can be masked. If you need dummy data in the same format as the original, use format-preserving encryption or tokenization. To process the data without exposing it, use privacy-preserving analytics that encrypt it using the AES [Advanced Encryption Standard] algorithm.
  • Use a solution that can protect all the data with a specialized security tool rather than depend on what the data storage vendor provides.
  • Monitor access to sensitive data at all times to keep an audit trail and detect unauthorized access.
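A few of these suggestions can be illustrated with a short Python sketch using only the standard library. It shows masking, a simple form of tokenization and an audit trail. Note the assumptions: the tokenization here is a keyed one-way hash (HMAC-SHA256), which is not format-preserving encryption and cannot be reversed, and the function names, the sample record and the hard-coded key are hypothetical. A real deployment would load the key from a key management system.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a stable token derived from a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


audit_log = []  # in-memory stand-in for a real, tamper-resistant audit store


def read_field(user: str, record: dict, field_name: str):
    """Return a field value and record who read it and when."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "field": field_name,
    })
    return record[field_name]


key = b"demo-key-only"  # assumption: placeholder key for the example
record = {"email": "jane.doe@example.com", "case_id": "12345"}

print(mask_email(record["email"]))  # j***@example.com
print(tokenize(record["email"], key))
print(read_field("support-agent-7", record, "case_id"))
print(len(audit_log))
```

Because the token is deterministic for a given key, the same email always maps to the same token, so records can still be joined or counted without exposing the address itself.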

Bottom line: Security is a shared responsibility of both a vendor and a user. And while no security measures are entirely bulletproof, it is possible to make it difficult for attackers to breach your defenses.

In the case of these Elasticsearch exposures, as the researchers found, some users, including even a tech giant like Microsoft, had left their data with no defenses at all.

Which should amount to the proverbial wake-up call: While there are security tools and processes available that will help protect your assets, they’re worthless if you don’t use them.



Taylor Armerding

I’m a security advocate at the Synopsys Software Integrity Group. I write mainly about software security, data security and privacy.