Insights for businesses to thrive

10 TB data lake for survey and near-real-time social media data

22 July 2022 | Noor Khan


Key Challenges

Our client needed their data to be centralised, structured and organised so that data queries could return results quickly.

Key Details

Service

Data Engineering

Technology

AWS Athena, Python, CloudWatch, CloudTrail, DynamoDB

Industry

Market Research

Sector

Social Media, Consumer Behaviour, Marketing

Key results

  • Collation of large volumes of data, including over a billion records of near-real-time social media data and survey data from around 2 million users
  • A centralised location for all types of data from various sources
  • Auto-scaling to ensure the data lake could cope with increases in data load
  • Ingestion of data in various formats and from various locations
  • Data stored in JSON format

Global media market research

Providing game-changing insights to the biggest brands in the world

Our client is a global media market research company based in California, USA. They analyse millions of social media conversations and thousands of consumer surveys to understand audience reactions to products and services. Our client provides media market research to businesses, allowing them to create targeted, impactful marketing campaigns. They provide insights to some of the biggest companies in the world, including YouTube, NFL, Instagram and more.

Large volumes of varied data

Making complex data simple

Our client acquires large volumes of data in different formats from multiple sources. The data comes from three main sources: survey data from the Decipher API, which varies from provider to provider; real-time data from social media channels such as Twitter, Instagram, Pinterest and Facebook; and survey data from SharePoint, which arrives in different file formats and covers different locations and clients. The main challenge our client faced was the lack of a single repository where all of this data could be stored, organised and then output from a single viewpoint.
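
The case study does not describe how the three sources were reconciled. As a minimal sketch, assuming a hypothetical common schema (`source`, `record_id`, `timestamp`, `payload`) and hypothetical input field names (`respondent_id`, `submitted_at`, `id`, `created_at`), the Python below shows one way to normalise CSV survey exports and line-delimited social media JSON into uniform JSON records before they land in a data lake. It does not call the Decipher API or SharePoint; the inputs are plain strings.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator

# Hypothetical common schema: every record, whatever its origin, is reshaped
# into a dict with these top-level keys before being written to the lake.
COMMON_FIELDS = ("source", "record_id", "timestamp", "payload")


def from_survey_csv(text: str, source: str) -> Iterator[Dict]:
    """Normalise survey rows exported as CSV (column names are hypothetical)."""
    for row in csv.DictReader(io.StringIO(text)):
        record_id = row.pop("respondent_id")
        timestamp = row.pop("submitted_at")
        # Whatever columns remain are treated as the survey answers.
        yield {"source": source, "record_id": record_id,
               "timestamp": timestamp, "payload": row}


def from_social_json(lines: Iterable[str], platform: str) -> Iterator[Dict]:
    """Normalise social media posts delivered as one JSON object per line."""
    for line in lines:
        post = json.loads(line)
        payload = {k: v for k, v in post.items() if k not in ("id", "created_at")}
        yield {"source": platform, "record_id": str(post["id"]),
               "timestamp": post["created_at"], "payload": payload}


def to_json_lines(records: Iterable[Dict]) -> str:
    """Serialise normalised records as JSON Lines (one object per line)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


if __name__ == "__main__":
    survey = "respondent_id,submitted_at,q1\nr1,2022-07-22T10:00:00Z,yes\n"
    posts = ['{"id": 42, "created_at": "2022-07-22T10:05:00Z", "text": "hi"}']
    records = list(from_survey_csv(survey, "sharepoint"))
    records += list(from_social_json(posts, "twitter"))
    print(to_json_lines(records))
```

Keeping source-specific fields inside `payload` lets every record share the same top-level shape, while nothing from the original input is discarded.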

They needed a robust, scalable, and secure data lake that could organise a steady stream of incoming data and store it in a structured way, while responding quickly and efficiently to end-user queries. The company's data scientists would use the data for analysis and reporting, and the resulting insights would then be provided to the end client.

Survey data from 1.3 million users and over a billion social media records

Making big data efficient

Our team of experienced data experts came on board and collated the vast amount of real-time social media data, consisting of over a billion records, and survey data from around 1.3 million users into a data lake. The data was organised into tables so that it was categorised and structured, making it useful and legible when queried for insights.
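
One common way to organise a data lake for AWS Athena is Hive-style partitioning: each record is written under a `key=value` path prefix, and queries that filter on those keys only scan the matching prefixes (once the partitions are registered with the table or partition projection is configured). The case study does not say which layout the project used; the sketch below assumes a hypothetical `source/dt/hour` scheme and a placeholder bucket name.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List

# Placeholder location; the real bucket and prefix are not given in the case study.
BASE = "s3://example-data-lake/raw"


def partition_prefix(source: str, ts: str, base: str = BASE) -> str:
    """Return a Hive-style (key=value) prefix for a record's source and UTC hour.

    Timestamps are assumed to be ISO-8601 with a UTC offset or a trailing "Z".
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so it is rewritten as an explicit UTC offset first.
    moment = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return f"{base}/source={source}/dt={moment:%Y-%m-%d}/hour={moment:%H}/"


def group_by_partition(records: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Bucket records by the prefix they would be written under."""
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for rec in records:
        groups[partition_prefix(rec["source"], rec["timestamp"])].append(rec)
    return dict(groups)


if __name__ == "__main__":
    print(partition_prefix("twitter", "2022-07-22T13:45:00Z"))
    # s3://example-data-lake/raw/source=twitter/dt=2022-07-22/hour=13/
```

Partitioning by hour suits near-real-time data such as social media posts, since most queries are likely to target recent time windows; a coarser daily partition would mean fewer, larger objects.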

The main challenge of the project was the volume of data coming in from Twitter in real time. Because the data was constant and varied, it was difficult to store it in an organised, structured way. Our highly skilled data engineers rose to the challenge and created dynamic tables: the data was checked on an hourly basis, and new categories were added to store data that did not fit within the predefined categories.
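
The dynamic-table approach can be illustrated with a small category registry. This is a hypothetical sketch, not the project's implementation: each category is a predicate, each record in a batch (for example, one hour of incoming posts) is assigned to the first category it matches, and records that match nothing cause a new category to be registered, keyed on an assumed `content_type` field. How the hourly run is scheduled is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


class CategoryRegistry:
    """Hypothetical registry mapping category names to membership predicates."""

    def __init__(self, predefined: Dict[str, Predicate]) -> None:
        # Insertion order is preserved, so predefined categories are tried first.
        self.categories: Dict[str, Predicate] = dict(predefined)

    def classify(self, record: Record) -> Optional[str]:
        """Return the first category whose predicate matches, or None."""
        for name, matches in self.categories.items():
            if matches(record):
                return name
        return None

    def hourly_review(self, batch: List[Record],
                      key: str = "content_type") -> Dict[str, List[Record]]:
        """Assign a batch to categories, registering new ones for unmatched records.

        Unmatched records are grouped by the value of `key` (an assumed field);
        each unseen value becomes a new "auto_<value>" category.
        """
        assigned: Dict[str, List[Record]] = defaultdict(list)
        for record in batch:
            name = self.classify(record)
            if name is None:
                value = record.get(key, "unknown")
                name = f"auto_{value}"
                # Default arguments freeze key and value for this predicate.
                self.categories[name] = lambda r, k=key, v=value: r.get(k, "unknown") == v
            assigned[name].append(record)
        return dict(assigned)


if __name__ == "__main__":
    registry = CategoryRegistry({"text_post": lambda r: r.get("content_type") == "text"})
    batch = [{"id": 1, "content_type": "text"}, {"id": 2, "content_type": "video"}]
    for category, records in registry.hourly_review(batch).items():
        print(category, [r["id"] for r in records])
```

Because new categories are kept in the registry, records of the same kind in later hourly batches are matched directly instead of creating duplicate categories.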

At Ardent, we aim to deliver solutions that grow with your organisation. The data lake for our client was therefore designed with scalability in mind, so that it could handle increases in data load.

Complex data lake optimised for scalability and performance

Future-proof data solution

Our client can now effectively provide real-time analytics to their end clients from vast volumes of data. Capturing data in a single place ensures that it is understandable for the end user and allows quick, efficient queries. The data can then be used for data science, analysis and reporting, and it can also be sent to third parties through structured, organised data pipelines.

Explore our data engineering services or get in touch to find out how we can help you unlock your data potential.
