How big MNCs like Google, Facebook, and Instagram store, manage, and manipulate thousands of terabytes of data with high speed and efficiency

Anupam Kumar Thakur
3 min read · Sep 17, 2020

Nowadays, much of the world's population uses YouTube, Facebook, Instagram, Google, Pinterest, Yahoo, Bing, and similar services. These companies produce thousands of terabytes of data every day. So have you ever wondered where they store such large volumes of data? On a single hard disk (HD)?

Let's first understand the term big data

Data generated on social media such as Facebook, Instagram, YouTube, and Twitter, or data generated through transactions, is very large, measured in terabytes or petabytes. Such large volumes of data are known as big data. Now think about what happens when your own hard disk is nearly full: your system usually starts to run very slowly. The same applies to the big data these companies produce every day. So big data is first of all a problem of volume: how do they store so much data? And even if they manage to store it, reading it back becomes slow, and users will switch to some other app.

The solution for big data

Distributed storage is the solution for big data. Distributed storage means storing your data in different places, normally called servers, instead of on one machine. Suppose a company XYZ has 100 TB of data. It can distribute those 100 TB across 10 different servers that hold 10 TB each. No single server is overloaded, so users notice no slowdown and work goes smoothly.
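
To make the 100 TB across 10 servers example concrete, here is a minimal sketch in Python. It is only an illustration and not how any particular company does it: the server names, the `capacity_per_server` and `pick_server` helpers, and the choice of hashing a file name to pick a server are all assumptions made for this example.

```python
import hashlib

# Hypothetical server names, one per machine in the example (10 servers).
SERVERS = [f"server-{i}" for i in range(1, 11)]


def capacity_per_server(total_tb: float, server_count: int) -> float:
    """Return how many TB each server holds if the data is split evenly."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    return total_tb / server_count


def pick_server(key: str, servers: list[str]) -> str:
    """Choose a server for a piece of data by hashing its key.

    The same key always maps to the same server, so the data can be
    found again later without searching every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    print(capacity_per_server(100, 10), "TB per server")
    for name in ("photo-001.jpg", "video-042.mp4", "post-777.json"):
        print(name, "->", pick_server(name, SERVERS))
```

Hashing the key is just one simple way to spread data; real systems also keep extra copies of the data and rebalance when servers are added or removed.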

Distributed System

Now the question arises: do these companies need to buy servers and set up a warehouse to store this data? Some companies, like Facebook, set up their own servers, while others pay to use servers provided by other companies. Running the infrastructure alongside the core business would be hectic for many companies, and the cost of hardware, software, and maintenance would also affect the business.

Many tools are available for distributed storage and processing of big data. A few popular ones are listed below:

  1. Hadoop
  2. Apache Spark
  3. Apache Storm
  4. Ceph
  5. DataTorrent RTS
  6. Disco
  7. BigQuery
  8. HPCC (High-Performance Computing Cluster)
  9. Hydra
  10. Pachyderm
  11. Presto
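
As a rough illustration of what a storage tool from this list does, Hadoop's file system (HDFS) cuts a large file into fixed-size blocks and spreads the blocks over many servers. The sketch below imitates that idea in plain Python. It is not Hadoop code: the `split_into_blocks` and `place_blocks` helpers and the server names are hypothetical, the block size is tiny so the result is easy to read (the HDFS default is 128 MB), and the round-robin placement is a simplification.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(block_count: int, servers: list[str]) -> dict[str, list[int]]:
    """Assign block numbers to servers in round-robin order."""
    placement: dict[str, list[int]] = {server: [] for server in servers}
    for block_number in range(block_count):
        server = servers[block_number % len(servers)]
        placement[server].append(block_number)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    print(blocks)
    print(place_blocks(len(blocks), ["server-1", "server-2"]))
```

Because each block lives on a different machine, several servers can read or write parts of the same file at the same time, which is where the speed comes from.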

Big MNCs are building data centers throughout the world to store their data. For example, Google has 12 data centers around the world. Facebook now has 12 data center campuses around the globe, including nine in the U.S. and three in international markets. Some companies also rent storage from Amazon through Amazon S3 (part of AWS), an object storage service, instead of running their own servers.

What are data centers?

Data centers are simply centralized locations where computing and networking equipment is concentrated for the purpose of collecting, storing, processing, distributing, or allowing access to large amounts of data. They provide the servers on which distributed storage is built, and the tools listed above are used to store the data on them.

This is also where the concept of cloud computing comes in. Cloud computing is the use of hardware and software to deliver a service over a network (typically the Internet). An example of a cloud computing service is Google's Gmail: users can access files and applications hosted by Google via the internet from any device.

→ Nowadays, about 1.7 MB of data is created every second for every person on earth.

→ Facebook has revealed some big stats on big data to reporters at its HQ, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour.

→ As a global community, we use around 440,000 terabytes of data on YouTube every day.

→ Google gets over 3.5 billion searches daily.

→ By the end of 2020, there will be around 40 trillion gigabytes of data (40 zettabytes).
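
The figures above use different units and time spans, so a quick unit check makes them easier to compare. The short Python sketch below assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes); the function names are made up for this example.

```python
BYTES_PER_GB = 10**9    # decimal gigabyte (assumed)
BYTES_PER_ZB = 10**21   # decimal zettabyte (assumed)
SECONDS_PER_DAY = 24 * 60 * 60
HALF_HOURS_PER_DAY = 48


def gigabytes_to_zettabytes(gigabytes: int) -> float:
    """Convert a size in gigabytes to zettabytes."""
    return gigabytes * BYTES_PER_GB / BYTES_PER_ZB


def per_second_to_per_day(megabytes_per_second: float) -> float:
    """Scale a per-second rate in MB to a per-day amount in MB."""
    return megabytes_per_second * SECONDS_PER_DAY


def half_hourly_to_daily(terabytes_per_half_hour: int) -> int:
    """Scale a per-half-hour amount in TB to a per-day amount in TB."""
    return terabytes_per_half_hour * HALF_HOURS_PER_DAY


if __name__ == "__main__":
    # 40 trillion GB expressed in zettabytes
    print(gigabytes_to_zettabytes(40 * 10**12), "ZB")
    # 1.7 MB per second per person, over one day
    print(round(per_second_to_per_day(1.7)), "MB per person per day")
    # 105 TB scanned each half hour, over one day
    print(half_hourly_to_daily(105), "TB scanned per day")
```

So 40 trillion gigabytes is indeed 40 zettabytes, 1.7 MB per second adds up to roughly 147 GB per person per day, and 105 TB each half hour is about 5,040 TB scanned per day.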

So these are the volumes of data that big MNCs deal with, and they use the tools above to store, maintain, and manipulate big data.

#bigdata #hadoop #bigdatamanagement #arthbylw #vimaldaga #righteducation #educationredefine #rightmentor

#worldrecordholder #ARTH #linuxworld #makingindiafutureready #righeudcation
