Data & Data Flow | System Design Tutorials | Part 5 | 2020

Part 5 of Yogita Sharma's System Design Tutorial: Data & Data Flow

Topics covered in this video:

  • Data
  • Data Format / Representation
  • Mechanisms for Data Flow
  • Factors: Type, Volume, Scale, Purpose

The video opens with an analogy: people interact inside buildings such as hospitals, schools, theatres, banks, and hotels. The people represent the data, and the buildings are the systems that the data is associated with.

Data is at the Core of System Design

  • Business Layer      ->   Text, videos, images
  • Application Layer ->   JSON/XML
  • Data Stores (DBs) ->   Tables, indexes, lists, trees
  • Network layer       ->   Packets
  • Hardware Layer    ->   Zeros and Ones


Network Layers (source)


As data moves between the layers, it is transformed so that it can be transmitted, received, or stored. When designing systems, the three layers that always need to be considered are the business, application, and data layers. Data needs to flow efficiently and be stored securely and effectively.
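
The transformations between layers can be sketched in a few lines of Python. This is only an illustration, not how a real network stack behaves: the record, the 16-byte `PACKET_SIZE`, and all helper names are assumptions made up for the example.

```python
import json

PACKET_SIZE = 16  # hypothetical payload size in bytes, chosen only for illustration


def to_application_format(record: dict) -> str:
    """Business data -> application-layer representation (JSON text)."""
    return json.dumps(record, sort_keys=True)


def to_packets(payload: str, size: int = PACKET_SIZE) -> list[bytes]:
    """Application data -> fixed-size byte chunks standing in for packets."""
    raw = payload.encode("utf-8")
    return [raw[i:i + size] for i in range(0, len(raw), size)]


def to_bits(packet: bytes) -> str:
    """Bytes -> the zeros and ones handled by the hardware layer."""
    return "".join(f"{byte:08b}" for byte in packet)


def from_packets(packets: list[bytes]) -> dict:
    """Reverse the journey on the receiving side."""
    return json.loads(b"".join(packets).decode("utf-8"))


record = {"username": "asha", "city": "Pune"}  # made-up business data
packets = to_packets(to_application_format(record))
print(len(packets), "packets; first bits:", to_bits(packets[0])[:16])
print("round trip ok:", from_packets(packets) == record)
```

The same record exists in four shapes here (dict, JSON text, byte chunks, bit string), and the receiving side has to undo each step in reverse order.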

Understanding what data the system handles, how that data flows through the system, and how it is stored is half the work of system design.

Data stores, and example data, include:

  • Database
    • Username, phone number, city, address
  • Queues
    • Send SMS Request, Send Email Request
  • Cache
    • Request : Response, images
  • Indexes
    • Most searched items, Searches from past 1 hour
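
To make the four kinds of store concrete, here is a rough sketch that uses in-memory Python structures as stand-ins. Real systems would use a database server, a message broker, a cache service, and a search index; the sample values and the `get_profile` helper are hypothetical.

```python
from collections import Counter, deque

# Database: long-lived records keyed by an identifier (a dict stands in here).
database = {"u1": {"username": "asha", "phone": "555-0100", "city": "Pune"}}

# Queue: requests waiting to be processed, oldest first.
queue = deque()
queue.append({"type": "send_sms", "to": "u1"})
queue.append({"type": "send_email", "to": "u1"})

# Cache: request -> response pairs kept to avoid repeating work.
cache = {}


def get_profile(user_id):
    key = ("GET", f"/users/{user_id}")
    if key not in cache:
        cache[key] = database[user_id]  # cache miss: fall back to the database
    return cache[key]


# Index: a derived structure that answers one question quickly,
# here "which items are searched most often?".
search_counts = Counter(["shoes", "phone", "shoes"])

first_job = queue.popleft()
most_searched = search_counts.most_common(1)
print(first_job["type"], most_searched, get_profile("u1")["city"])
```

Each structure is shaped by how the data is used: the queue is consumed in order, the cache is looked up by request, and the index is rebuilt from other data rather than being the source of truth.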

Data flows include:
  • APIs
  • Messages
  • Events
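
The notes do not define these three mechanisms, so the sketch below follows one common interpretation, which is an assumption on my part: an API call is a request that waits for a response, a message is handed to a specific consumer through a queue, and an event announces that something happened to whoever has subscribed. All function and event names are hypothetical.

```python
from collections import defaultdict, deque


# API: the caller asks a question and waits for the answer.
def get_user_city(user_id: str) -> str:
    return {"u1": "Pune"}[user_id]


# Message: addressed to one consumer and handed over through a queue.
sms_queue = deque()


def request_sms(user_id: str, text: str) -> None:
    sms_queue.append({"to": user_id, "text": text})


# Event: a statement that something happened; any number of subscribers may react.
subscribers = defaultdict(list)


def subscribe(event_name, handler):
    subscribers[event_name].append(handler)


def publish(event_name, payload):
    for handler in subscribers[event_name]:
        handler(payload)


audit_log = []
subscribe("user_registered", lambda p: request_sms(p["user_id"], "Welcome!"))
subscribe("user_registered", lambda p: audit_log.append(p["user_id"]))
publish("user_registered", {"user_id": "u1"})

print(get_user_city("u1"), len(sms_queue), audit_log)
```

Note that the publisher of `user_registered` does not know who reacts to it, whereas the caller of `get_user_city` depends directly on the answer.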

Data can be generated from a number of different sources, for example:
  • Users
  • Internal
  • Insights
  • Sensor input
Internal data includes data about other data. For example, if a user creates a profile for a game, the number of hours that person spends on each game is data about data: information about the user that is attached to their gaming profile.
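
As a small illustration of the gaming example, the hours-per-game figure can be derived from data the system already records. The session records and the `hours_per_game` helper below are invented for this sketch.

```python
from collections import defaultdict

# Hypothetical play sessions recorded by the system: (user, game, minutes played).
sessions = [
    ("asha", "chess", 90),
    ("asha", "racing", 30),
    ("asha", "chess", 45),
    ("ravi", "chess", 60),
]


def hours_per_game(sessions, user):
    """Derive 'data about data': total hours per game for one user."""
    minutes = defaultdict(int)
    for session_user, game, played in sessions:
        if session_user == user:
            minutes[game] += played
    return {game: total / 60 for game, total in minutes.items()}


print(hours_per_game(sessions, "asha"))
```

The user never typed these totals in; the system produced them internally from the raw session records.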

Factors relating to Data:

When designing a system, data flow and data storage play a critical role, and so do the characteristics of the data itself:

  • Types of Data
  • Volume
  • Consumption / Retrieval
  • Sensitivity

One example of how handling differs is text processing versus video processing. Text may be parsed, analysed, transformed, and transmitted without breaking it into smaller units. Video is much larger than text, so it may be sliced into segments, transformed and processed in parallel, and then put back together in the correct order.
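
The slice, process in parallel, and reassemble pattern can be sketched as follows. The byte string stands in for a video file, `transform` is a placeholder for real work such as transcoding, and `SEGMENT_SIZE` is an arbitrary value picked for the example.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SIZE = 4  # hypothetical segment length in bytes


def split(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Slice the input into consecutive segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def transform(segment: bytes) -> bytes:
    """Placeholder for real per-segment work; here it upper-cases ASCII bytes."""
    return segment.upper()


def process_in_parallel(data: bytes) -> bytes:
    segments = split(data)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in input order, even when
        # individual segments finish out of order.
        processed = list(pool.map(transform, segments))
    return b"".join(processed)


video = b"frame-a frame-b frame-c"
print(process_in_parallel(video))
```

Threads keep the sketch short; for CPU-heavy work in Python, a `ProcessPoolExecutor` from the same module would be the more likely choice.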

Systems consuming gigabytes differ significantly from those consuming terabytes. Some systems consume more data than they produce, and others the reverse. Volume, and whether the workload is read-heavy, write-heavy, or both, are major factors in deciding how to design the system.

Below are some example systems and their data flows:

  • Authentication / authorisation system
    • User login, identity management
    • Volume is low
    • Security and priority levels are high
  • Streaming service
    • Netflix, BBC iPlayer, Prime Video
    • Volume is high
    • Data retrieval is also high 
  • Transactional Systems
    • E-commerce, Ride-share app, grocery ordering app
    • Validation is high priority
    • Paths of the data flow are very important to ensure consistency
  • Heavy compute systems
    • Image recognition systems, video processing using machine learning models
    • These systems involve little retrieval but a lot of upload and computation

Next we'll look at the different ways to store data and the use cases for each.
