Are Data Structures important for Data Science professionals? An overview with real-world applications.
Introduction:
You might already be aware that algorithms and data structures are considered core skills for Software Developers. Are they also helpful in the Data Science field?
Data Science professionals such as Data Analysts, Data Engineers, and Data Scientists work with large amounts of data on a day-to-day basis. Extraction, transformation, and loading of data (ETL) are part of our routine activities. How can data structures help us in such scenarios?
Data structures:
The way we write our code has a direct impact on resource utilization, especially since most things today are deployed on cloud-based services such as AWS. Writing our code efficiently not only affects the cost we have to bear for these services but also directly corresponds to the time our program takes to execute. In terms of data structures, this is expressed using Big O notation, a method for describing the time complexity of an algorithm. Alongside time complexity there is also space complexity, which is the amount of memory used by your piece of code.
From a business perspective, keeping costs low directly correlates with maximizing profit, so it is important to use the right data structures wherever possible.
In the next section, I am going to walk you through some data structure terminology, and in the coming articles, we can take a deeper dive into it with some examples.
Complexity Analysis:
With the ever-increasing amount of data produced daily, it is essential to process data efficiently, and that requires understanding the computations going on under the hood.
There are three types of analysis that can be carried out.
Best Case:-
This is the case when your program takes the least time or space.
Average Case:-
The average case helps us understand the performance that lies between the best and the worst case.
Worst Case:-
In this case, we analyze the algorithm on the input for which it performs the worst.
There are mainly three notations that help us analyze the above cases and judge the complexity of a piece of code:-
1. Theta Notation (θ):-
Gives a tight bound on the running time; it is commonly used to describe the average-case complexity.
2. Omega Notation (Ω):-
Gives a lower bound on the running time; it is commonly used to describe the best-case complexity.
3. Big-O Notation (Ο):-
Gives an upper bound on the running time; it is commonly used to describe the worst-case complexity. We focus on the worst case so that we know how bad the performance can get.
Different Big O Run Times:-
Here we describe the most common run times of algorithms.
O(n):- Linear time complexity; the running time grows in direct proportion to the input size.
O(log n):- Logarithmic time complexity; the running time grows very slowly as the input size increases, as in binary search.
O(n²):- Quadratic time complexity; the running time grows with the square of the input size, which is typical of algorithms with nested loops over the data.
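To make these run times concrete, here is a minimal sketch (the function names and sample data are my own illustration) that solves the same task, checking a list for duplicates, in O(n²) and O(n) time:

```python
def has_duplicates_quadratic(values):
    # O(n^2): compare every pair of elements with nested loops.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates_linear(values):
    # O(n): a single pass using a set (hash-based membership checks).
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False

print(has_duplicates_quadratic([3, 1, 4, 1, 5]))  # True
print(has_duplicates_linear([3, 1, 4, 1, 5]))     # True
```

For a few elements the difference is invisible, but on millions of rows the quadratic version quickly becomes impractical.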
An overview of some of the Data Structures and Algorithms:-
1. Arrays:
Arrays and lists store multiple elements in a single structure and allow fast access to any element by its index.
Practical Applications:
- Online ticketing system.
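As a quick illustration (the seat labels below are hypothetical, not a real ticketing API), an array or Python list gives constant-time access to any element by index:

```python
# Hypothetical example: seats for a show stored in a list (array-like structure).
seats = ["A1", "A2", "A3", "B1", "B2"]

print(seats[2])      # constant-time access by index -> "A3"
seats.append("B3")   # add a new seat at the end
print(len(seats))    # 6
```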
2. Linked Lists:
In a linked list, the elements are connected to each other using links, which are essentially references to the memory address of the next element.
Practical Applications:
- In a music player, each song is linked to the next and the previous song (a doubly linked list).
- Rotating turns between players in a multiplayer game (a circular linked list).
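Here is a minimal sketch of a singly linked list in Python, using a hypothetical playlist of songs to mirror the music player example above:

```python
class SongNode:
    """A node in a singly linked list: holds a value and a link to the next node."""
    def __init__(self, title):
        self.title = title
        self.next = None

# Build a tiny playlist: "Intro" -> "Track 1" -> "Track 2"
head = SongNode("Intro")
head.next = SongNode("Track 1")
head.next.next = SongNode("Track 2")

# Traverse the list by following the links.
node = head
while node is not None:
    print(node.title)
    node = node.next
```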
3. Stacks:
A stack is a data structure that follows Last In, First Out (LIFO) ordering to store elements, i.e. the elements that are entered last are the first ones to be popped out.
Practical Applications:
- Undo / Redo operations in a word processor.
- The Java Virtual Machine (JVM) uses a call stack to run methods; expressions and syntax are also parsed using a stack.
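A minimal stack sketch in Python, continuing the undo example (the action names are made up for illustration); a plain list works as a stack because append and pop both operate on the same end:

```python
# Use a plain Python list as a stack: append() pushes, pop() removes the last item.
undo_stack = []

undo_stack.append("type 'hello'")   # the most recent actions sit on top
undo_stack.append("bold selection")
undo_stack.append("insert image")

# Undo: the last action performed is the first one undone (LIFO).
print(undo_stack.pop())  # "insert image"
print(undo_stack.pop())  # "bold selection"
```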
4. Queues:
A queue is a data structure that follows First In, First Out (FIFO) ordering to store elements. For example, in a queue at a movie theatre, the person who arrives first is the first one to be served.
Practical Applications:
- CPU task scheduling
- Single shared resources, such as printers, use a queue to serve requests.
- Customer service systems use a queue to serve requests sequentially.
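A minimal FIFO queue sketch using Python's collections.deque, with hypothetical print jobs standing in for the shared-printer example:

```python
from collections import deque

# Jobs are served in the order they arrive (FIFO).
print_queue = deque()
print_queue.append("report.pdf")   # enqueue at the back
print_queue.append("invoice.pdf")
print_queue.append("slides.pdf")

print(print_queue.popleft())  # dequeue from the front -> "report.pdf"
print(print_queue.popleft())  # "invoice.pdf"
```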
5. Trees
Trees are non-linear data structures that store elements in a hierarchical manner.
Practical Applications:
- Decision trees in machine learning use a tree data structure
- Parsers
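A minimal sketch of a binary tree node in Python (the class and field names are my own); decision trees and parse trees are built from nodes like this, each holding a value and links to its children:

```python
class TreeNode:
    """A binary tree node: a value plus optional left and right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

# A tiny hierarchy:     "root"
#                      /      \
#                  "left"    "right"
root = TreeNode("root", TreeNode("left"), TreeNode("right"))

def traverse(node):
    # Pre-order traversal: visit the node, then its children.
    if node is not None:
        print(node.value)
        traverse(node.left)
        traverse(node.right)

traverse(root)
```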
6. Maps and Hashing:
These data structures store key-value pairs. A hash function computes an index from the key, which determines the bucket where the value is stored.
Practical Applications:
- Implement a dictionary
- Database Indexing
- Compiler Operations
- Cryptography
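Python's built-in dict is a hash map. Here is a minimal sketch of the dictionary use case (the words and counts are illustrative):

```python
# A dict hashes each key to find its bucket, giving constant-time lookups on average.
word_counts = {}
for word in ["data", "science", "data", "structures"]:
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts["data"])       # 2
print("science" in word_counts)  # True (hash-based membership test)
```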
7. Tries
A trie is a tree-like data structure in which each node stores part of a key, typically a single character, so that words with a common prefix share the same path from the root.
Practical Applications:
- Autocorrect
- Email Autocomplete
- Browser history autocomplete
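A minimal trie sketch in Python supporting insert and prefix search, along the lines of the autocomplete examples above (the class name and word list are my own):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # letter -> TrieNode
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def words_with_prefix(root, prefix):
    # Walk down to the node for the prefix, then collect every word below it.
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    results = []

    def collect(n, path):
        if n.is_word:
            results.append(prefix + path)
        for ch, child in n.children.items():
            collect(child, path + ch)

    collect(node, "")
    return results

root = TrieNode()
for w in ["data", "date", "dart", "dog"]:
    insert(root, w)
print(words_with_prefix(root, "da"))  # ['data', 'date', 'dart']
```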
Sorting Algorithms:-
1. Bubble Sort:
Bubble sort repeatedly compares adjacent elements and swaps them if they are out of order, until the whole list is sorted. A minimal sketch follows below.
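```python
def bubble_sort(values):
    # Repeatedly sweep the list, swapping adjacent out-of-order pairs.
    # After each pass the largest remaining element "bubbles" to the end.
    n = len(values)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if values[j] > values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]
                swapped = True
        if not swapped:   # already sorted, stop early
            break
    return values

print(bubble_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```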
2. Merge Sort
Merge sort repeatedly splits the input into halves, sorts each half, and then merges the sorted halves back together to get the final result. Its O(n log n) time complexity is better than bubble sort's O(n²), although it uses extra space for the merge step, so its space complexity is worse. A minimal sketch follows below.
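```python
def merge_sort(values):
    # Split the input in half, sort each half recursively, then merge them.
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])

    # Merge step: repeatedly take the smaller front element of the two halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```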
3. Quick Sort
The quick sort algorithm works primarily using a pivot. A pivot is picked, the data is partitioned into elements smaller and larger than the pivot, and each partition is then sorted recursively until the whole input is sorted. A minimal sketch follows below.
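This simple version trades the usual in-place partitioning for readability; the pivot choice (middle element) is just one common option.

```python
def quick_sort(values):
    # Pick a pivot, partition around it, and sort each partition recursively.
    if len(values) <= 1:
        return values
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal   = [v for v in values if v == pivot]
    larger  = [v for v in values if v > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)

print(quick_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```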
4. Heap Sort
This is an in-place sorting algorithm. It treats the array as a binary heap and repeatedly moves the largest element to the end of the unsorted region until the whole array is sorted. A minimal sketch follows below.
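```python
def heap_sort(values):
    # In-place heap sort: build a max-heap, then repeatedly move the
    # largest element (the root) to the end of the unsorted region.
    n = len(values)

    def sift_down(start, end):
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and values[child] < values[child + 1]:
                child += 1                      # pick the larger child
            if values[root] < values[child]:
                values[root], values[child] = values[child], values[root]
                root = child
            else:
                return

    for start in range(n // 2 - 1, -1, -1):     # heapify the whole array
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):             # extract the max one by one
        values[0], values[end] = values[end], values[0]
        sift_down(0, end - 1)
    return values

print(heap_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```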
Advanced Algorithms:-
1. Binary Search
Binary search helps us find an element in a sorted list. It repeatedly halves the search range until the element is found or the range becomes empty. The time complexity of this search is O(log n). A minimal sketch follows below.
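```python
def binary_search(sorted_values, target):
    # Repeatedly halve the search range until the target is found or the range is empty.
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid                  # index of the target
        elif sorted_values[mid] < target:
            low = mid + 1               # search the right half
        else:
            high = mid - 1              # search the left half
    return -1                           # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```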
2. Greedy Algorithms
Greedy algorithms build up the result piece by piece, always picking the choice that looks best at the current stage. One disadvantage is that, although the locally best choice is made at every step, the overall solution might not be optimal. A minimal sketch follows below.
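Here is a minimal greedy sketch: making change with the fewest coins by always taking the largest coin that still fits. The denominations are my own illustration; with them the greedy result happens to be optimal, but the comment at the end shows a case where the same strategy misses the best answer, which is the drawback mentioned above.

```python
def greedy_change(amount, coins=(25, 10, 5, 1)):
    # Always take the largest coin that still fits: the locally best choice.
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result

print(greedy_change(63))  # [25, 25, 10, 1, 1, 1]
# With coins (4, 3, 1) and amount 6, greedy gives [4, 1, 1] (3 coins)
# even though [3, 3] (2 coins) is optimal.
```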
3. Divide and Conquer Algorithms:
In divide and conquer algorithms, we focus on dividing the problem into subproblems, i.e. smaller instances of the given problem, which are independent of each other. Binary search and merge sort are examples of divide and conquer.
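Besides binary search and merge sort (sketched above), here is an even smaller divide-and-conquer example of my own: finding the maximum of a list by solving two independent halves and combining the answers.

```python
def max_divide_conquer(values):
    # Split the problem into two independent halves, solve each, combine the results.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return max(max_divide_conquer(values[:mid]),
               max_divide_conquer(values[mid:]))

print(max_divide_conquer([3, 8, 1, 6, 4]))  # 8
```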
4. Dynamic Programming:
Dynamic programming is an extension of divide and conquer where we can use techniques such as memoization. Unlike divide and conquer, the subproblems overlap, so previously computed results are stored and reused.
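A minimal sketch of memoization using the classic Fibonacci example (Python's functools.lru_cache does the caching for us):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Overlapping subproblems: fib(n) reuses fib(n-1) and fib(n-2).
    # The cache (memoization) stores each result so it is computed only once.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, computed quickly thanks to memoization
```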
5. Graph Algorithms:
A graph consists of nodes and edges: a relation between two nodes is represented by an edge connecting them. A real-world example of this is Meta (Facebook), where each user can be considered a node; if you become friends with someone, a new edge is created between your node and theirs.
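A minimal sketch of such a friendship graph stored as an adjacency list (the names are made up):

```python
# Each node maps to the set of nodes it is connected to.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice"},
    "carol": {"alice"},
}

# Becoming friends adds an edge in both directions (an undirected graph).
friends["bob"].add("carol")
friends["carol"].add("bob")

print(friends["carol"])  # {'alice', 'bob'}
```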
6. A*
The A* algorithm helps us find the shortest path between a start node and a destination. Unlike a purely greedy approach, it does not only consider the best-looking step at the current moment: it combines the cost already travelled with a heuristic estimate of the remaining cost, so it searches for an overall optimal path.
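Here is a compact A* sketch on a small grid (the grid, the unit step cost, and the Manhattan-distance heuristic are my own illustration, not a specific library's API):

```python
import heapq

def a_star(grid, start, goal):
    """Shortest path length on a grid of 0 (free) / 1 (blocked) cells,
    using the Manhattan distance as the heuristic."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # heuristic: estimated remaining cost to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # (estimated total cost, cost so far, cell)
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            return cost
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None  # no path exists

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # 6 steps around the blocked cells
```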
In this article, we discussed an overview of some basic data structures and algorithms with real-life examples. I hope you found the article helpful. If you have any questions, feel free to ask them in the comment section below. Have a great day ahead :)