Hello Everyone,
Garima Kanwar here. These are short notes for revision purposes only. Please refer to your college study material and reference books for complete study.
For further notes 📄 and videos 🎥, subscribe to the BTER Polytechnic Classes 🎥 YouTube channel and join the WhatsApp group.
📍YouTube Channel - https://www.youtube.com/@Polytechnicbter
📌Instagram Account - https://www.instagram.com/bterpolytechnicclasses
📍Telegram Channel - https://t.me/polytechnicbter
📍WhatsApp Channel - https://whatsapp.com/channel/0029Vb3ggEG7dmedl2pJd30p
📌WhatsApp Group - https://chat.whatsapp.com/DWpN7vYqSutBNXY9R5Q2Te
Course Code: CS 3005 (same as IT 3005) | Course Title: Algorithms
1. Fundamentals
The fundamentals of computer science and programming cover the basics that you need to understand for designing, analyzing, and implementing algorithms and systems. These include the programming models, data abstractions, and analyzing the efficiency of algorithms.
1.1 Programming Models
A programming model is the approach or structure of a programming language that defines how a program's instructions are executed and how the computer’s resources are managed. It helps us understand how to think about the flow of operations in the program.
Key Programming Models:
Imperative Programming Model:
- This is a traditional model where you give explicit instructions to the computer to perform operations step-by-step.
- Example: C, Python, Java.
- The programmer tells the computer how to do something (step-by-step process).
Declarative Programming Model:
- In this model, you specify what you want the program to do, and the system determines how to do it.
- Example: SQL (specifying what data you want from a database without specifying how the database should retrieve it).
Functional Programming Model:
- This model is based on mathematical functions and avoids changing states and mutable data.
- Example: Haskell, Scala, Lisp.
- Functions are first-class citizens, and computations are modeled as the evaluation of functions.
Object-Oriented Programming Model:
- This is based on objects (which are instances of classes) that encapsulate both data and behavior.
- Example: Java, C++, Python.
- Objects interact with each other through methods and can have properties (attributes) and behaviors (methods).
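The contrast between the imperative and the declarative/functional styles can be sketched in Python (used here purely for illustration; the variable names are our own):

```python
numbers = [1, 2, 3, 4, 5]

# Imperative style: tell the computer HOW, step by step, mutating state.
total_imperative = 0
for n in numbers:
    total_imperative += n

# Functional/declarative style: state WHAT we want as a single expression,
# with no mutable loop variables.
total_functional = sum(n * n for n in numbers if n % 2 == 0)

print(total_imperative)  # 15
print(total_functional)  # 4 + 16 = 20
```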
1.2 Data Abstraction
Data abstraction is the concept of simplifying complex data structures by hiding the unnecessary details. This allows programmers to focus on high-level logic without worrying about how the data is stored or manipulated.
1.2.1 Sets
A set is a collection of distinct elements, meaning no duplicate values are allowed. The order of elements does not matter.
- Set Operations:
- Union: Combine elements from two sets, keeping distinct elements.
- Intersection: Find common elements in both sets.
- Difference: Find elements in one set but not in the other.
- Subset: Check if all elements of one set are contained in another.
Diagram of a Set:
Set A = {1, 2, 3}
Set B = {3, 4, 5}
Union (A ∪ B) = {1, 2, 3, 4, 5}
Intersection (A ∩ B) = {3}
Difference (A - B) = {1, 2}
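The set operations above map directly onto Python's built-in `set` type, as a quick sketch:

```python
A = {1, 2, 3}
B = {3, 4, 5}

print(A | B)       # union: {1, 2, 3, 4, 5}
print(A & B)       # intersection: {3}
print(A - B)       # difference: {1, 2}
print({1, 2} <= A) # subset check: True
```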
1.2.2 Multisets
A multiset (or bag) is like a set, but it allows duplicate elements. For example, in a multiset, you can have multiple instances of the same element.
- Multiset Operations: Similar to set operations, but can handle multiple occurrences of the same element.
- Cardinality: The number of occurrences of an element in a multiset.
Diagram of a Multiset:
Multiset A = {1, 1, 2, 3}
Multiset B = {2, 2, 3}
Union (A ∪ B) = {1, 1, 2, 2, 3} (take the maximum multiplicity of each element; the multiset sum A + B = {1, 1, 2, 2, 2, 3, 3} adds the counts instead)
Intersection (A ∩ B) = {2, 3}
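Python's `collections.Counter` behaves as a multiset: `|` takes the maximum multiplicity of each element and `&` the minimum. A minimal sketch with the multisets above:

```python
from collections import Counter

A = Counter([1, 1, 2, 3])
B = Counter([2, 2, 3])

union = A | B         # max multiplicity of each element
intersection = A & B  # min multiplicity of each element

print(sorted(union.elements()))         # [1, 1, 2, 2, 3]
print(sorted(intersection.elements()))  # [2, 3]
```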
1.2.3 Stacks
A stack is a linear data structure that follows the Last In, First Out (LIFO) principle. This means the last element added to the stack is the first one to be removed.
- Operations:
- Push: Add an element to the top of the stack.
- Pop: Remove the top element from the stack.
- Peek: View the top element without removing it.
Diagram of a Stack:
Stack (top -> bottom):
| 5 | <- Top
| 3 |
| 2 |
| 1 | <- Bottom
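In Python, a plain list already provides LIFO behavior; a minimal sketch of the three stack operations:

```python
stack = []
stack.append(1)    # push
stack.append(2)
stack.append(3)

print(stack[-1])   # peek -> 3 (top of the stack)
print(stack.pop()) # pop  -> 3 (last in, first out)
print(stack)       # [1, 2]
```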
1.2.4 Queues
A queue is a linear data structure that follows the First In, First Out (FIFO) principle. This means the first element added to the queue will be the first one to be removed.
- Operations:
- Enqueue: Add an element to the end of the queue.
- Dequeue: Remove the element from the front of the queue.
- Peek: View the front element without removing it.
Diagram of a Queue:
Queue (front -> back):
Front -> [1, 2, 3] <- Back
Enqueue 4: [1, 2, 3, 4]
Dequeue: [2, 3, 4] (1 is removed)
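Python's `collections.deque` gives O(1) enqueue and dequeue at both ends; a minimal sketch mirroring the example above:

```python
from collections import deque

queue = deque([1, 2, 3])
queue.append(4)        # enqueue 4 at the back
print(queue[0])        # peek front -> 1
print(queue.popleft()) # dequeue -> 1 (first in, first out)
print(list(queue))     # [2, 3, 4]
```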
1.3 Asymptotic and Worst-Case Analysis of Algorithms
To analyze the performance of algorithms, we focus on two things: time complexity and space complexity. Asymptotic analysis helps us understand how the performance of an algorithm scales with the input size.
Big O Notation (Time Complexity)
Big O notation describes the upper bound of an algorithm’s performance. It tells us how the algorithm's execution time grows as the input size increases.
Common time complexities:
- O(1): Constant time (doesn't depend on input size).
- O(log n): Logarithmic time (e.g., binary search).
- O(n): Linear time (e.g., scanning through an array).
- O(n^2): Quadratic time (e.g., bubble sort).
- O(2^n): Exponential time (e.g., solving the traveling salesman problem).
Worst-Case Analysis
The worst-case analysis refers to measuring the time or space complexity of an algorithm in the most unfavorable scenario.
For example:
- Bubble Sort has a worst-case time complexity of O(n^2) because, in the worst case, we need to compare every pair of elements.
- QuickSort has a worst-case time complexity of O(n^2) as well, but its average-case time complexity is O(n log n).
Growth of common time complexities (slowest-growing first):
O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(2ⁿ)
As the input size n grows, an O(n²) algorithm's running time rises far more steeply than O(n log n), while O(1) stays flat regardless of n.
Summary
- Programming Models define how we think about computing, like imperative, declarative, functional, and object-oriented.
- Data Abstraction simplifies how we deal with data types, such as sets, multisets, stacks, and queues.
- Asymptotic and Worst-Case Analysis help evaluate the efficiency of algorithms by focusing on their time and space complexities.
💥💥💥
2. Sorting
Sorting is the process of arranging elements in a specific order, usually in ascending or descending order. Sorting algorithms are essential for efficient searching, comparisons, and organizing data. Understanding different sorting algorithms is crucial for algorithm design.
2.1 The Sorting Problem
The sorting problem is simply to arrange a list of elements in a particular order (ascending or descending). For example, given an unsorted array of numbers, you need to arrange them from the smallest to the largest (ascending order).
Example:
Unsorted array: [5, 2, 9, 1, 5, 6]
Sorted array (ascending order): [1, 2, 5, 5, 6, 9]
The challenge in sorting lies in finding an efficient way to do this, especially when the number of elements is large. There are multiple algorithms for sorting, and their efficiency varies based on the number of elements.
2.2 Bubble Sort
Bubble Sort is one of the simplest sorting algorithms. It works by repeatedly stepping through the list, comparing adjacent items, and swapping them if they are in the wrong order. This process is repeated until no swaps are needed.
Steps:
- Compare the first two elements. If the first is greater than the second, swap them.
- Move to the next pair of adjacent elements and repeat the comparison and swap if needed.
- Continue this until the end of the list is reached.
- Repeat the process for the entire list until no swaps are made in a pass.
Time Complexity:
- Worst-case: O(n²) (when the list is in reverse order).
- Best-case: O(n) (when the list is already sorted).
Example:
Unsorted: [5, 2, 9, 1, 5, 6]
Pass 1: [2, 5, 1, 5, 6, 9]
Pass 2: [2, 1, 5, 5, 6, 9]
Pass 3: [1, 2, 5, 5, 6, 9] (sorted; one more pass with no swaps confirms termination)
Diagram (after Pass 1):
Before: [5, 2, 9, 1, 5, 6]
Step 1: [2, 5, 9, 1, 5, 6] (swap 5 and 2)
Step 2: [2, 5, 1, 9, 5, 6] (swap 9 and 1)
Step 3: [2, 5, 1, 5, 9, 6] (swap 9 and 5)
Step 4: [2, 5, 1, 5, 6, 9] (swap 9 and 6)
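The steps above can be sketched in Python (the function name is our own). The `swapped` flag gives the O(n) best case on an already-sorted list:

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order pairs until a pass makes no swaps."""
    a = list(items)  # work on a copy
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i elements are already in place
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:             # no swaps: the list is sorted, stop early
            break
    return a

print(bubble_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```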
2.3 Selection Sort
Selection Sort works by repeatedly finding the smallest (or largest) element from the unsorted portion of the list and swapping it with the first unsorted element.
Steps:
- Find the smallest element in the list.
- Swap it with the element at the first position.
- Find the smallest element in the remaining unsorted list and swap it with the second position.
- Repeat this until the entire list is sorted.
Time Complexity:
- Worst-case: O(n²)
- Best-case: O(n²) (the running time does not depend on the initial order of the array)
Example:
Unsorted: [5, 2, 9, 1, 5, 6]
Step 1: [1, 2, 9, 5, 5, 6] (swap 5 and 1)
Step 2: [1, 2, 9, 5, 5, 6] (no change)
Step 3: [1, 2, 5, 9, 5, 6] (swap 9 and 5)
Step 4: [1, 2, 5, 5, 9, 6] (swap 9 and 5)
Step 5: [1, 2, 5, 5, 6, 9] (swap 9 and 6)
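A minimal Python sketch of selection sort (the function name is our own), matching the steps above:

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining element to the front."""
    a = list(items)
    for i in range(len(a) - 1):
        # Index of the minimum of the unsorted portion a[i:].
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a

print(selection_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```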
2.4 Insertion Sort
Insertion Sort works by building a sorted portion of the list one element at a time. It takes each new element and places it in the correct position relative to the already sorted portion of the list.
Steps:
- Start with the second element (the first element is already sorted).
- Compare it with the element before it, and move it backward until it is in the correct position.
- Repeat this process for all the elements in the list.
Time Complexity:
- Worst-case: O(n²) (when the list is in reverse order).
- Best-case: O(n) (when the list is already sorted).
Example:
Unsorted: [5, 2, 9, 1, 5, 6]
Step 1: [2, 5, 9, 1, 5, 6] (insert 2)
Step 2: [2, 5, 9, 1, 5, 6] (insert 9)
Step 3: [1, 2, 5, 9, 5, 6] (insert 1)
Step 4: [1, 2, 5, 5, 9, 6] (insert 5)
Step 5: [1, 2, 5, 5, 6, 9] (insert 6)
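Insertion sort can be sketched in Python as follows (the function name is our own):

```python
def insertion_sort(items):
    """Grow a sorted prefix, inserting each new element into its place."""
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:  # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                # drop the element into the gap
    return a

print(insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```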
2.5 Merge Sort
Merge Sort is a divide-and-conquer algorithm. It works by splitting the list into two halves, sorting each half, and then merging the sorted halves.
Steps:
- Split the list into two halves.
- Recursively split each half until each sublist has one element.
- Merge the sublists back together in sorted order.
Time Complexity:
- Worst-case: O(n log n)
- Best-case: O(n log n)
Example:
Unsorted: [5, 2, 9, 1, 5, 6]
Step 1: Split into [5, 2, 9] and [1, 5, 6]
Step 2: Split further until we get: [5], [2], [9], [1], [5], [6]
Step 3: Merge: [2, 5, 9], [1, 5, 6]
Step 4: Final Merge: [1, 2, 5, 5, 6, 9]
Diagram of merge sort:
[5, 2, 9, 1, 5, 6]
→ [5, 2, 9] [1, 5, 6]
→ [5] [2] [9] [1] [5] [6]
→ [2, 5, 9] [1, 5, 6]
→ [1, 2, 5, 5, 6, 9] (final sorted array)
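The split-and-merge process above can be sketched in Python (the function name is our own):

```python
def merge_sort(items):
    """Split in half, sort each half recursively, then merge the halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # append whichever half remains

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```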
2.6 Quicksort
QuickSort is another divide-and-conquer algorithm. It works by selecting a pivot element from the list and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. This is done recursively.
Steps:
- Choose a pivot element.
- Partition the list into two sub-lists: one with elements smaller than the pivot and one with elements greater than the pivot.
- Recursively apply the same logic to the sublists.
Time Complexity:
- Worst-case: O(n²) (when the pivot is the smallest or largest element).
- Best-case: O(n log n) (when the pivot divides the list evenly).
Example:
Unsorted: [5, 2, 9, 1, 5, 6]
Step 1: Choose pivot (say 5).
Step 2: Partition into [2, 1] and [9, 5, 6].
Step 3: Recursively sort each partition.
Step 4: Final sorted list: [1, 2, 5, 5, 6, 9]
Diagram of quicksort:
[5, 2, 9, 1, 5, 6]
→ Pivot = 5
→ Partition into [2, 1] and [9, 5, 6]
→ Recursively sort the partitions
→ Final sorted array: [1, 2, 5, 5, 6, 9]
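A minimal quicksort sketch in Python (the function name is our own; this list-building version trades the in-place partition for clarity):

```python
def quicksort(items):
    """Partition around a pivot, then recursively sort each side."""
    if len(items) <= 1:
        return list(items)
    pivot = items[0]
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]  # ties go right of the pivot
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```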
Summary of Sorting Algorithms
| Algorithm | Worst-case Time Complexity | Best-case Time Complexity | Space Complexity |
|---|---|---|---|
| Bubble Sort | O(n²) | O(n) | O(1) |
| Selection Sort | O(n²) | O(n²) | O(1) |
| Insertion Sort | O(n²) | O(n) | O(1) |
| Merge Sort | O(n log n) | O(n log n) | O(n) |
| QuickSort | O(n²) | O(n log n) | O(log n) |
Each algorithm has its advantages and disadvantages depending on the size of the dataset and the context in which it's used.
💥💥💥
3. Searching
In computer science, searching refers to finding a specific element within a data structure. Efficient searching is critical, especially with large datasets. Several data structures have been developed to perform searching operations quickly, and understanding these structures will help you solve searching problems more effectively.
3.1 Symbol Tables
A symbol table is a data structure that stores associations between keys and values. It is used to manage and look up symbols, typically in contexts such as compilers or dictionaries. The key can be any value used to access a corresponding value, which can be any data type.
Operations:
- Insert (key, value): Insert a key-value pair into the table.
- Search (key): Search for a value using the key.
- Delete (key): Remove a key-value pair.
- Update (key, new_value): Update the value associated with a given key.
Example: A symbol table could be used to store employee IDs (keys) and employee names (values).
| Key | Value |
|---|---|
| "E001" | "John" |
| "E002" | "Jane" |
| "E003" | "Jim" |
A typical way to implement a symbol table is by using hash tables or binary search trees.
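In Python, the built-in `dict` (a hash table underneath) covers all four symbol-table operations; a minimal sketch using the employee example:

```python
symbols = {}                  # dict as a symbol table
symbols["E001"] = "John"      # insert (key, value)
symbols["E002"] = "Jane"
print(symbols.get("E001"))    # search by key -> "John"
symbols["E002"] = "Janet"     # update the value for an existing key
del symbols["E001"]           # delete a key-value pair
print(symbols)                # {'E002': 'Janet'}
```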
3.2 Binary Search Trees (BST)
A Binary Search Tree (BST) is a tree data structure where each node has at most two children, and the left child is smaller than the parent node, while the right child is larger.
Properties of BST:
- For each node:
- All values in the left subtree are smaller than the node’s value.
- All values in the right subtree are larger than the node’s value.
- Binary Search: This structure allows for fast search, insertion, and deletion operations because you can discard half of the search space at each step.
Operations:
- Search: Start from the root, and depending on whether the key is smaller or greater than the current node, move left or right recursively.
- Insertion: Similar to search, but when an empty spot is found, insert the new key-value pair.
- Deletion: To delete a node, you handle three cases:
- Node with no children (leaf node): Simply remove it.
- Node with one child: Replace the node with its child.
- Node with two children: Find the in-order successor (smallest value in the right subtree) or in-order predecessor (largest value in the left subtree) and replace the node with it.
Time Complexity:
- Search/Insert/Delete: O(log n) on average, but O(n) in the worst case if the tree is unbalanced (degenerates into a linked list).
Diagram of a Binary Search Tree:
        50
       /  \
     30    70
    /  \  /  \
  20  40 60  80
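A minimal BST sketch in Python (class and function names are our own), covering insertion and search over the keys in the diagram:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Walk down as in a search; attach the new key at the empty spot."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored

def search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)

print(search(root, 40))  # True
print(search(root, 45))  # False
```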
3.3 Balanced Search Trees
Balanced Search Trees are an enhancement of Binary Search Trees (BSTs). In a balanced tree, the height is kept small to ensure efficient operations. When a BST becomes unbalanced, its performance degrades, becoming similar to a linked list. A balanced search tree avoids this by maintaining balance after insertions and deletions.
Types of Balanced Search Trees:
AVL Tree (Adelson-Velsky and Landis Tree):
- An AVL tree is a self-balancing binary search tree where the difference in heights of the left and right subtrees (called the balance factor) of any node is at most 1.
- Rotations (left and right rotations) are used to maintain balance after insertion or deletion.
Red-Black Tree:
- A red-black tree is a balanced binary search tree with additional properties:
- Each node is either red or black.
- The root is always black.
- Red nodes cannot have red children (no two red nodes in a row).
- Every path from a node to its descendant leaves must have the same number of black nodes.
- The balancing here is less strict than in AVL trees, but it ensures that operations stay efficient.
Operations in Balanced Trees:
- Search/Insert/Delete: Balanced trees guarantee O(log n) time complexity for all these operations, ensuring efficient searching even with large datasets.
Diagram of a Red-Black Tree:
50(B)
/ \
30(R) 70(R)
/ \ / \
20(B) 40(B) 60(B) 80(B)
- (B): Black node
- (R): Red node
3.4 Hash Tables
A Hash Table is a data structure that stores data in an array format using a hash function to compute an index (or "hash code") into the array, where the corresponding value is stored. This allows for constant time complexity on average for search, insert, and delete operations.
Hash Function:
A hash function takes a key (such as a string or integer) and computes an index in the array where the value is stored. A good hash function should distribute keys evenly across the array to avoid collisions (where different keys map to the same index).
Collisions:
When two different keys hash to the same index, it’s called a collision. There are two main ways to handle collisions:
Chaining: Each array index stores a linked list (or other structure) of values that hash to that index. When a collision occurs, the new element is added to the list.
- Example: For a hash index 3, if both "apple" and "banana" hash to 3, store both in the list at index 3.
Open Addressing: When a collision occurs, the algorithm searches for the next available slot in the array (using methods like linear probing, quadratic probing, or double hashing).
Operations:
- Search: Apply the hash function to the key to find the corresponding index and search for the value.
- Insert: Apply the hash function to the key and insert the value at the corresponding index, handling collisions if needed.
- Delete: Apply the hash function to find the key and remove it.
Time Complexity:
- Average Case: O(1) for search, insert, and delete (with a good hash function).
- Worst Case: O(n) if there are many collisions, but this is rare.
Diagram of Hash Table with Chaining:
Index 0 -> [apple, orange]
Index 1 -> [banana]
Index 2 -> []
Index 3 -> [grape]
Index 4 -> [pear, peach]
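Chaining can be sketched with a toy class in Python (the class and method names are our own; real code would just use `dict`):

```python
class ChainedHashTable:
    """Toy hash table: each bucket holds a list of (key, value) pairs."""
    def __init__(self, size=5):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:             # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key (or collision): chain it

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("apple", 1)
table.put("banana", 2)
print(table.get("apple"))   # 1
print(table.get("cherry"))  # None
```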
Summary
- Symbol Tables store key-value pairs, typically implemented using hash tables or binary search trees.
- Binary Search Trees (BST) are trees where each node has at most two children and the left subtree has smaller values than the right.
- Balanced Search Trees (like AVL and Red-Black Trees) keep the tree balanced to ensure efficient operations, maintaining O(log n) time complexity.
- Hash Tables use hash functions to store values in an array, allowing for fast access and manipulation of data, with O(1) average-case time complexity for operations.
💥💥💥
4. Graphs
A graph is a collection of vertices (also called nodes) and edges (also called arcs or links) that connect the vertices. Graphs are used to model a wide range of problems, such as networks, social connections, and even scheduling tasks.
4.1 Definition of a Directed and Undirected Graph
4.1.1 Directed Graph (Digraph)
In a directed graph (or digraph), edges have a direction. That means the edges go from one vertex to another, and the relationship is one-way.
- Vertices: The nodes of the graph.
- Edges: The directed connections from one vertex to another (represented by arrows).
Example: A directed graph could represent a Twitter follower relationship, where edges represent the direction of following.
A → B
↑ ↓
C ← D
In this graph:
- A follows B (A → B)
- B follows D (B → D)
- D follows C (D → C)
- C follows A (C → A)
4.1.2 Undirected Graph
In an undirected graph, edges have no direction. The relationship between vertices is two-way, meaning that if there’s an edge between vertex A and vertex B, both A and B are connected, and the connection can be traversed in either direction.
- Vertices: The nodes of the graph.
- Edges: The undirected connections between nodes.
Example: An undirected graph could represent a friendship in a social network, where edges show mutual relationships.
A -- B
| |
C -- D
In this graph:
- A is friends with B (A ↔ B)
- A is friends with C (A ↔ C)
- B is friends with D (B ↔ D)
- C is friends with D (C ↔ D)
4.1.3 Paths
A path in a graph is a sequence of vertices where each vertex is connected by an edge. The path starts from one vertex and ends at another.
- Simple Path: A path where no vertex is repeated.
- Path Length: The number of edges in the path.
Example: In a directed graph:
A → B → C → D
The path from A to D is A → B → C → D.
4.1.4 Cycles
A cycle is a path that starts and ends at the same vertex, with no other vertex repeated along the way. A cycle indicates that a graph contains a loop.
- Cycle in Directed Graph: The direction of edges forms a closed loop.
- Cycle in Undirected Graph: A loop with no direction.
Example (Directed Graph):
A → B → C → D → A
This forms a cycle because we can start at A and return to A through the directed edges.
Example (Undirected Graph):
A -- B
| |
D -- C
This is a cycle because you can go from A → B → C → D → A, and the edges are undirected.
4.1.5 Spanning Trees
A spanning tree of a graph is a subgraph that includes all the vertices of the original graph and is a tree (a connected, acyclic graph). In a spanning tree, there are no cycles, and the number of edges is always n-1, where n is the number of vertices.
- Spanning Tree: A subset of edges that connects all vertices.
- A graph may have many different spanning trees.
Example: For the undirected graph:
A -- B
| |
C -- D
One possible spanning tree is:
A -- B
|
C -- D
4.2 Directed Acyclic Graphs (DAGs)
A Directed Acyclic Graph (DAG) is a directed graph with no cycles. This means there’s no way to start at one vertex and follow the directed edges back to the same vertex.
DAGs are particularly useful in scenarios where ordering matters, such as scheduling tasks, dependencies, or workflows.
Examples:
- Task Scheduling: Each task depends on the completion of some other task.
- Version Control Systems: A commit history graph is typically a DAG.
Properties of a DAG:
- No cycles.
- Can be topologically sorted (explained below).
4.3 Topological Sorting
Topological Sorting is a linear ordering of the vertices in a DAG such that for every directed edge u→v, vertex u comes before vertex v in the ordering.
- Application: Useful in scheduling tasks where some tasks must be done before others (e.g., course prerequisites).
- Uniqueness: There may be multiple valid topological sorts for a DAG.
Steps for Topological Sorting:
- Identify vertices with no incoming edges (in-degree = 0).
- Add those vertices to the result.
- Remove the selected vertex and its outgoing edges from the graph.
- Repeat until all vertices are processed.
Example: For a DAG representing task dependencies with edges A → B, B → C, A → D, D → E, and E → C:
A → B → C
↓       ↑
D   →   E
One valid topological sort is: A → D → B → E → C.
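The steps above are Kahn's algorithm; a minimal Python sketch (function name is our own), assuming the edges A → B, B → C, A → D, D → E, E → C:

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm: repeatedly remove vertices with in-degree 0."""
    in_degree = {v: 0 for v in graph}
    for targets in graph.values():
        for t in targets:
            in_degree[t] += 1
    ready = deque(v for v, d in in_degree.items() if d == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for t in graph[v]:           # "remove" v's outgoing edges
            in_degree[t] -= 1
            if in_degree[t] == 0:
                ready.append(t)
    return order  # shorter than len(graph) if the graph has a cycle

graph = {"A": ["B", "D"], "B": ["C"], "D": ["E"], "E": ["C"], "C": []}
print(topological_sort(graph))  # one valid order, e.g. A, B, D, E, C
```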
4.4 Minimum Spanning Tree (MST) Algorithms
A Minimum Spanning Tree (MST) is a subset of edges from an undirected, connected graph that connects all the vertices without forming any cycles and with the minimum possible total edge weight.
Algorithms for MST:
- Prim’s Algorithm
- Kruskal’s Algorithm
4.4.1 Prim’s Algorithm
Prim’s algorithm starts with an arbitrary vertex and grows the MST by adding the smallest edge that connects a vertex in the MST to a vertex outside the MST.
Steps:
- Start with any vertex and mark it as part of the MST.
- Find the smallest edge that connects a vertex in the MST to one outside the MST.
- Add that edge and the new vertex to the MST.
- Repeat until all vertices are included.
4.4.2 Kruskal’s Algorithm
Kruskal’s algorithm works by sorting all the edges by weight and then adding edges one by one, ensuring that adding the edge doesn’t form a cycle.
Steps:
- Sort all the edges in non-decreasing order of their weight.
- Add edges one by one to the MST, ensuring that adding the edge doesn’t form a cycle (usually by using a union-find data structure).
- Stop when the MST contains n−1 edges.
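Kruskal's steps can be sketched in Python with a small union-find (names and the example graph are our own):

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with vertices 0..n-1. Returns MST edges."""
    parent = list(range(n))

    def find(x):                   # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):  # consider edges in order of weight
        ru, rv = find(u), find(v)
        if ru != rv:               # different components: no cycle is formed
            parent[ru] = rv
            mst.append((w, u, v))
        if len(mst) == n - 1:      # MST is complete
            break
    return mst

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
print(sum(w for w, _, _ in mst))  # total MST weight: 1 + 2 + 3 = 6
```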
4.5 Shortest Path Algorithms: Dijkstra’s Algorithm
Dijkstra’s algorithm is used to find the shortest path from a source vertex to all other vertices in a weighted graph (positive weights only).
Steps:
- Start at the source vertex and set the distance to 0.
- Mark all other vertices with a distance of infinity.
- Visit each vertex and update the distances to its neighbors if a shorter path is found.
- Once all vertices are processed, the shortest path from the source to each vertex is known.
Time Complexity:
- O(V²) for a simple array-based implementation.
- O((V + E) log V) with a binary-heap priority queue.
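The steps above can be sketched with Python's `heapq` as the priority queue (the function name and example graph are our own):

```python
import heapq

def dijkstra(graph, source):
    """graph: {vertex: [(neighbor, weight), ...]} with non-negative weights."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]           # priority queue of (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:            # stale entry: u was already improved
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:    # found a shorter path to v through u
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 6)],
    "C": [("D", 3)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```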
4.6 Flow-based Algorithms
Flow algorithms are used to solve problems related to network flows, where the goal is to send as much flow as possible from a source to a sink node through a network of nodes and edges. The most common problem is the Maximum Flow Problem.
1. Ford-Fulkerson Algorithm
This algorithm computes the maximum flow in a flow network. It repeatedly finds augmenting paths and increases the flow along these paths until no more augmenting paths exist.
2. Edmonds-Karp Algorithm
An implementation of the Ford-Fulkerson algorithm using Breadth-First Search (BFS) to find the augmenting path. It guarantees an O(VE²) time complexity.
Summary of Key Graph Concepts:
Concept | Description |
---|---|
Directed Graph (Digraph) | Edges have direction; relationships are one-way. |
Undirected Graph | Edges have no direction; relationships are two-way. |
Path | A sequence of vertices connected by edges. |
Cycle | A path that starts and ends at the same vertex. |
Spanning Tree | A tree that connects all vertices in a graph. |
DAG | A directed acyclic graph, used for tasks with dependencies. |
Topological Sort | A linear ordering of vertices in a DAG. |
Minimum Spanning Tree | A tree connecting all vertices with minimum edge weights. |
Dijkstra's Algorithm | Finds the shortest path from a source vertex in a weighted graph. |
Flow-based Algorithms | Used to solve problems related to network flow (e.g., Ford-Fulkerson). |
💥💥💥
5. Strings
A string is a sequence of characters, often used to represent text. String operations and algorithms are essential in various applications like searching, sorting, pattern matching, text manipulation, and compression.
5.1 String Sort
String sorting refers to the process of sorting an array or a list of strings based on lexicographical order (dictionary order). String sorting is important for tasks like searching, indexing, and organizing datasets.
Approaches for String Sorting:
Lexicographical Order: Sorting based on the dictionary order of characters (e.g., "apple", "banana", "cherry").
Sorting Algorithms for Strings:
- Quick Sort: Uses a divide-and-conquer approach. It can be adapted to work on strings by comparing characters lexicographically.
- Merge Sort: A stable sorting algorithm, which divides the list and merges the sorted sublists. Merge sort is often used for string sorting in external sorting scenarios.
- Radix Sort: Particularly efficient for fixed-length strings (or integers treated as strings). It sorts strings character by character, starting from the least significant character.
Example of Lexicographical Sorting:
Given the list of strings: ["banana", "apple", "cherry", "date"]
Sorted order: ["apple", "banana", "cherry", "date"]
5.2 Tries
A Trie (also called a prefix tree) is a tree-like data structure used to efficiently store a set of strings, especially when we need to search, insert, or retrieve strings based on their prefixes.
Structure of a Trie:
- Each node in the Trie represents a character in the string.
- The root node represents the empty string.
- Each edge represents a character, and traversing from the root to a node forms a string.
- The leaf nodes represent the end of a string.
Operations in a Trie:
- Insert: To insert a string, we start from the root and follow the path for each character in the string. If a character does not exist, a new node is created.
- Search: To search for a string, we follow the path formed by the characters of the string. If we reach a node corresponding to the last character and it marks the end of the string, we have found the string.
- Prefix Search: A Trie can also be used to check if any string starts with a particular prefix, which is useful in autocomplete systems.
Time Complexity:
- Insertion/Searching: O(m), where m is the length of the string.
- Space Complexity: The space complexity depends on the number of strings and their lengths.
Example of a Trie: For the strings ["apple", "app", "banana"], the Trie would look like:
root
├── a
│   └── p
│       └── p          (end of "app")
│           └── l
│               └── e  (end of "apple")
└── b
    └── a
        └── n
            └── a
                └── n
                    └── a  (end of "banana")
- "apple" and "app" share the same prefix "app".
- "banana" starts with a different character, so it branches off after "b".
5.3 Substring Search
Substring search refers to finding a pattern (substring) within a text (string). It’s a crucial operation in many applications like text editing, DNA sequence matching, and web searches.
Common Algorithms for Substring Search:
Naive Approach:
- Simply check each possible starting position in the text and see if the substring matches.
- Time Complexity: O(n * m), where n is the length of the text and m is the length of the pattern.
Knuth-Morris-Pratt (KMP) Algorithm:
- The KMP algorithm improves the naive approach by avoiding re-examination of already matched characters.
- Time Complexity: O(n + m).
Rabin-Karp Algorithm:
- Uses a hashing technique to search for patterns. It computes the hash of the pattern and the hashes of substrings in the text to efficiently find matches.
- Time Complexity: O(n + m) on average.
Boyer-Moore Algorithm:
- The Boyer-Moore algorithm works by skipping over sections of the text that are unlikely to match the pattern, based on pre-processing information about the pattern.
- Time Complexity: O(n/m) on average, making it very efficient for long texts and patterns.
Example:
Given the text: "abracadabra"
and the pattern "abra"
, a substring search will return the positions where "abra"
appears.
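The naive approach can be sketched in a few lines of Python (the function name is our own):

```python
def find_all(text, pattern):
    """Naive O(n*m) scan: try every starting position in the text."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]

print(find_all("abracadabra", "abra"))  # [0, 7]
```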
5.4 Regular Expressions (Regex)
A regular expression (regex) is a powerful tool used for pattern matching and searching within strings. It allows you to define complex search patterns to find, replace, or extract specific information.
Basic Components of Regular Expressions:
- Literal characters: match exact characters (e.g., `a` matches "a").
- Special characters:
  - `.` : matches any character except a newline.
  - `*` : matches zero or more of the preceding element.
  - `+` : matches one or more of the preceding element.
  - `?` : matches zero or one of the preceding element.
  - `[]` : matches any character inside the square brackets (e.g., `[abc]` matches "a", "b", or "c").
  - `|` : acts as an OR operator (e.g., `a|b` matches "a" or "b").
  - `^` : matches the start of the string.
  - `$` : matches the end of the string.
Example Regex Patterns:
- `^a.*b$` : matches any string that starts with "a" and ends with "b".
- `\d{3}-\d{2}-\d{4}` : matches a social security number in the format "123-45-6789".
- `\w+` : matches one or more word characters (letters, digits, or underscores).
Applications of Regular Expressions:
- Input validation: Ensure a phone number, email, or date follows the correct format.
- Search and Replace: Efficiently find and replace text in documents or strings.
- Data Extraction: Extract specific patterns from text, such as extracting phone numbers from a body of text.
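The three applications above can be sketched with Python's standard `re` module (the example strings are our own):

```python
import re

# Input validation: check a date has the YYYY-MM-DD shape
# (shape only -- this does not check calendar validity).
print(bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-01-31")))  # True

# Search and replace: collapse runs of whitespace into single spaces.
print(re.sub(r"\s+", " ", "too   many    spaces"))  # "too many spaces"

# Data extraction: pull every number out of a sentence.
print(re.findall(r"\d+", "Order 66 shipped 3 items"))  # ['66', '3']
```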
5.5 Elementary Data Compression
Data compression refers to the process of reducing the size of data to save storage space or transmission time. Elementary data compression algorithms are the simplest forms of compression and focus on reducing redundancy in the data.
Types of Compression:
Run-Length Encoding (RLE):
- RLE compresses data by replacing sequences of identical characters with a single character and a count. It works well on data with long sequences of repeated characters.
Example:
- Original: "AAAABBBCCDAA"
- Compressed: "4A3B2C1D2A"
Huffman Coding:
- Huffman coding is a lossless data compression algorithm that assigns shorter codes to more frequent characters and longer codes to less frequent characters.
- It builds a binary tree based on the frequencies of characters and assigns a binary code to each character.
Example: For the string "abacabad", the character frequencies are:
- a: 4
- b: 2
- c: 1
- d: 1
A binary tree is built from these frequencies and each character is assigned a code whose length reflects its frequency (frequent "a" gets a short code, rare "c" and "d" get longer ones), resulting in an efficient binary encoding.
Dictionary-Based Compression (e.g., LZ77, LZ78):
- These algorithms replace repeating substrings with references to a dictionary of previously seen substrings.
- LZ77 stores pairs of (position, length) to represent repeated substrings.
Summary of Key Topics in Strings:
| Topic | Description |
|---|---|
| String Sort | Sorting strings in lexicographical order using algorithms like quick sort, merge sort, or radix sort. |
| Tries | Tree-like data structure used for storing strings, supporting efficient insertion, search, and prefix queries. |
| Substring Search | Finding a pattern (substring) within a text using algorithms like KMP, Rabin-Karp, and Boyer-Moore. |
| Regular Expressions (Regex) | Pattern matching tool for text search, extraction, and validation using special symbols. |
| Elementary Data Compression | Techniques like Run-Length Encoding (RLE) and Huffman Coding to reduce the size of data. |