CS 201 › Lesson 12 of 12

Capstone: Data Structures Portfolio

Lesson 12 · OKSTEM College · AS Computer Science

Capstone Overview

Implement a Text Editor Buffer using the data structures from this course. A production text editor (VS Code, Vim) uses specialised buffers — here you'll build a simplified version with multiple approaches and compare their trade-offs.

Part A — Implement Three Buffers

Each buffer supports: insert(pos, text), delete(pos, length), read(pos, length), __len__.
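The interface above can be sketched with the simplest of the three, the array buffer. This is a minimal reference implementation, not the required solution; the class name is an assumption.

```python
class ArrayBuffer:
    """Buffer 1: the whole document as one Python list of characters."""

    def __init__(self, text=""):
        self.chars = list(text)

    def insert(self, pos, text):
        # Slice assignment shifts every character after pos: O(n)
        self.chars[pos:pos] = list(text)

    def delete(self, pos, length):
        # Also O(n): the tail shifts left to close the hole
        del self.chars[pos:pos + length]

    def read(self, pos, length):
        return "".join(self.chars[pos:pos + length])

    def __len__(self):
        return len(self.chars)
```

Reading is cheap (a slice copy), which is why the array buffer is a sensible baseline to benchmark the other two against.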

  1. Array buffer — store the whole document as one Python list of characters. Simple but O(n) insert/delete.
  2. Linked list buffer — each node is a character. O(1) insert/delete given a pointer; O(n) access by position.
  3. Gap buffer — an array with a “gap” at the cursor position. Insert into the gap: O(1). Move cursor: O(distance). Used by Emacs.

Gap buffer hint: maintain gap_start and gap_end indices into a fixed array. Characters before gap_start and after gap_end are the document; the gap is empty space for insertion.
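The hint above can be turned into a working sketch. Capacity growth is handled by doubling, as in a dynamic array; bounds checking is omitted and all names beyond the hinted gap_start/gap_end are assumptions.

```python
class GapBuffer:
    """Buffer 3: an array with an empty 'gap' at the cursor position."""

    def __init__(self, capacity=16):
        self.buf = [""] * capacity
        self.gap_start = 0            # first empty slot
        self.gap_end = capacity       # one past the last empty slot

    def _move_gap(self, pos):
        # Shift characters across the gap until gap_start == pos: O(distance)
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        # Double capacity, keeping the text after the gap at the tail
        old = self.buf
        tail = old[self.gap_end:]
        new_cap = 2 * len(old)
        self.buf = old[:self.gap_start] + [""] * (new_cap - self.gap_start - len(tail)) + tail
        self.gap_end = new_cap - len(tail)

    def insert(self, pos, text):
        self._move_gap(pos)
        for ch in text:               # O(1) per char once the gap is in place
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, length):
        self._move_gap(pos)
        # Deleting after the cursor just widens the gap: O(1) per char
        self.gap_end = min(self.gap_end + length, len(self.buf))

    def read(self, pos, length):
        text = self.buf[:self.gap_start] + self.buf[self.gap_end:]
        return "".join(text[pos:pos + length])

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)
```

Note how insert and delete are O(1) per character once _move_gap has run — the O(distance) cost is paid only when the cursor jumps.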

Part B — Undo Stack

Add an undo/redo system to any of your buffers using two stacks. Each mutating operation (insert, delete) pushes a reverse operation onto the undo stack. Undo pops from undo and pushes to redo.
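One way to sketch this is a wrapper that layers the two stacks over any buffer implementing insert/delete/read. The wrapper and method names are assumptions, not part of the assignment spec.

```python
class UndoableBuffer:
    """Two-stack undo/redo layered over any buffer with insert/delete/read."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.undo_stack = []          # inverse operations, most recent on top
        self.redo_stack = []

    def insert(self, pos, text):
        self.buffer.insert(pos, text)
        self.undo_stack.append(("delete", pos, len(text)))
        self.redo_stack.clear()       # a fresh edit invalidates redo history

    def delete(self, pos, length):
        removed = self.buffer.read(pos, length)   # save text so undo can restore it
        self.buffer.delete(pos, length)
        self.undo_stack.append(("insert", pos, removed))
        self.redo_stack.clear()

    def _apply(self, op, push_to):
        # Execute an operation and push its inverse onto the other stack
        kind, pos, arg = op
        if kind == "insert":
            self.buffer.insert(pos, arg)
            push_to.append(("delete", pos, len(arg)))
        else:
            removed = self.buffer.read(pos, arg)
            self.buffer.delete(pos, arg)
            push_to.append(("insert", pos, removed))

    def undo(self):
        if self.undo_stack:
            self._apply(self.undo_stack.pop(), self.redo_stack)

    def redo(self):
        if self.redo_stack:
            self._apply(self.redo_stack.pop(), self.undo_stack)
```

Note that a delete must save the removed text before executing, otherwise its inverse insert has nothing to restore.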

Real editors often go further — storing the document as a rope (a balanced tree of string chunks) and using persistent versions of it for history — but the two-stack undo model is the conceptual foundation.

Part C — Autocomplete

Given a dictionary of words, implement autocomplete using a Trie:

  1. Load all dictionary words into the trie.
  2. Given a prefix string, return all words in the trie with that prefix (BFS or DFS from the prefix node).
  3. Rank suggestions by frequency (store a count at each terminal node).
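The three steps above can be sketched as follows. This uses DFS for the subtree collection (step 2 allows either) and repeated insertion to accumulate the frequency count; class and method names are assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0                # > 0 marks a terminal node; doubles as frequency

class Autocomplete:
    def __init__(self, words):
        self.root = TrieNode()
        for word in words:            # step 1: load the dictionary
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1               # step 3: repeated adds raise the frequency

    def suggest(self, prefix):
        # Step 2a: walk to the prefix node in O(len(prefix))
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Step 2b: DFS the subtree to collect every word under the prefix
        results = []
        def dfs(n, word):
            if n.count:
                results.append((n.count, word))
            for ch, child in sorted(n.children.items()):
                dfs(child, word + ch)
        dfs(node, prefix)
        # Step 3: rank by frequency, highest first
        results.sort(key=lambda t: -t[0])
        return [w for _, w in results]
```

The cost matches the recap below: O(L) to reach the prefix node, then work proportional to the number of matches collected.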

Part D — Analysis

Write a comparison table: for each buffer implementation, give the time complexity of insert, delete, and read. Explain which you would choose for: (a) a document always edited at the end (log appender), (b) a document with edits scattered throughout, (c) a small embedded device with 64 KB RAM.

This portfolio demonstrates: Stack, Queue, Linked List, Trie, Hash Table, and complexity analysis — the core of any data structures course and technical interview preparation.

Knowledge Check

A gap buffer is particularly efficient for

Gap buffer is O(1) near the gap but O(n) to move the gap.
Correct — consecutive inserts at the same position don't move the gap, so they're O(1) each.
Searching still requires O(n) scan.
Undo/redo uses a separate stack mechanism, not the gap buffer itself.
Recap: gap buffer = O(1) insert/delete at gap, O(n) gap move. Optimal for editors where the user types in one place for a while.

The two-stack undo/redo pattern works by

That's snapshot-based undo — expensive for large documents.
Correct — each operation records its inverse; undo applies inverses in LIFO order.
Queue would give FIFO order; undo is LIFO (most-recent first).
That's also snapshot-based; stacks of reverse operations are lighter.
Recap: undo stack: push 'delete char X at pos P' for every insert. Undo: pop and execute. Push to redo stack. Redo: pop redo and re-execute.

For a log appender (always appending to the end) the best buffer is

Append to a linked list without a tail pointer is O(n).
Correct — Python list.append() is O(1) amortised; no shifting needed for end inserts.
Gap buffer shines for mid-document edits; for tail-only appends, a plain array is simplest.
Tries store strings character by character for prefix lookup, not general documents.
Recap: list.append() = O(1) amortised (dynamic array doubling). For a write-ahead log or any tail-append workload, Python list is optimal.

Trie autocomplete's advantage over scanning all words in a hash set is

Tries typically use more memory — one node per unique prefix.
Correct — navigate to the prefix node in O(L), then collect all words in its subtree.
Both tries and hash sets can store numbers (with appropriate representation).
Tries are considerably more complex to implement than hash sets.
Recap: trie autocomplete = O(L) to find prefix node + O(matches) to collect them. Hash set = O(n * L) to scan all words for prefix match.

Which data structure used in this course most directly solves cycle detection in Kruskal's MST?

Hash tables map keys to values; they don't track component membership efficiently.
Correct — DSU tells you in near-O(1) whether two nodes are in the same component.
Min-heap gives the cheapest edge; DSU checks if it would form a cycle.
Tries handle strings; DSU handles component membership.
Recap: Kruskal's uses both: a min-heap (or sorted edge list) to pick the cheapest edge, and DSU to check/prevent cycles. Together: O(E log E) MST.
