Projects for Web Systems
12/15/2023
Web Systems (EECS 485)
This course provided a comprehensive overview of modern web technologies across the front end, back end, and large-scale distributed systems. Through hands-on projects, I gained experience building full-stack applications and learned how to extract meaningful information from web-scale data. I developed and deployed complex systems, including a photo-sharing social media platform and a search engine, using industry-standard tools and frameworks.
Key Skills & Experience
- Frontend Development: HTML, CSS, JavaScript, React (client-side dynamic pages), asynchronous programming
- Backend Development: Python, Flask (server-side dynamic pages), REST API design, session management, web security
- Cloud & Infrastructure: Amazon Web Services (IaaS, PaaS), deployment automation, Linux system administration, shell scripting
- Databases & Storage: SQL, distributed storage systems, performance trade-offs
- Parallel & Distributed Systems: Sockets, threads, multiprocessing, distributed compute frameworks
- Web Semantics & Data Processing: Text and link analysis, search engine architecture, recommender systems, ads & auctions
- Tooling & Collaboration: Git (version control), debugging, reading and applying developer documentation
This course strengthened my ability to independently learn and apply new web technologies by leveraging official documentation. I now feel confident in designing, building, and deploying scalable, full-stack web applications, as well as processing large-scale web data to extract insights and power search and recommendation features.
Projects
- Project 1: Static Site Generator
- Projects 2 & 3: Instagram Clone
- Project 4: MapReduce
- Project 5: Search Engine
Project 1: Static Site Generator
In this project, I developed a non-interactive Instagram clone by designing HTML templates with Jinja2 and rendering them against static data. The goal was to simulate a user-facing social media experience through multiple static pages that emulate the functionality and appearance of a photo-sharing platform. The project emphasized template structure, content organization, and consistent styling across a multi-page site. Each page was generated from predefined JSON configurations using `insta485generator`, with attention to correct pluralization, user relationships, and content linking across the platform.
Key Contributions
- Developed HTML Templates for six core URL routes:
  - `/` – Post feed showing all followed users' posts
  - `/users/<username>/` – User profile with post thumbnails and stats
  - `/users/<username>/followers/` – List of followers with relationship context
  - `/users/<username>/following/` – List of followed users
  - `/posts/<postid>/` – Post detail page with comments and metadata
  - `/explore/` – Discover page showing users not currently followed
- Implemented Template Inheritance to ensure consistency in layout and structure across pages (e.g., navigation bar, headers, titles); a minimal sketch of the pattern follows this list
- Handled Dynamic Content Rendering using placeholders and iteration for:
  - User relationships (e.g., "following", "not following", or blank for self)
  - Correct English pluralization for likes, posts, followers, etc.
  - Post metadata including timestamps, comments, likes, and images
- Styled the Application with a responsive and intuitive layout, using a clean visual hierarchy to replicate the Instagram user experience.
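To make the inheritance and pluralization patterns concrete, here is a minimal sketch using the `jinja2` library directly. The template contents, block names, and variables are illustrative assumptions, not the actual course templates.

```python
# Minimal Jinja2 inheritance sketch: a child template extends a base layout
# and handles English pluralization inline. All names here are illustrative.
import jinja2

templates = {
    "base.html": (
        "<html><head><title>{% block title %}insta485{% endblock %}</title></head>"
        "<body><nav>insta485</nav>{% block content %}{% endblock %}</body></html>"
    ),
    "user.html": (
        "{% extends 'base.html' %}"
        "{% block title %}{{ username }}{% endblock %}"
        "{% block content %}"
        "{{ num_posts }} post{{ '' if num_posts == 1 else 's' }}, "
        "{{ num_followers }} follower{{ '' if num_followers == 1 else 's' }}"
        "{% endblock %}"
    ),
}

env = jinja2.Environment(loader=jinja2.DictLoader(templates))
print(env.get_template("user.html").render(
    username="someuser", num_posts=1, num_followers=2))
# ... 1 post, 2 followers ...
```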
This project solidified my frontend development skills, particularly in writing clean and reusable HTML templates for static content generation. It also gave me valuable experience in organizing dynamic data visually and semantically through a templating approach, laying a strong foundation for more advanced full-stack web development in later parts of the course.
Projects 2 & 3: Instagram Clone
Over a multi-phase project, I built a fully functional Instagram clone from the ground up, transitioning through static HTML templating, server-side dynamic pages, and a responsive client-side application powered by REST APIs. This project gave me end-to-end experience with full-stack web development, covering front-end rendering, backend logic, data persistence, authentication, and cloud deployment.
The final product replicated Instagram’s core features, including account creation, user authentication, uploading and deleting posts, following other users, liking and commenting on posts, and infinite scrolling. The site was deployed to AWS and supported real-time UI updates using JavaScript, without requiring full page reloads.
Key Contributions
- Frontend Development
  - Designed responsive HTML templates using inheritance and reusable components
  - Rendered interactive post feeds and user profiles with JavaScript
  - Implemented client-side functionality for likes, comments, and infinite scrolling
  - Used `dayjs` to display human-readable timestamps and relative time
- Backend Development
  - Built RESTful API endpoints (`GET`, `POST`, `DELETE`) for posts, likes, comments, and user interactions using Flask; a minimal endpoint sketch follows this list
  - Implemented server-side dynamic rendering for authenticated pages using Jinja templates
  - Developed robust session handling for login/logout, access control, and route protection
  - Ensured secure operations through ownership validation and HTTP status code responses (e.g., `403`, `404`, `409`)
- Database Design & Integration
  - Designed and normalized a relational schema with tables for users, posts, comments, likes, and following relationships
  - Added cascading delete behavior and foreign key constraints for data consistency (see the schema sketch after this list)
  - Integrated SQL queries to support CRUD operations across all application features
- Interactive Features
  - Enabled real-time interactions: double-click to like, instant comment submission via the Enter key
  - Optimistically updated the UI after creating or deleting comments and likes
  - Implemented infinite scroll using API pagination and JavaScript event handling
- Deployment & Testing
  - Deployed the full-stack application to AWS with production settings
  - Verified functionality across all endpoints and ensured secure user flows
  - Used development tools like `httpie` and browser dev tools for API testing and debugging
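As referenced under Backend Development, here is a minimal sketch of one such endpoint, combining session-based access control, ownership validation, and appropriate status codes. The route shape follows the project's REST conventions, but the database path, table, and column names are assumptions for illustration.

```python
# Sketch of a protected REST endpoint: reject anonymous callers, validate
# ownership, and return 403/404/204 as appropriate. Schema details are
# illustrative assumptions.
import sqlite3
import flask

app = flask.Flask(__name__)
app.secret_key = b"dev-only-secret"  # placeholder; real deployments use a config


def get_db():
    """Open the app database (path assumed for this sketch)."""
    connection = sqlite3.connect("var/insta485.sqlite3")
    connection.row_factory = sqlite3.Row
    connection.execute("PRAGMA foreign_keys = ON")
    return connection


@app.route("/api/v1/comments/<int:commentid>/", methods=["DELETE"])
def delete_comment(commentid):
    """Delete a comment only if it exists and the logged-in user owns it."""
    if "username" not in flask.session:
        return flask.jsonify({"message": "Forbidden", "status_code": 403}), 403

    connection = get_db()
    row = connection.execute(
        "SELECT owner FROM comments WHERE commentid = ?", (commentid,)
    ).fetchone()

    if row is None:
        return flask.jsonify({"message": "Not Found", "status_code": 404}), 404
    if row["owner"] != flask.session["username"]:
        return flask.jsonify({"message": "Forbidden", "status_code": 403}), 403

    connection.execute("DELETE FROM comments WHERE commentid = ?", (commentid,))
    connection.commit()
    return "", 204
```

And a representative slice of the schema showing the foreign key constraints and cascading deletes; the columns shown are assumptions, trimmed for brevity.

```python
# Schema sketch: foreign keys with ON DELETE CASCADE keep child rows
# consistent when a user or post is removed. Columns are illustrative.
import sqlite3

SCHEMA = """
CREATE TABLE users(
  username VARCHAR(20) PRIMARY KEY,
  fullname VARCHAR(40) NOT NULL
);
CREATE TABLE posts(
  postid INTEGER PRIMARY KEY AUTOINCREMENT,
  filename VARCHAR(64) NOT NULL,
  owner VARCHAR(20) NOT NULL,
  FOREIGN KEY (owner) REFERENCES users(username) ON DELETE CASCADE
);
CREATE TABLE comments(
  commentid INTEGER PRIMARY KEY AUTOINCREMENT,
  owner VARCHAR(20) NOT NULL,
  postid INTEGER NOT NULL,
  text VARCHAR(1024) NOT NULL,
  FOREIGN KEY (owner) REFERENCES users(username) ON DELETE CASCADE,
  FOREIGN KEY (postid) REFERENCES posts(postid) ON DELETE CASCADE
);
"""

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite requires opt-in per connection
connection.executescript(SCHEMA)

# Deleting a user removes their posts, and those posts' comments, in one statement.
connection.execute("DELETE FROM users WHERE username = 'someuser'")
```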
This project strengthened my ability to design and build scalable, user-focused web applications from the ground up. I developed a deep understanding of frontend/backend integration, REST API architecture, database design, session management, and deploying real-world systems in the cloud. The layered, iterative approach to development gave me practical experience transitioning from basic templating to a modern, fully interactive client-server web platform.
Project 4: MapReduce
Developed a fully functioning MapReduce system from scratch using Python, supporting fault-tolerant distributed computation with multiple concurrent Worker processes and a central Manager. This project emphasized low-level systems programming concepts including TCP/UDP networking, multithreading, heartbeat-based fault detection, and dynamic task orchestration across remote processes. The system could accept new job requests (including user-defined mapper and reducer executables), manage multiple concurrent workers, handle worker registration and heartbeat monitoring, and execute full MapReduce jobs across dynamic worker pools. The design included graceful shutdown, robust failure recovery, and out-of-core data handling through Unix streaming. Deployed to AWS for distributed testing and performance validation.
Key Contributions
- Distributed Systems Architecture
  - Designed and implemented a centralized `Manager` and multiple `Worker` processes communicating over TCP and UDP
  - Implemented full job orchestration using multithreaded task assignment and coordination for both map and reduce stages
  - Built a fault-tolerant heartbeat system that marked Workers as dead after five missed pings and automatically reassigned their tasks (sketched after this list)
- Networking & Concurrency
  - Used TCP for command messaging and UDP for heartbeat pings between Workers and the Manager
  - Leveraged Python’s `socket`, `threading`, and `subprocess` libraries to create a non-blocking, multithreaded server capable of handling simultaneous Worker communication
  - Built reusable networking abstractions such as `tcp_server`, `udp_server`, and message-dispatching handlers
- Task Management
  - Designed custom `Job` and `RemoteWorker` classes to encapsulate state, track active tasks, and enable dynamic reassignment in case of failures
  - Implemented round-robin input file partitioning for mappers and used consistent hashing to distribute output keys to reducers (see the partitioning sketch after this list)
  - Ensured precise state transitions for tasks and Workers (`ready`, `busy`, `dead`) to enable robust coordination
- Performance Optimization
  - Achieved constant memory usage (`O(1)`) by streaming executable output and piping input using `subprocess.Popen`
  - Used `heapq.merge()` to efficiently merge sorted intermediate files for reducer input, enabling out-of-core sorting without excessive memory consumption (see the streaming sketch after this list)
- System Robustness & Shutdown
  - Supported safe shutdown of the entire system via TCP broadcast messages
  - Implemented complete cleanup logic to remove intermediate directories and safely terminate threads
  - Maintained Manager responsiveness under race conditions and non-deterministic Worker execution order
- Extensibility & Modularity
  - Organized the project into well-scoped Python packages: `manager`, `worker`, and `utils`, promoting reuse of logging, networking, and task coordination utilities
  - Enabled external job submission using structured JSON messages to specify input/output paths, mappers, reducers, and parallelism levels
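A condensed sketch of the heartbeat mechanism: Workers send periodic UDP pings, and a Manager thread marks any Worker silent for five intervals as dead. The five-ping threshold comes from the design above; the interval value, port, and message fields are illustrative assumptions.

```python
# Heartbeat sketch: a UDP listener records each Worker's last ping, and a
# reaper thread declares Workers dead after five missed intervals.
import json
import socket
import threading
import time

HEARTBEAT_INTERVAL = 2  # seconds between pings (assumed value)
MISSED_LIMIT = 5        # missed pings before a Worker is marked dead

last_seen = {}          # (host, port) -> time of last heartbeat
lock = threading.Lock()

def listen_for_heartbeats(host="localhost", port=6001):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            message = json.loads(sock.recv(4096).decode("utf-8"))
            worker = (message["worker_host"], message["worker_port"])
            with lock:
                last_seen[worker] = time.time()

def reap_dead_workers():
    while True:
        time.sleep(HEARTBEAT_INTERVAL)
        cutoff = time.time() - MISSED_LIMIT * HEARTBEAT_INTERVAL
        with lock:
            for worker, seen in list(last_seen.items()):
                if seen < cutoff:
                    del last_seen[worker]
                    print(f"Worker {worker} is dead; reassigning its tasks")

# Inside the Manager these would run as daemon threads:
threading.Thread(target=listen_for_heartbeats, daemon=True).start()
threading.Thread(target=reap_dead_workers, daemon=True).start()
```

The reducer-side key distribution from Task Management can be sketched as a stable hash over intermediate keys; `md5` here is a representative choice (Python's built-in `hash()` is salted per process, so it cannot be shared across Workers).

```python
# Partitioning sketch: hash each intermediate key to pick the reducer
# partition that receives it, stably across processes.
import hashlib

def partition(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, base=16) % num_partitions

# Every occurrence of a key maps to the same partition, so a single reducer
# sees all values for that key.
assert partition("michigan", 4) == partition("michigan", 4)
```

Finally, the constant-memory reduce stage: `heapq.merge` lazily merges the sorted map outputs, and the merged stream is piped line by line into the reducer executable's stdin with `subprocess.Popen`. Function and variable names are illustrative.

```python
# Out-of-core reduce sketch: lazily merge sorted intermediate files and
# stream them into the reducer executable without loading them into memory.
import contextlib
import heapq
import subprocess

def run_reducer(reducer_exe, sorted_paths, output_path):
    with contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(path)) for path in sorted_paths]
        outfile = stack.enter_context(open(output_path, "w"))
        with subprocess.Popen(
            [reducer_exe], stdin=subprocess.PIPE, stdout=outfile, text=True
        ) as proc:
            for line in heapq.merge(*files):  # lazy k-way merge of sorted lines
                proc.stdin.write(line)
            proc.stdin.close()
```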
This project deepened my knowledge of systems-level programming and gave me hands-on experience building a resilient distributed computation framework. I gained fluency in thread-safe design, concurrent programming patterns, and distributed fault-tolerance strategies, all while managing low-level network communication and process lifecycle control. The final implementation reliably executed large-scale MapReduce tasks in the cloud, even under unreliable network or node conditions.
Project 5: Search Engine
Built a scalable search engine by constructing an inverted index through a pipeline of MapReduce jobs using Python scripts compatible with the Hadoop Streaming Interface. Leveraged Michigan Hadoop (Madoop) to execute multiple MapReduce stages on a large subset of Wikipedia documents focused on Michigan and technology topics. The final inverted index output contains term-level statistics including inverse document frequency, term frequency, and document normalization factors, segmented into multiple partitioned files for distributed serving. The system supports efficient querying by integrating PageRank scores with tf-idf similarity for ranking results. An Index server REST API loads segments of the inverted index and PageRank data to serve JSON-formatted search results, while a Search server aggregates results from multiple Index servers, exposing a user-friendly web interface. The entire system was deployed on AWS for cloud scalability and performance testing.
Key Contributions
- MapReduce Pipeline for Inverted Index Construction
  - Designed and implemented a multi-stage MapReduce pipeline of up to nine jobs, each consisting of standalone Python mappers and reducers compatible with Hadoop Streaming (a minimal mapper/reducer pair follows this list)
  - Processed raw HTML Wikipedia documents by extracting doc IDs and clean text, counting total documents, and computing term frequencies and normalization factors
  - Generated an inverted index sorted by term and doc ID, partitioned into three segment files for distributed querying using a custom partitioner keyed on `doc_id % 3`
  - Maintained data integrity through careful handling of leading zeros in doc IDs and alphabetical sorting of terms for consistency
- RESTful Index Server
  - Developed a Flask-based REST API server that loads an inverted index segment, PageRank scores, and stopwords into memory at startup for low-latency queries
  - Implemented endpoints such as `/api/v1/hits/` to return ranked search results with doc IDs and relevance scores based on a weighted combination of tf-idf and PageRank
  - Supported a configurable PageRank weight via query parameter to dynamically adjust its influence on the final ranking
- Search Server with Result Aggregation
  - Built a dynamic server-side rendered search interface that concurrently queries multiple Index servers (one per inverted index segment)
  - Merged paginated results using efficient Python tools (`heapq.merge`) to present top-ranked results across distributed segments
  - Managed a local SQLite document metadata database, populated by the `searchdb` script, which extracts document details (title, summary, URL) from HTML files using BeautifulSoup for rich result display
- Advanced Query Processing
  - Handled multi-word AND queries by intersecting inverted index results for each term, after applying the same text cleaning used during index construction
  - Calculated relevance scores as a linear combination of the cosine similarity of tf-idf vectors and PageRank scores to improve search quality (see the scoring sketch after this list)
  - Enabled flexible tuning of PageRank influence through URL parameters, allowing end users to customize result rankings
- Deployment
  - Deployed the entire search infrastructure on AWS, enabling real-world scalability and performance evaluation
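To make the Hadoop Streaming convention concrete, here is a minimal mapper/reducer pair in the style of the pipeline's early stages; it counts each term's document frequency and is deliberately simpler than the real jobs.

```python
# map.py -- Streaming mapper sketch: for each "doc_id<TAB>text" input line,
# emit "term<TAB>1" once per unique term in the document.
import sys

for line in sys.stdin:
    _doc_id, _, text = line.partition("\t")
    for term in set(text.lower().split()):
        print(f"{term}\t1")
```

Hadoop sorts and groups the mapper output by key before it reaches the reducer, so the reducer only has to sum runs of identical keys:

```python
# reduce.py -- Streaming reducer sketch: input arrives sorted by key, so
# consecutive identical terms can be summed with itertools.groupby.
import itertools
import sys

pairs = (line.split("\t") for line in sys.stdin)
for term, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
    print(f"{term}\t{sum(int(count) for _, count in group)}")
```

And the ranking rule from Advanced Query Processing as a small sketch: with PageRank weight w taken from the URL, a document's score is w · PageRank(d) + (1 − w) · cosine(q, d) over tf-idf vectors. The function and argument names are illustrative.

```python
# Scoring sketch: linear mix of PageRank and tf-idf cosine similarity.
import math

def score(query_vec, doc_vec, pagerank, weight):
    """weight is the PageRank weight w in [0, 1], taken from the URL."""
    dot = sum(query_vec.get(term, 0.0) * value for term, value in doc_vec.items())
    norm = (math.sqrt(sum(v * v for v in query_vec.values()))
            * math.sqrt(sum(v * v for v in doc_vec.values())))
    cosine = dot / norm if norm else 0.0
    return weight * pagerank + (1 - weight) * cosine

# e.g. score({"michigan": 0.8}, {"michigan": 0.5, "tech": 0.2}, 0.004, 0.3)
```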
This project provided comprehensive experience in large-scale data processing, distributed system design, and search engine architecture. I gained hands-on skills building robust MapReduce pipelines, RESTful microservices, and multi-threaded query aggregators, culminating in a deployable cloud-based search engine integrating classical information retrieval techniques and link analysis scores.