Website Portfolio

Published: June 10, 2025
webastro

Portfolio website to showcase my work and projects in and outside STEM.

Project Overview

This project is a lightweight BitTorrent client written in C++23, designed specifically to download single-file torrents from YTS.mx. It implements core BitTorrent protocol features, including torrent file parsing, tracker communication, peer-to-peer file downloading, and piece verification. The client supports essential commands (info, peers, download_piece, download) to interact with torrents, making it a practical tool for understanding and demonstrating BitTorrent technology.

The project was developed as part of a learning exercise to master network programming, file handling, and protocol implementation in C++. It’s optimized for YTS.mx torrents, which typically use single-file structures and HTTP/HTTPS trackers, ensuring reliability and simplicity. This documentation explains every component, from code structure to algorithms, and reflects the concepts learned during development.

Project Structure

The codebase consists of a single source file, main.cpp, organized into logical sections with clear comments, alongside configuration files for building and running the project:

  • main.cpp: The core implementation, containing all functionality for bencoding, tracker communication, peer interactions, and command-line interface.
  • CMakeLists.txt: CMake configuration for building the project with dependencies (CURL, OpenSSL, nlohmann/json).
  • your_program.sh: Shell script for local compilation and execution, using vcpkg for dependency management.
  • lib/nlohmann/json.hpp: External JSON library used to hold decoded bencode values.

The project uses a single-file approach for simplicity, with main.cpp divided into five commented sections:

  1. Includes and Dependencies: External libraries and standard C++ headers.
  2. Bencoding and Decoding Functions: Parsing and encoding BitTorrent’s bencode format.
  3. Tracker Utilities: Functions for communicating with HTTP/HTTPS trackers.
  4. Peer Communication: Logic for downloading pieces from peers.
  5. Main Command Logic: Command-line interface for user interaction.

Detailed Code Explanation

1. Includes and Dependencies

This section imports necessary libraries and defines the json alias for nlohmann::json.

  • Headers:

    • nlohmann/json.hpp: Provides the JSON type that decoded bencode values are stored in for easy manipulation.
    • <curl/curl.h>: Handles HTTP requests to trackers.
    • <openssl/sha.h>: Computes SHA-1 hashes for info hash and piece verification.
    • <arpa/inet.h>, <netinet/in.h>, <sys/socket.h>, <fcntl.h>, <sys/select.h>: Network programming for peer connections.
    • <fstream>, <iostream>, <sstream>: File and console I/O.
    • <random>: Generates random peer IDs.
    • <string>, <vector>: Standard C++ utilities.
    • <unistd.h>: POSIX functions for socket operations.
  • Purpose: Provides the foundation for network communication, cryptographic hashing, JSON parsing, and file handling, tailored to BitTorrent’s requirements.

2. Bencoding and Decoding Functions

BitTorrent uses bencode, a simple encoding format for strings, integers, lists, and dictionaries. This section implements parsing and encoding logic.

  • decode_bencoded_value(const std::string &encoded_value, size_t &pos):

    • Purpose: Recursively decodes a bencoded string, updating the position (pos) in the input string.
    • Logic:
      • Handles four bencode types:
        • Strings: <length>:<contents> (e.g., 4:spam → "spam").
        • Integers: i<number>e (e.g., i42e → 42).
        • Lists: l<elements>e (e.g., li1ei2ee → [1, 2]).
        • Dictionaries: d<key-value pairs>e (e.g., d3:foo3:bare → {"foo": "bar"}).
      • Uses std::stoll for number parsing and throws exceptions for invalid formats.
      • Returns a nlohmann::json object representing the decoded value.
    • Key Features:
      • Robust error handling for malformed bencode (e.g., missing e, invalid lengths).
      • Recursive parsing for nested structures.
      • Position tracking to process the entire string.
  • decode_bencoded_value(const std::string &encoded_value):

    • Purpose: Wrapper function to decode an entire bencoded string.
    • Logic: Calls the recursive function starting at pos = 0 and verifies that the entire string is consumed.
    • Use Case: Parses .torrent files and tracker responses.
  • bencode(const json &value):

    • Purpose: Encodes a JSON value into bencode format.
    • Logic:
      • Converts JSON types (strings, integers, arrays, objects) to bencode.
      • Sorts dictionary keys for consistent encoding (required for info hash).
      • Uses recursion for nested structures.
    • Use Case: Generates bencoded info dictionary for SHA-1 hashing.
  • Why It Matters: Bencoding is the backbone of BitTorrent’s data format, used in .torrent files and tracker communication. These functions enable the client to read and process torrent metadata and responses.
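The recursive, position-tracking approach described above can be sketched without any external library. This is a minimal illustration, not the project's decode_bencoded_value: the name decode_to_text is hypothetical, and instead of building a nlohmann::json object it returns the decoded value rendered as JSON-like text. It also does not escape binary string contents, which a real decoder would have to handle.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes one bencoded value starting at `pos`, returns it
// as JSON-like text, and advances `pos` past the value that was consumed.
std::string decode_to_text(const std::string &in, size_t &pos) {
    if (pos >= in.size()) throw std::runtime_error("unexpected end of input");
    const char c = in[pos];

    if (std::isdigit(static_cast<unsigned char>(c))) {  // string: <length>:<contents>
        const size_t colon = in.find(':', pos);
        if (colon == std::string::npos) throw std::runtime_error("missing ':'");
        const size_t len = std::stoull(in.substr(pos, colon - pos));
        if (len > in.size() - colon - 1) throw std::runtime_error("invalid length");
        pos = colon + 1 + len;
        return "\"" + in.substr(colon + 1, len) + "\"";
    }
    if (c == 'i') {  // integer: i<number>e
        const size_t end = in.find('e', pos);
        if (end == std::string::npos) throw std::runtime_error("missing 'e'");
        const long long n = std::stoll(in.substr(pos + 1, end - pos - 1));
        pos = end + 1;
        return std::to_string(n);
    }
    if (c == 'l' || c == 'd') {  // list: l<elements>e, dictionary: d<pairs>e
        const bool dict = (c == 'd');
        std::string out(1, dict ? '{' : '[');
        ++pos;
        bool first = true;
        while (pos < in.size() && in[pos] != 'e') {
            if (!first) out += ",";
            first = false;
            if (dict) {
                // Dictionary keys must be bencoded strings.
                if (!std::isdigit(static_cast<unsigned char>(in[pos])))
                    throw std::runtime_error("dictionary key is not a string");
                out += decode_to_text(in, pos);
                out += ":";
            }
            out += decode_to_text(in, pos);  // recurse for nested values
        }
        if (pos >= in.size()) throw std::runtime_error("missing 'e'");
        ++pos;  // consume the closing 'e'
        out += dict ? '}' : ']';
        return out;
    }
    throw std::runtime_error("unknown bencode type");
}
```

A wrapper in the spirit of the single-argument overload would call this with pos = 0 and then check that pos equals the input length, so trailing garbage is rejected.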

3. Tracker Utilities

This section handles communication with BitTorrent trackers to obtain peer lists.

  • url_encode_info_hash(const unsigned char *hash, size_t length):

    • Purpose: Converts a 20-byte SHA-1 hash (e.g., info hash) into a URL-encoded string.
    • Logic: Formats each byte as a %XX hexadecimal value (e.g., byte 0x1A → %1A).
    • Use Case: Required for tracker HTTP requests (e.g., ?info_hash=<encoded_hash>).
  • write_callback(void *contents, size_t size, size_t nmemb, void *userp):

    • Purpose: CURL callback to capture HTTP response data.
    • Logic: Appends response bytes to a std::string buffer.
    • Use Case: Collects tracker responses containing peer lists.
  • select_tracker_url(const json &torrent):

    • Purpose: Selects an HTTP/HTTPS tracker URL from the torrent’s announce or announce-list.
    • Logic:
      • Prefers announce-list (an array of arrays, e.g., [["http://tracker1"], ["http://tracker2"]]), common in YTS.mx torrents.
      • Falls back to announce (a single string, e.g., "http://tracker").
      • Validates URLs start with http:// or https://.
      • Throws an error if no valid tracker is found.
    • Why It Matters: YTS.mx torrents often use multiple trackers in announce-list, and this function ensures robust tracker selection.
  • Key Features:

    • Uses CURL for HTTP requests, ensuring compatibility with YTS.mx’s HTTP/HTTPS trackers.
    • Handles compact peer lists (&compact=1) for efficiency.
    • Robust error handling for failed requests or invalid responses.
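The two small helpers in this section can be illustrated in isolation. The sketch below is an assumption about their shape rather than the project's code: percent_encode_bytes and append_to_string are hypothetical names, and the second function only follows the callback signature that libcurl expects for CURLOPT_WRITEFUNCTION, so it compiles without libcurl.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Percent-encodes every byte as %XX, which is always valid for a raw info hash.
std::string percent_encode_bytes(const unsigned char *data, size_t length) {
    std::string out;
    out.reserve(length * 3);
    char buf[4];  // '%', two hex digits, terminating NUL
    for (size_t i = 0; i < length; ++i) {
        std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(data[i]));
        out += buf;
    }
    return out;
}

// Write callback in the libcurl style: `userp` is assumed to point at a
// std::string that accumulates the response body.
size_t append_to_string(void *contents, size_t size, size_t nmemb, void *userp) {
    const size_t total = size * nmemb;
    static_cast<std::string *>(userp)->append(static_cast<const char *>(contents), total);
    return total;  // returning fewer bytes than received signals an error to libcurl
}
```

Encoding every byte, including unreserved characters such as letters, produces a longer query string than strictly necessary, but it avoids any per-character case analysis.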

4. Peer Communication

This section implements peer-to-peer communication to download torrent pieces.

  • exchange_peer_messages(const std::string &saved_path, const std::string &info_hash, const std::pair<std::string, uint16_t> &peer, int piece_index, int piece_length, const std::string &pieces):

    • Purpose: Connects to a peer, performs a handshake, and downloads a specified piece.
    • Logic:
      1. Socket Setup:
        • Creates a TCP socket (AF_INET, SOCK_STREAM).
        • Sets non-blocking mode and timeouts (10s for I/O, 5s for connection) using fcntl and setsockopt.
        • Connects to the peer’s IP and port, handling EINPROGRESS for asynchronous connections.
      2. Handshake:
        • Sends a 68-byte BitTorrent handshake: a length byte (19), the protocol string “BitTorrent protocol”, 8 reserved bytes, the 20-byte info hash, and a 20-byte random peer ID.
        • Receives and validates the peer’s handshake response.
      3. Protocol Messages:
        • Receives a bitfield message (indicating available pieces).
        • Sends an interested message (ID 2) to express interest.
        • Waits for an unchoke message (ID 1) from the peer.
        • Requests piece blocks (16 KiB each) using request messages (ID 6).
        • Receives piece messages (ID 7) and assembles the piece.
      4. Piece Verification:
        • Computes the SHA-1 hash of the downloaded piece.
        • Compares it with the expected hash from pieces (20-byte segments).
        • Writes the verified piece to saved_path.
    • Key Features:
      • Non-blocking sockets with select for robust connection handling.
      • Random peer ID generation using <random>.
      • Error handling for connection failures, invalid responses, or hash mismatches.
      • Efficient block-based downloading (16 KiB chunks).
  • Why It Matters: Peer communication is the core of BitTorrent’s decentralized file sharing. This function implements the protocol’s handshake and message exchange, enabling file downloads from peers.
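The wire layout of the handshake and of a request message can be made concrete with a short sketch. The function names build_handshake, append_be32, and build_request are hypothetical and are not taken from main.cpp; the byte layout follows the standard peer wire protocol, in which every message after the handshake carries a 4-byte big-endian length prefix followed by a 1-byte message ID.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Builds the 68-byte handshake: 1 + 19 + 8 + 20 + 20 bytes.
std::string build_handshake(const std::string &info_hash, const std::string &peer_id) {
    if (info_hash.size() != 20 || peer_id.size() != 20)
        throw std::invalid_argument("info_hash and peer_id must be 20 bytes each");
    std::string msg;
    msg.push_back(static_cast<char>(19));  // length of the protocol string
    msg += "BitTorrent protocol";
    msg.append(8, '\0');                   // reserved bytes, all zero here
    msg += info_hash;                      // raw 20-byte SHA-1, not hex
    msg += peer_id;
    assert(msg.size() == 68);
    return msg;
}

// Appends a 32-bit unsigned integer in big-endian (network) byte order.
void append_be32(std::string &out, uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Builds a request message (ID 6): length prefix 13, then index, begin, length.
std::string build_request(uint32_t index, uint32_t begin, uint32_t length) {
    std::string msg;
    append_be32(msg, 13);                 // payload size: 1 (ID) + 3 * 4 bytes
    msg.push_back(static_cast<char>(6));  // message ID for "request"
    append_be32(msg, index);
    append_be32(msg, begin);
    append_be32(msg, length);
    return msg;
}
```

For example, build_request(1, 16384, 16384) asks for the second 16 KiB block of piece 1 and yields a 17-byte message.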

5. Main Command Logic

This section defines the command-line interface and orchestrates the client’s functionality.

  • main(int argc, char *argv[]):

    • Purpose: Parses command-line arguments and executes one of four commands: info, peers, download_piece, download.
    • Logic:
      • Enables unbuffered I/O for immediate console output.
      • Validates the command and arguments.
      • Reads and decodes the .torrent file into a JSON object.
      • Executes the specified command.
  • Commands:

    1. info <torrent_file>:

      • Purpose: Displays torrent metadata.
      • Output: Tracker URL, file length, info hash, piece length, and piece hashes.
      • Logic:
        • Validates info fields (length, piece length, pieces).
        • Computes the info hash (SHA-1 of bencoded info).
        • Formats piece hashes as hexadecimal strings.
      • Use Case: Debugging and inspecting torrent files.
    2. peers <torrent_file>:

      • Purpose: Lists peers available for downloading.
      • Output: IP:port pairs (e.g., 192.168.1.1:6881).
      • Logic:
        • Constructs a tracker request with URL-encoded info hash, peer ID, and parameters (port=6881, compact=1).
        • Sends the request via CURL and decodes the bencoded response.
        • Parses the compact peer list (6-byte entries: 4-byte IP, 2-byte port).
        • Validates that the peers field is a string before parsing it, to prevent JSON type errors.
      • Use Case: Identifying available peers.
    3. download_piece -o <save_path> <torrent_file> <piece_index>:

      • Purpose: Downloads a single piece and saves it to save_path.
      • Logic:
        • Fetches peers from the tracker.
        • Computes the piece length (adjusts for the last piece if smaller).
        • Tries peers sequentially until the piece is downloaded.
        • Calls exchange_peer_messages to download and verify the piece.
      • Use Case: Testing piece downloading or partial downloads.
    4. download -o <output_file> <torrent_file>:

      • Purpose: Downloads the entire file and saves it to output_file.
      • Logic:
        • Fetches peers from the tracker.
        • Iterates over all pieces, downloading each via exchange_peer_messages.
        • Saves pieces to temporary files (/tmp/piece_<index>), then assembles them into complete_file.
        • Writes the final file after all pieces are verified.
        • Limits peer attempts to 3 per piece for efficiency.
      • Use Case: Primary function for downloading YTS.mx torrents.
  • Key Features:

    • Robust argument validation and error handling.
    • Efficient piece assembly using a single buffer (complete_file).
    • Progress logging for user feedback.
    • YTS.mx-specific optimizations (e.g., announce-list support).
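The compact peer list handled by the peers command packs each peer into 6 bytes: a 4-byte IPv4 address followed by a 2-byte port, both in network byte order. A minimal sketch of that parsing step follows; parse_compact_peers is a hypothetical name, and the input is assumed to be the raw string taken from the tracker response's peers field.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits a compact peer string into (dotted IPv4 address, port) pairs.
std::vector<std::pair<std::string, uint16_t>> parse_compact_peers(const std::string &peers) {
    if (peers.size() % 6 != 0)
        throw std::runtime_error("compact peer list length is not a multiple of 6");
    std::vector<std::pair<std::string, uint16_t>> out;
    for (size_t i = 0; i < peers.size(); i += 6) {
        std::string ip;
        for (size_t k = 0; k < 4; ++k) {
            if (k > 0) ip += '.';
            // Go through unsigned char so bytes >= 0x80 are not sign-extended.
            ip += std::to_string(static_cast<unsigned>(static_cast<unsigned char>(peers[i + k])));
        }
        const unsigned hi = static_cast<unsigned char>(peers[i + 4]);
        const unsigned lo = static_cast<unsigned char>(peers[i + 5]);
        out.emplace_back(ip, static_cast<uint16_t>((hi << 8) | lo));  // big-endian port
    }
    return out;
}
```

The six bytes C0 A8 01 01 1A E1 decode to 192.168.1.1:6881, matching the output format shown for the peers command.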

Implementation Details

YTS.mx Optimization

  • Single-File Torrents: The client assumes info["length"] exists, which matches YTS.mx’s single-file torrents. Multi-file torrents are unsupported and cause the client to throw “Missing or invalid ‘length’ field”.
  • Tracker Support: select_tracker_url prioritizes announce-list, common in YTS.mx torrents, ensuring reliable tracker connections.
  • HTTP/HTTPS Only: Validates tracker URLs to prevent “Unsupported protocol” errors, as YTS.mx uses HTTP/HTTPS trackers.
  • JSON Type Safety: Checks peers is a string to avoid [json.exception.type_error.302] errors, handling YTS.mx’s compact peer lists.
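The tracker selection rule (prefer announce-list, fall back to announce, accept only HTTP/HTTPS) can be sketched with plain standard-library containers. This is an illustration under assumptions: pick_http_tracker is a hypothetical name, and the nested vector stands in for the nlohmann::json array of arrays that select_tracker_url receives.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// True when the URL starts with http:// or https://.
bool is_http_url(const std::string &url) {
    return url.rfind("http://", 0) == 0 || url.rfind("https://", 0) == 0;
}

// Returns the first HTTP/HTTPS tracker from announce_list (a list of tiers),
// falling back to the single announce URL; throws if neither is usable.
std::string pick_http_tracker(const std::vector<std::vector<std::string>> &announce_list,
                              const std::string &announce) {
    for (const auto &tier : announce_list)
        for (const auto &url : tier)
            if (is_http_url(url)) return url;
    if (is_http_url(announce)) return announce;
    throw std::runtime_error("No valid HTTP/HTTPS tracker found");
}
```

With this rule, a torrent whose announce-list contains only udp:// entries still works as long as its announce field holds an HTTP or HTTPS URL.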

Error Handling

  • Throws std::runtime_error for invalid inputs, network failures, or protocol violations.
  • Provides descriptive error messages (e.g., “Invalid piece index”, “CURL request failed”).
  • Validates torrent fields and tracker responses to prevent crashes.

Efficiency

  • Uses compact peer lists (&compact=1) for smaller tracker responses.
  • Downloads pieces in 16 KiB (16,384-byte) blocks, so each request message covers one block and only the final block of a piece can be shorter.
  • Reuses CURL and socket resources efficiently.
  • Minimizes memory usage with in-place buffer operations.
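The arithmetic behind block-based downloading is short enough to show directly: the last piece of a file may be smaller than the nominal piece length, and each piece is split into 16 KiB blocks whose final block may also be short. The names piece_size, Block, and plan_blocks below are hypothetical and serve only to illustrate the calculation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr uint32_t kBlockSize = 16 * 1024;  // 16 KiB per request

// Size in bytes of piece `index`; only the last piece can be shorter.
int64_t piece_size(int64_t file_length, int64_t piece_length, int64_t index) {
    if (file_length <= 0 || piece_length <= 0)
        throw std::runtime_error("Invalid torrent lengths");
    const int64_t count = (file_length + piece_length - 1) / piece_length;  // ceiling division
    if (index < 0 || index >= count) throw std::runtime_error("Invalid piece index");
    return std::min(piece_length, file_length - index * piece_length);
}

struct Block {
    uint32_t begin;   // byte offset within the piece
    uint32_t length;  // number of bytes to request
};

// Splits a piece of `size` bytes into consecutive blocks of at most 16 KiB.
std::vector<Block> plan_blocks(uint32_t size) {
    std::vector<Block> blocks;
    for (uint32_t begin = 0; begin < size; begin += kBlockSize)
        blocks.push_back({begin, std::min(kBlockSize, size - begin)});
    return blocks;
}
```

For a 100,000-byte file with 32,768-byte pieces there are four pieces, and the last one is 100,000 - 3 × 32,768 = 1,696 bytes long.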

Build and Run Instructions

Prerequisites

  • C++23 Compiler: GCC or Clang supporting C++23.
  • Dependencies:
    • libcurl: For HTTP requests.
    • openssl: For SHA-1 hashing.
    • nlohmann/json: JSON parsing (included in lib/nlohmann/json.hpp).
  • vcpkg: Dependency manager (set VCPKG_ROOT environment variable).
  • CMake: Build system.

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd codecrafters-bittorrent
  2. Install dependencies via vcpkg:

    export VCPKG_ROOT=/path/to/vcpkg
    $VCPKG_ROOT/vcpkg install curl openssl
  3. Build the project:

    ./your_program.sh

    This runs cmake and builds the executable in build/bittorrent.

Usage

Run commands using ./your_program.sh <command> [args]:

  • Display torrent info:

    ./your_program.sh info sample.torrent
  • List peers:

    ./your_program.sh peers sample.torrent
  • Download a piece:

    ./your_program.sh download_piece -o piece.bin sample.torrent 0
  • Download entire file:

    ./your_program.sh download -o movie.mp4 sample.torrent

Testing

  • Use a YTS.mx single-file torrent (e.g., a movie file).
  • Verify download saves the file correctly.
  • Check info output matches torrent metadata.
  • Debug issues by adding std::cout << torrent.dump(2) << std::endl; to inspect JSON structures.

Concepts Learned

This project was a deep dive into several technical domains, reinforcing key programming and networking concepts:

  1. BitTorrent Protocol:

    • Understood bencode format for encoding torrent data.
    • Learned tracker communication (HTTP GET requests with parameters like info_hash, peer_id).
    • Mastered peer-to-peer protocol, including handshakes, bitfields, interested/unchoke messages, and piece requests.
    • Grasped piece hashing and verification using SHA-1.
  2. Network Programming:

    • Implemented TCP sockets for peer connections using <sys/socket.h>.
    • Used non-blocking I/O with fcntl and select for robust connections.
    • Handled timeouts and errors in network operations.
    • Utilized CURL for HTTP requests, managing callbacks and responses.
  3. C++23 Programming:

    • Leveraged standard library containers and modern C++ features (e.g., std::string, std::vector, range-based for loops).
    • Used nlohmann::json for flexible data parsing.
    • Applied exception handling for robust error management.
    • Optimized memory usage with move semantics and in-place operations.
  4. Cryptography:

    • Computed SHA-1 hashes for info hash and piece verification.
    • Encoded binary data (hashes) for URL parameters.
  5. File Handling:

    • Read binary .torrent files using std::ifstream.
    • Wrote downloaded pieces and files using std::ofstream.
    • Managed temporary files for piece assembly.
  6. Build Systems:

    • Configured CMake for cross-platform builds.
    • Integrated vcpkg for dependency management.
    • Wrote shell scripts for local execution.
  7. Error Handling and Debugging:

    • Implemented comprehensive error checks for inputs, network, and protocol.
    • Used JSON dumping for debugging torrent structures.
    • Learned to diagnose CURL and socket errors.
  8. Protocol Optimization:

    • Prioritized announce-list for YTS.mx compatibility.
    • Used compact peer lists for efficiency.
    • Limited peer attempts to balance reliability and performance.
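As a small illustration of the piece-hash bookkeeping listed above: the pieces field is a concatenation of 20-byte SHA-1 digests, one per piece, so the expected hash of piece i sits at byte offset 20 × i. The sketch below only extracts, formats, and compares digests; computing the SHA-1 of downloaded data is left to OpenSSL as in the project. The names piece_hash_hex and matches_expected are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Returns the expected SHA-1 of piece `index` as 40 lowercase hex characters.
std::string piece_hash_hex(const std::string &pieces, size_t index) {
    if (pieces.size() % 20 != 0)
        throw std::runtime_error("pieces length is not a multiple of 20");
    if (index >= pieces.size() / 20) throw std::runtime_error("Invalid piece index");
    std::string hex;
    char buf[3];  // two hex digits plus terminating NUL
    for (size_t i = 0; i < 20; ++i) {
        const unsigned byte = static_cast<unsigned char>(pieces[index * 20 + i]);
        std::snprintf(buf, sizeof buf, "%02x", byte);
        hex += buf;
    }
    return hex;
}

// Compares a freshly computed 20-byte digest against the expected one.
bool matches_expected(const std::string &pieces, size_t index, const unsigned char *digest) {
    if (pieces.size() % 20 != 0 || index >= pieces.size() / 20) return false;
    return std::memcmp(pieces.data() + index * 20, digest, 20) == 0;
}
```

A piece would be written to disk only when matches_expected returns true for the digest computed over its bytes.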

Challenges and Solutions

  • Challenge: Parsing bencode robustly.
    • Solution: Implemented recursive parsing with position tracking and error checks.
  • Challenge: Handling YTS.mx’s announce-list.
    • Solution: Wrote select_tracker_url to prioritize valid HTTP/HTTPS trackers.
  • Challenge: JSON type errors in tracker responses.
    • Solution: Added is_string checks for peers field.
  • Challenge: Reliable peer connections.
    • Solution: Used non-blocking sockets with timeouts and retry logic.
  • Challenge: Optimizing for single-file torrents.
    • Solution: Removed multi-file logic, focusing on info["length"].
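The non-blocking connection approach mentioned above for reliable peer connections can be sketched as a connect with a timeout built from fcntl, select, and getsockopt. This is an illustrative version, not the code from main.cpp: connect_with_timeout is a hypothetical name, and restoring blocking mode after the connection succeeds is a choice made for this sketch so that later reads can rely on socket-level timeouts.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>

// Returns a connected TCP socket, or -1 on error or timeout.
int connect_with_timeout(const std::string &ip, uint16_t port, int timeout_sec) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip.c_str(), &addr.sin_addr) != 1) return -1;  // not an IPv4 address

    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Switch to non-blocking mode so connect() returns immediately.
    const int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) { close(fd); return -1; }

    const int rc = connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr);
    if (rc < 0 && errno != EINPROGRESS) { close(fd); return -1; }

    if (rc < 0) {
        // Connection in progress: wait until the socket becomes writable.
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        timeval tv{};
        tv.tv_sec = timeout_sec;
        if (select(fd + 1, nullptr, &wfds, nullptr, &tv) <= 0) { close(fd); return -1; }

        // Writable does not imply success; SO_ERROR holds the real outcome.
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
            close(fd);
            return -1;
        }
    }
    if (fcntl(fd, F_SETFL, flags) < 0) { close(fd); return -1; }  // back to blocking mode
    return fd;
}
```

Retry logic then reduces to calling this function for the next peer whenever it returns -1.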

Future Improvements

  • Multi-File Support: Add info["files"] parsing for broader torrent compatibility.
  • Parallel Downloads: Use threads to download multiple pieces simultaneously.
  • UDP Tracker Support: Extend select_tracker_url to handle udp:// trackers.
  • Progress Bar: Enhance download with a visual progress indicator.
  • Configuration: Allow user-specified peer IDs and ports via arguments.

Conclusion

This BitTorrent client is a testament to the power of C++ in implementing complex network protocols. By focusing on YTS.mx torrents, it achieves simplicity while demonstrating core BitTorrent functionality. The project deepened my understanding of networking, cryptography, and modern C++ programming, equipping me with skills to tackle distributed systems and protocol design. It’s a showcase of practical engineering, from parsing binary formats to managing peer connections, and serves as a foundation for future enhancements in P2P technology.