Skip to content

Latest commit

 

History

History
367 lines (283 loc) · 12.1 KB

File metadata and controls

367 lines (283 loc) · 12.1 KB

Assignment: Data Processing CLI

Description

Your task is to build a Data Processing Toolkit — an interactive command-line application that performs various useful data processing operations. The tool should work as a persistent Node.js process that accepts commands.

Unlike the Node.js Basics assignment where you practiced APIs in isolation, here you will combine them into a real, cohesive tool with interactive file system navigation and data processing capabilities.

Technical requirements

  • Any external tools and libraries are prohibited
  • Use 24.x.x version (24.10.0 or upper) of Node.js
  • All file operations must use Streams API for efficiency (do not read entire files into memory)
  • Prefer asynchronous API whenever possible
  • The program should be an interactive REPL (Read-Eval-Print Loop)
  • File paths in commands can be relative or absolute

CLI Interface

The program is started via npm-script start:

npm run start

Which runs:

node src/main.js

The program should:

  • Display a welcome message on startup: Welcome to Data Processing CLI!
  • Print the current working directory initially: You are currently in /path/to/home
  • Continuously prompt the user to enter commands: >
  • Accept commands in the format: <command> [arguments]
  • Display error messages for unknown or invalid commands without crashing
  • Allow users to exit with .exit command or Ctrl+C
  • Display a goodbye message on exit: Thank you for using Data Processing CLI!
  • After each successful operation, print the current working directory again
  • At the start of the program, working directory should be the user's home directory

If a command is unknown, invalid, or has missing required arguments, the program should print an error message like Invalid input and prompt for a new command.

If an operation fails, the program should print Operation failed and prompt for a new command.

Commands

Navigation & Working Directory Commands

up — Move up one directory level

up

Behavior:

  • Moves up one directory level from the current working directory
  • If already in the root directory, does nothing (no error)
  • After successful navigation, prints the new current working directory path

cd — Change to a specified directory

cd path_to_directory
  • path_to_directory — relative or absolute path to navigate to (required)

Behavior:

  • Navigates to the specified directory
  • Can accept both relative and absolute paths
  • If path doesn't exist or is not a directory, prints Operation failed and stays in current directory
  • If successful, prints the new current working directory path

ls — List files and directories in current directory

ls

Output:

  • A list of all files and folders in the current directory
  • Folders listed first, then files, all in alphabetical order
  • Each entry shows the name (with extension for files) and type (file or folder)

Example:

folder1    [folder]
folder2    [folder]
file1.txt  [file]
file2.md   [file]

Data Processing Commands

1. csv-to-json — Convert CSV to JSON

Convert a CSV file to a JSON file using Streams.

csv-to-json --input data.csv --output data.json
  • --input — path to the input CSV file (required)
  • --output — path to the output JSON file (required)

Behavior:

  • The first line of the CSV file is treated as headers
  • Each subsequent line becomes a JSON object with header names as keys
  • The output file should contain a JSON array of objects
  • Must use Readable Stream → Transform Stream → Writable Stream pipeline
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist, print Operation failed

Example:

Input data.csv:

name,age,city
Alice,30,New York
Bob,25,London

Output data.json:

[
  { "name": "Alice", "age": "30", "city": "New York" },
  { "name": "Bob", "age": "25", "city": "London" }
]

2. json-to-csv — Convert JSON to CSV

Convert a JSON file (array of objects) to a CSV file using Streams.

json-to-csv --input data.json --output data.csv
  • --input — path to the input JSON file (required)
  • --output — path to the output CSV file (required)

Behavior:

  • Input must be a JSON array of objects
  • The first line of the output is the headers (keys from the first object)
  • Each object becomes a CSV row
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist or contains invalid JSON, print Operation failed

3. count — Count lines, words, and characters in txt file

Count lines, words, and characters in a file (similar to the wc command).

count --input file.txt
  • --input — path to the input file (required)

Output format:

Lines: 42
Words: 350
Characters: 2048

Behavior:

  • Must use Streams API to process the file (do not load the entire file into memory)
  • A word is any sequence of non-whitespace characters
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist, print Operation failed

4. hash — Calculate file hash

Calculate a cryptographic hash of a file.

hash --input file.txt
hash --input file.txt --algorithm md5
hash --input file.txt --save
  • --input — path to the input file (required)
  • --algorithm — hash algorithm to use (optional, default: sha256). Supported values: sha256, md5, sha512
  • --save — optional flag; if provided, save hash to a file next to the source file

Output format:

sha256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Behavior:

  • Must use crypto.createHash with Streams API
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist, print Operation failed
  • If the algorithm is not supported, print Operation failed
  • If --save is passed, write hash to <inputFilename>.<algorithm> (example: file.txt.sha256)

5. hash-compare — Compare file hash with expected hash

Calculate file hash and compare it with a value stored in a hash file.

hash-compare --input file.txt --hash file.txt.sha256
hash-compare --input file.txt --hash file.txt.md5 --algorithm md5
  • --input — path to the input file (required)
  • --hash — path to file with expected hash (required)
  • --algorithm — hash algorithm to use (optional, default: sha256). Supported values: sha256, md5, sha512

Output format:

OK

or

MISMATCH

Behavior:

  • Must calculate hash of --input using Streams API
  • Must read expected hash value from --hash file
  • Comparison should be case-insensitive and ignore trailing newline in hash file
  • Paths are relative to the current working directory or can be absolute
  • If input or hash file doesn't exist, print Operation failed
  • If algorithm is not supported, print Operation failed

6. encrypt — Encrypt a file

Encrypt a file using AES-256-GCM.

encrypt --input file.txt --output file.txt.enc --password mySecret
  • --input — path to the input file (required)
  • --output — path to the output encrypted file (required)
  • --password — password used to derive the encryption key (required)

Output file format (binary):

  • First 16 bytes: salt
  • Next 12 bytes: iv
  • Then: ciphertext
  • Last 16 bytes: authTag

Behavior:

  • Must derive a 32-byte key from password and salt
  • Must encrypt using AES-256-GCM
  • Must use Streams API end-to-end
  • You must not load the full file into memory. The only allowed in-memory buffering is:
    • the header (first 28 bytes = salt + iv)
    • the authentication tag (last 16 bytes)
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist, print Operation failed

7. decrypt — Decrypt a file

Decrypt a file produced by encrypt.

decrypt --input file.txt.enc --output file.txt --password mySecret
  • --input — path to the input encrypted file (required)
  • --output — path to the output file (required)
  • --password — password used to derive the encryption key (required)

Behavior:

  • Must parse salt (first 16 bytes) and iv (next 12 bytes) from the input
  • Must parse authTag (last 16 bytes) from the input
  • Must decrypt using AES-256-GCM with authentication tag verification
  • Must use Streams API end-to-end
  • The decrypted result must match the original file content exactly
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist or auth fails, print Operation failed

8. log-stats — Analyze a large log file using Worker Threads

Compute statistics for a large log file using Worker Threads for parallel processing.

log-stats --input logs.txt --output stats.json
  • --input — path to the input log file (required)
  • --output — path to the output JSON file (required)

Log line format (space-separated):

<isoTimestamp> <level> <service> <statusCode> <responseTimeMs> <method> <path>

Example line:

2026-02-01T12:34:56.789Z INFO user-service 200 123 GET /api/users

Output format (JSON):

{
  "total": 1000,
  "levels": { "INFO": 700, "WARN": 200, "ERROR": 100 },
  "status": { "2xx": 800, "3xx": 50, "4xx": 120, "5xx": 30 },
  "topPaths": [
    { "path": "/api/users", "count": 120 },
    { "path": "/api/orders", "count": 95 }
  ],
  "avgResponseTimeMs": 137.42
}

Behavior:

  1. Split the input file into N chunks (where N = number of CPU cores), ensuring chunks start and end on line boundaries
  2. Send each chunk to a Worker Thread for parsing and partial aggregation
  3. Each Worker returns partial stats: counts by level, counts by status class, path counts, total lines, response time sum
  4. The main thread merges partial stats and computes final avgResponseTimeMs
  5. Write the JSON result to the output file
  • Must use Worker Threads for parallel processing
  • The number of workers should equal the number of logical CPU cores
  • Paths are relative to the current working directory or can be absolute
  • If the input file doesn't exist, print Operation failed

Test data generator: Use the provided script to generate a large log file for testing:

node scripts/generate-logs.js --output workspace/logs.txt --lines 500000

Project Structure

src/
  main.js          — entry point, sets up REPL, handles navigation state
  repl.js          — REPL handler, command parsing and dispatching
  navigation.js    — navigation commands (up, cd, ls)
  commands/
    csvToJson.js   — csv-to-json command handler
    jsonToCsv.js   — json-to-csv command handler
    count.js       — count command handler
    hash.js        — hash command handler
    hashCompare.js — hash-compare command handler
    encrypt.js     — encrypt command handler
    decrypt.js     — decrypt command handler
    logStats.js    — log-stats command handler
  workers/
    logWorker.js   — worker thread for log-stats command
  utils/
    pathResolver.js  — resolve paths relative to current working directory
    argParser.js     — parse command line arguments

Hints

  • Use readline module for interactive input
  • Use stream.pipeline (from stream/promises) to connect streams and handle errors properly
  • For CSV parsing in the Transform stream, handle the first line (headers) separately from data lines
  • For json-to-csv, you'll need to buffer the JSON input to parse it, but write the CSV output via a stream
  • For log-stats, make sure chunks start/end on line boundaries to avoid partial log lines
  • For merging stats, sum counters and merge path maps before computing topPaths
  • Always resolve file paths relative to the current working directory before performing operations
  • Use path.resolve() to combine current working directory with relative paths
  • Use process.cwd() is NOT appropriate here - maintain your own current working directory variable
  • Maintain the current working directory as application state throughout the session