Your task is to build a Data Processing Toolkit — an interactive command-line application that performs various useful data processing operations. The tool should work as a persistent Node.js process that accepts commands.
Unlike the Node.js Basics assignment where you practiced APIs in isolation, here you will combine them into a real, cohesive tool with interactive file system navigation and data processing capabilities.
- External tools and libraries are prohibited; use only Node.js built-in modules
- Use Node.js 24.x.x (24.10.0 or higher)
- All file operations must use the Streams API for efficiency (do not read entire files into memory)
- Prefer asynchronous APIs whenever possible
- The program should be an interactive REPL (Read-Eval-Print Loop)
- File paths in commands can be relative or absolute
The program is started via the npm script `start`:

```
npm run start
```

which runs:

```
node src/main.js
```

The program should:
- Display a welcome message on startup: `Welcome to Data Processing CLI!`
- Print the current working directory initially: `You are currently in /path/to/home`
- Continuously prompt the user to enter commands: `> `
- Accept commands in the format: `<command> [arguments]`
- Display error messages for unknown or invalid commands without crashing
- Allow users to exit with the `.exit` command or `Ctrl+C`
- Display a goodbye message on exit: `Thank you for using Data Processing CLI!`
- After each successful operation, print the current working directory again
- At the start of the program, the working directory should be the user's home directory
If a command is unknown, invalid, or has missing required arguments, the program should print an error message like `Invalid input` and prompt for a new command.
If an operation fails, the program should print `Operation failed` and prompt for a new command.
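The split between "invalid input" and "operation failed" can be kept clean by parsing the raw line before dispatching to a handler. The function names below (`parseLine`, `dispatch`) are illustrative, not prescribed by the assignment:

```javascript
// Split a raw REPL line into a command name and its arguments.
function parseLine(line) {
  const parts = line.trim().split(/\s+/).filter(Boolean);
  if (parts.length === 0) return null;
  const [command, ...args] = parts;
  return { command, args };
}

// Look up a handler for the command; unknown commands are "Invalid input".
function dispatch(input, handlers) {
  const parsed = parseLine(input);
  if (!parsed || !(parsed.command in handlers)) {
    return 'Invalid input';
  }
  return handlers[parsed.command](parsed.args);
}
```

A real REPL would wrap `dispatch` in a try/catch so handler failures print `Operation failed` without crashing the loop.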
### up

Behavior:
- Moves up one directory level from the current working directory
- If already in the root directory, does nothing (no error)
- After successful navigation, prints the new current working directory path
### cd path_to_directory

`path_to_directory` — relative or absolute path to navigate to (required)

Behavior:
- Navigates to the specified directory
- Accepts both relative and absolute paths
- If the path doesn't exist or is not a directory, prints `Operation failed` and stays in the current directory
- If successful, prints the new current working directory path
### ls

Output:
- A list of all files and folders in the current directory
- Folders listed first, then files, all in alphabetical order
- Each entry shows the name (with extension for files) and type (file or folder)

Example:

```
folder1 [folder]
folder2 [folder]
file1.txt [file]
file2.md [file]
```
### csv-to-json

Convert a CSV file to a JSON file using Streams.

```
csv-to-json --input data.csv --output data.json
```

- `--input` — path to the input CSV file (required)
- `--output` — path to the output JSON file (required)
Behavior:
- The first line of the CSV file is treated as headers
- Each subsequent line becomes a JSON object with header names as keys
- The output file should contain a JSON array of objects
- Must use Readable Stream → Transform Stream → Writable Stream pipeline
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist, print `Operation failed`
Example:

Input `data.csv`:

```
name,age,city
Alice,30,New York
Bob,25,London
```

Output `data.json`:

```json
[
  { "name": "Alice", "age": "30", "city": "New York" },
  { "name": "Bob", "age": "25", "city": "London" }
]
```

### json-to-csv

Convert a JSON file (array of objects) to a CSV file using Streams.
```
json-to-csv --input data.json --output data.csv
```

- `--input` — path to the input JSON file (required)
- `--output` — path to the output CSV file (required)
Behavior:
- Input must be a JSON array of objects
- The first line of the output is the headers (keys from the first object)
- Each object becomes a CSV row
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist or contains invalid JSON, print `Operation failed`
### count

Count lines, words, and characters in a file (similar to the `wc` command).

```
count --input file.txt
```

- `--input` — path to the input file (required)
Output format:

```
Lines: 42
Words: 350
Characters: 2048
```
Behavior:
- Must use Streams API to process the file (do not load the entire file into memory)
- A word is any sequence of non-whitespace characters
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist, print `Operation failed`
### hash

Calculate a cryptographic hash of a file.

```
hash --input file.txt
hash --input file.txt --algorithm md5
hash --input file.txt --save
```

- `--input` — path to the input file (required)
- `--algorithm` — hash algorithm to use (optional, default: `sha256`). Supported values: `sha256`, `md5`, `sha512`
- `--save` — optional flag; if provided, save the hash to a file next to the source file

Output format:

```
sha256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```
Behavior:
- Must use `crypto.createHash` with the Streams API
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist, print `Operation failed`
- If the algorithm is not supported, print `Operation failed`
- If `--save` is passed, write the hash to `<inputFilename>.<algorithm>` (example: `file.txt.sha256`)
### hash-compare

Calculate a file hash and compare it with a value stored in a hash file.

```
hash-compare --input file.txt --hash file.txt.sha256
hash-compare --input file.txt --hash file.txt.md5 --algorithm md5
```

- `--input` — path to the input file (required)
- `--hash` — path to the file with the expected hash (required)
- `--algorithm` — hash algorithm to use (optional, default: `sha256`). Supported values: `sha256`, `md5`, `sha512`

Output format:

```
OK
```

or

```
MISMATCH
```
Behavior:
- Must calculate the hash of `--input` using the Streams API
- Must read the expected hash value from the `--hash` file
- Comparison should be case-insensitive and ignore a trailing newline in the hash file
- Paths are relative to the current working directory or can be absolute
- If the input or hash file doesn't exist, print `Operation failed`
- If the algorithm is not supported, print `Operation failed`
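The comparison rules above (case-insensitive, trailing newline ignored) reduce to normalizing both sides before comparing. The function name is illustrative:

```javascript
// Normalize both digests per the spec: trim surrounding whitespace
// (including a trailing newline in the hash file) and lowercase.
function compareHashes(actual, expectedFileContent) {
  const expected = expectedFileContent.trim().toLowerCase();
  return actual.trim().toLowerCase() === expected ? 'OK' : 'MISMATCH';
}
```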
### encrypt

Encrypt a file using AES-256-GCM.

```
encrypt --input file.txt --output file.txt.enc --password mySecret
```

- `--input` — path to the input file (required)
- `--output` — path to the output encrypted file (required)
- `--password` — password used to derive the encryption key (required)

Output file format (binary):
- First 16 bytes: `salt`
- Next 12 bytes: `iv`
- Then: `ciphertext`
- Last 16 bytes: `authTag`
Behavior:
- Must derive a 32-byte key from `password` and `salt`
- Must encrypt using `AES-256-GCM`
- Must use the Streams API end-to-end
- You must not load the full file into memory. The only allowed in-memory buffering is:
  - the header (first 28 bytes = `salt` + `iv`)
  - the authentication tag (last 16 bytes)
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist, print `Operation failed`
### decrypt

Decrypt a file produced by `encrypt`.

```
decrypt --input file.txt.enc --output file.txt --password mySecret
```

- `--input` — path to the input encrypted file (required)
- `--output` — path to the output file (required)
- `--password` — password used to derive the encryption key (required)
Behavior:
- Must parse `salt` (first 16 bytes) and `iv` (next 12 bytes) from the input
- Must parse `authTag` (last 16 bytes) from the input
- Must decrypt using `AES-256-GCM` with authentication tag verification
- Must use the Streams API end-to-end
- The decrypted result must match the original file content exactly
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist or authentication fails, print `Operation failed`
### log-stats

Compute statistics for a large log file using Worker Threads for parallel processing.

```
log-stats --input logs.txt --output stats.json
```

- `--input` — path to the input log file (required)
- `--output` — path to the output JSON file (required)

Log line format (space-separated):

```
<isoTimestamp> <level> <service> <statusCode> <responseTimeMs> <method> <path>
```
Example line:

```
2026-02-01T12:34:56.789Z INFO user-service 200 123 GET /api/users
```
Output format (JSON):

```json
{
  "total": 1000,
  "levels": { "INFO": 700, "WARN": 200, "ERROR": 100 },
  "status": { "2xx": 800, "3xx": 50, "4xx": 120, "5xx": 30 },
  "topPaths": [
    { "path": "/api/users", "count": 120 },
    { "path": "/api/orders", "count": 95 }
  ],
  "avgResponseTimeMs": 137.42
}
```
Behavior:
- Split the input file into N chunks (where N = number of CPU cores), ensuring chunks start and end on line boundaries
- Send each chunk to a Worker Thread for parsing and partial aggregation
- Each worker returns partial stats: counts by level, counts by status class, path counts, total lines, and the response time sum
- The main thread merges the partial stats and computes the final `avgResponseTimeMs`
- Write the JSON result to the output file
- Must use Worker Threads for parallel processing
- The number of workers should equal the number of logical CPU cores
- Paths are relative to the current working directory or can be absolute
- If the input file doesn't exist, print `Operation failed`
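The per-line parsing a worker might do, and the merge step on the main thread, can be sketched as two pure functions (names are illustrative; the spec does not fix how many `topPaths` to keep, so the 10 below is an assumption):

```javascript
// Worker side: fold one log line into a partial-stats accumulator.
// Field positions follow the space-separated format above.
function parseLogLine(line, stats) {
  const [, level, , status, responseTime, , urlPath] = line.split(' ');
  stats.total += 1;
  stats.levels[level] = (stats.levels[level] ?? 0) + 1;
  const statusClass = status[0] + 'xx'; // 200 -> "2xx", 404 -> "4xx", ...
  stats.status[statusClass] = (stats.status[statusClass] ?? 0) + 1;
  stats.paths[urlPath] = (stats.paths[urlPath] ?? 0) + 1;
  stats.responseTimeSum += Number(responseTime);
}

// Main-thread side: sum counters, merge path maps, then derive the
// top paths and the average response time.
function mergeStats(partials) {
  const m = { total: 0, levels: {}, status: {}, paths: {}, responseTimeSum: 0 };
  for (const p of partials) {
    m.total += p.total;
    m.responseTimeSum += p.responseTimeSum;
    for (const [k, v] of Object.entries(p.levels)) m.levels[k] = (m.levels[k] ?? 0) + v;
    for (const [k, v] of Object.entries(p.status)) m.status[k] = (m.status[k] ?? 0) + v;
    for (const [k, v] of Object.entries(p.paths)) m.paths[k] = (m.paths[k] ?? 0) + v;
  }
  const topPaths = Object.entries(m.paths)
    .sort((a, b) => b[1] - a[1])
    .slice(0, 10) // assumption: keep the 10 most frequent paths
    .map(([p, count]) => ({ path: p, count }));
  const avgResponseTimeMs = m.total
    ? +(m.responseTimeSum / m.total).toFixed(2)
    : 0;
  return { total: m.total, levels: m.levels, status: m.status, topPaths, avgResponseTimeMs };
}
```

In the full command, each worker builds one accumulator for its chunk and posts it back; `mergeStats` runs once over all of them.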
Test data generator: use the provided script to generate a large log file for testing:

```
node scripts/generate-logs.js --output workspace/logs.txt --lines 500000
```

Suggested project structure:

```
src/
  main.js          — entry point, sets up REPL, handles navigation state
  repl.js          — REPL handler, command parsing and dispatching
  navigation.js    — navigation commands (up, cd, ls)
  commands/
    csvToJson.js   — csv-to-json command handler
    jsonToCsv.js   — json-to-csv command handler
    count.js       — count command handler
    hash.js        — hash command handler
    hashCompare.js — hash-compare command handler
    encrypt.js     — encrypt command handler
    decrypt.js     — decrypt command handler
    logStats.js    — log-stats command handler
  workers/
    logWorker.js   — worker thread for log-stats command
  utils/
    pathResolver.js — resolve paths relative to the current working directory
    argParser.js    — parse command-line arguments
```
Hints:
- Use the `readline` module for interactive input
- Use `stream.pipeline` (from `stream/promises`) to connect streams and handle errors properly
- For CSV parsing in the Transform stream, handle the first line (headers) separately from data lines
- For `json-to-csv`, you'll need to buffer the JSON input to parse it, but write the CSV output via a stream
- For `log-stats`, make sure chunks start/end on line boundaries to avoid partial log lines
- For merging stats, sum the counters and merge the path maps before computing `topPaths`
- Always resolve file paths relative to the current working directory before performing operations
- Use `path.resolve()` to combine the current working directory with relative paths
- `process.cwd()` is NOT appropriate here; maintain your own current working directory variable
- Maintain the current working directory as application state throughout the session
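For `utils/argParser.js`, a minimal `--flag value` parser covers every command above, including bare flags like `--save`. A sketch (it assumes no value itself starts with `--`):

```javascript
// Parse ['--input', 'f.txt', '--save'] into { input: 'f.txt', save: true }.
// Flags with no following value (or followed by another flag) become true.
function parseArgs(args) {
  const parsed = {};
  for (let i = 0; i < args.length; i += 1) {
    if (!args[i].startsWith('--')) continue;
    const name = args[i].slice(2);
    const next = args[i + 1];
    if (next !== undefined && !next.startsWith('--')) {
      parsed[name] = next;
      i += 1; // consume the value
    } else {
      parsed[name] = true; // bare flag, e.g. --save
    }
  }
  return parsed;
}
```

Command handlers can then check `parsed.input === undefined` and friends to decide between `Invalid input` and running the operation.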