Usage

This section covers the basic usage of Urx and provides examples for common use cases.

Basic Usage

To scan a single domain:

urx example.com

To scan multiple domains:

urx example.com example.org

You can also pipe a list of domains from a file:

cat domains.txt | urx

Unified File Input:

Read URLs from files with automatic format detection. Supported formats are WARC files, URLTeam compressed files (gzip/bzip2), and plain text files:

urx --files urls.txt

Examples

Here are some examples of how to use Urx with different options:

Saving Output

Save results to a file:

urx example.com -o results.txt

Output in JSON format:

urx example.com -f json -o results.json

Filtering

Filter for JavaScript files only:

urx example.com -e js

Exclude HTML and text files:

urx example.com --exclude-extensions html,txt

Filter for API endpoints:

urx example.com --patterns api,v1,graphql

Exclude specific patterns:

urx example.com --exclude-patterns static,images

Use a filter preset to exclude common image types:

urx example.com -p no-images

Providers

Use specific providers:

urx example.com --providers wayback,otx

The VirusTotal and URLScan providers require API keys. You can supply them in several ways:

  1. Command Line:

    urx example.com --providers=vt,urlscan --vt-api-key=YOUR_VT_KEY --urlscan-api-key=YOUR_URLSCAN_KEY
    
  2. Environment Variables:

    export URX_VT_API_KEY=YOUR_VT_KEY
    export URX_URLSCAN_API_KEY=YOUR_URLSCAN_KEY
    urx example.com --providers=vt,urlscan
    
  3. Auto-Enabling: If API keys are provided, the corresponding providers are enabled automatically, so --providers can be omitted.

    urx example.com --vt-api-key=YOUR_VT_KEY --urlscan-api-key=YOUR_URLSCAN_KEY
    
  4. Multiple API Key Rotation (to mitigate rate limits):

    Using repeated flags for multiple keys:

    urx example.com --vt-api-key=key1 --vt-api-key=key2 --vt-api-key=key3
    

    Using environment variables with comma-separated keys:

    URX_VT_API_KEY=key1,key2,key3 URX_URLSCAN_API_KEY=ukey1,ukey2 urx example.com
    

    Combining CLI flags and environment variables (CLI keys are used first):

    URX_VT_API_KEY=env_key1,env_key2 urx example.com --vt-api-key=cli_key1 --vt-api-key=cli_key2
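
When keys live in a file or a secrets store rather than on the command line, the comma-separated form shown above can be assembled with standard tools. The following is a minimal sketch of one way to do that. The key values are placeholders, and the urx invocation is left as a comment so the snippet runs without Urx installed.

```shell
# Placeholder keys, one per line (in practice these might be read from a file).
vt_keys=$(printf '%s\n' key1 key2 key3)

# Join the lines with commas to match the comma-separated format shown above.
URX_VT_API_KEY=$(printf '%s\n' "$vt_keys" | paste -sd, -)
export URX_VT_API_KEY

echo "$URX_VT_API_KEY"   # prints key1,key2,key3

# With the variable exported, the scan itself would be:
#   urx example.com
```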
    

Discovery

By default, Urx includes URLs from robots.txt and sitemap.xml.

Exclude robots.txt:

urx example.com --exclude-robots

Exclude sitemap.xml:

urx example.com --exclude-sitemap

Other

Include subdomains in the scan:

urx example.com --subs

Check the HTTP status of collected URLs:

urx example.com --check-status

Extract additional links from the HTML of collected URLs:

urx example.com --extract-links

Network Configuration

Use a proxy, set a timeout, increase parallel requests, and skip TLS certificate verification:

urx example.com --proxy http://localhost:8080 --timeout 60 --parallel 10 --insecure

Advanced Filtering

Combine multiple filters:

urx example.com -e js,php --patterns admin,login --exclude-patterns logout,static --min-length 20

Filter by HTTP status codes:

urx example.com --include-status 200,30x,405 --exclude-status 20x
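
Values such as 30x and 20x read naturally as wildcards over the last digit. The sketch below illustrates that reading with a shell glob, assuming x stands for any single character. This is an interpretation of the notation, not Urx's matching code, and status_matches is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if status code $1 matches any entry in the
# comma-separated pattern list $2, treating "x" as a single-character wildcard.
status_matches() {
  http_code=$1
  # Turn "200,30x,405" into the glob list "200 30? 405".
  globs=$(printf '%s' "$2" | tr ',x' ' ?')
  set -f   # keep the shell from expanding "?" against file names
  for glob in $globs; do
    case $http_code in
      $glob) set +f; return 0 ;;
    esac
  done
  set +f
  return 1
}

for code in 200 301 404 405; do
  if status_matches "$code" "200,30x,405"; then
    echo "$code: included"
  else
    echo "$code: excluded"
  fi
done
```

Under this assumption, 200, 301, and 405 are reported as included and 404 as excluded.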

Unified File Input

Read URLs from a single file (auto-detects format):

urx --files urls.txt

Read from multiple files, separated by spaces:

urx --files urls.txt archive.warc data.gz

Read from multiple files with repeated flags:

urx --files urls.txt --files archive.warc

Combine with filtering and formatting options:

urx --files data.txt --patterns api,admin -f json

URL Normalization and Deduplication

Normalize URLs by sorting query parameters and removing trailing slashes:

urx example.com --normalize-url
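
To make the two steps concrete, here is a minimal shell sketch of sorting query parameters and removing a trailing slash. It is an illustration only, not Urx's implementation: normalize_url is a hypothetical helper, and it ignores details such as fragments and percent-encoding.

```shell
# Hypothetical helper illustrating the two normalization steps described above.
normalize_url() {
  url=$1
  # Split the URL into the part before "?" and the query string after it.
  case $url in
    *\?*) base=${url%%\?*}; query=${url#*\?} ;;
    *)    base=$url;        query= ;;
  esac
  # Step 1: drop a single trailing slash from the path.
  base=${base%/}
  if [ -n "$query" ]; then
    # Step 2: split on "&", sort the parameters, and join them again.
    sorted=$(printf '%s\n' "$query" | tr '&' '\n' | LC_ALL=C sort | paste -sd'&' -)
    printf '%s?%s\n' "$base" "$sorted"
  else
    printf '%s\n' "$base"
  fi
}

normalize_url 'https://example.com/path/?b=2&a=1'
normalize_url 'https://example.com/path?a=1&b=2'
```

Both calls print https://example.com/path?a=1&b=2, which is why normalized URLs are easier to deduplicate.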

Combine normalization with endpoint merging for comprehensive deduplication:

urx example.com --normalize-url --merge-endpoint

URL normalization with file input:

urx --files urls.txt --normalize-url

Caching and Incremental Scanning

Urx supports caching to improve performance for repeated scans and incremental scanning to discover only new URLs.

# Enable caching with SQLite (default)
urx example.com --cache-type sqlite --cache-path ~/.urx/cache.db

# Use Redis for distributed caching
urx example.com --cache-type redis --redis-url redis://localhost:6379

# Incremental scanning - only show new URLs since last scan
urx example.com --incremental

# Set cache TTL (time-to-live) to 12 hours
urx example.com --cache-ttl 43200

# Disable caching entirely
urx example.com --no-cache

# Combine incremental scanning with filters
urx example.com --incremental -e js,php --patterns api

# Load caching settings from a configuration file
urx -c example/config.toml example.com
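
Conceptually, an incremental scan is a set difference: URLs in the current results that were not in the previous ones. The sketch below shows that idea with comm on two small sorted lists. The file names and URLs are made up for illustration, and it says nothing about how Urx stores or compares cached results.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Illustrative results of a previous scan and of the current scan.
printf '%s\n' 'https://example.com/' 'https://example.com/login' \
  | LC_ALL=C sort > "$workdir/previous.txt"
printf '%s\n' 'https://example.com/' 'https://example.com/api/v1' 'https://example.com/login' \
  | LC_ALL=C sort > "$workdir/current.txt"

# comm -13 suppresses lines unique to the first file and lines common to both,
# leaving only lines that appear solely in the second (current) file.
new_urls=$(LC_ALL=C comm -13 "$workdir/previous.txt" "$workdir/current.txt")
echo "$new_urls"   # prints https://example.com/api/v1

rm -r "$workdir"
```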

Caching Use Cases

# Daily monitoring - only alert on new URLs
urx target.com --incremental --silent | notify-tool

# Efficient processing of domain lists
cat domains.txt | urx --incremental --cache-ttl 3600 > new_urls.txt

# Distributed team scanning with Redis
urx example.com --cache-type redis --redis-url redis://shared-cache:6379

# Fast re-scans during development
urx test-domain.com --cache-ttl 300  # 5-minute cache for rapid iterations
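
The --cache-ttl values used in this section (43200 for 12 hours, 300 for 5 minutes) are durations in seconds. A small helper can make the conversion explicit. The sketch below uses a hypothetical ttl_seconds function, and the urx call is shown only as a comment.

```shell
# Convert a duration in hours and optional minutes to a number of seconds.
ttl_seconds() {
  hours=$1
  minutes=${2:-0}
  echo $(( hours * 3600 + minutes * 60 ))
}

ttl_seconds 12      # 12 hours  -> 43200
ttl_seconds 1       # 1 hour    -> 3600
ttl_seconds 0 5     # 5 minutes -> 300

# The result can be passed straight to the flag, for example:
#   urx example.com --cache-ttl "$(ttl_seconds 12)"
```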