Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

tva Common Conventions

This document defines the naming and behavior conventions for parameters shared across tva subcommands to ensure a consistent user experience.

Header Handling

Headers are the column name rows in data files. Different commands have different header processing requirements, but parameter naming should remain consistent.

Quick Selection:

  • Need column names for field references? Use --header (standard TSV) or --header-hash1 (TSV with comments).
  • Just skip header lines? Use --header-lines N (first N lines) or --header-hash (comment lines only).

Header Detection Modes (mutually exclusive):

  • Modes that provide column names (header_args_with_columns()):

    • --header / -H: FirstLine mode

      • Takes the first line as column names.
      • Simplest mode for standard TSV files.
      • lines is empty, column_names_line is the first line.
    • --header-hash1: HashLines1 mode

      • Takes consecutive # lines plus the next line as header.
      • Graceful degradation: If no # lines exist, uses the first line as column names ( behaves like --header).
      • lines contains only # lines (empty if no # lines); column names line is stored separately.

    Commands using these modes: append, bin, blank, fill, filter, join, longer, nl, reverse, select, stats, uniq, wider.

  • Modes that don’t provide column names (header_args()):

    • --header-lines N: LinesN mode

      • Takes up to N lines as header (fewer if file is shorter).
      • Does not extract column names.
      • lines contains up to n lines, column_names_line is None.
    • --header-hash: HashLines mode

      • Takes all consecutive # lines as header (metadata only).
      • No column names line is extracted.
      • lines contains # lines, column_names_line is None.

    Commands using these modes: check, slice, sort.

Library Implementation:

  • Use TsvReader::read_header_mode(mode) to read headers.
  • Returns HeaderInfo { lines, column_names_line } where:
    • lines: all header lines read from input
    • column_names_line: the line containing column names (None if mode doesn’t provide column names)
  • Mode behavior:
    • FirstLine: lines is empty, column_names_line is the first line
    • LinesN(n): lines contains up to n lines read, column_names_line is None
    • HashLines: lines contains all consecutive # lines, column_names_line is None
    • HashLines1: lines contains only # lines (empty if no # lines), column_names_line is the column names line

Special Commands:

  • split: Uses --header-in-out (input has header, output writes header, default) or --header-in-only (input has header, output does not write header). --header is an alias for --header-in-out.
  • keep-header: Uses --lines N / -n to specify number of header lines (default: 1)
  • sample: Uses simple --header / -H flag (treats first line as header)
  • transpose: Does not support header modes (processes all lines as data)

Multi-file Header Behavior:

  • When using multiple input files with header mode enabled, the header from the first file is read and written to output.
  • Headers from subsequent files are skipped.

Input/Output Conventions

Parameter Naming

TypeParameter NameDescription
Single file inputinfilePositional argument
Multiple file inputinfilesPositional argument, supports multiple
Output file--outfile / -oOptional, defaults to stdout

Special Values

  • stdin or -: Read from standard input
  • stdout: Output to standard output (used with --outfile)

Field Selection Syntax

Commands that support field selection (e.g., select, filter, sort) use a unified field syntax.

  • 1-based Indexing

    • Fields are numbered starting from 1 (following Unix cut/awk convention).
    • Example: 1,3,5 selects the 1st, 3rd, and 5th columns.
  • Field Names

    • Requires the --header flag (or command-specific header option).
    • Names are case-sensitive.
    • Example: date,user_id selects columns named “date” and “user_id”.
  • Ranges

    • Numeric Ranges: start-end. Example: 2-4 selects columns 2, 3, and 4.
    • Name Ranges: start_col-end_col. Selects all columns from start_col to end_col inclusive, based on their order in the header.
    • Reverse Ranges: 5-3 is automatically treated as 3-5.
  • Wildcards

    • * matches any sequence of characters in a field name.
    • Example: user_* selects user_id, user_name, etc.
    • Example: *_time selects start_time, end_time.
  • Escaping

    • Special characters in field names (like space, comma, colon, dash, star) must be escaped with \.
    • Example: Order\ ID selects the column “Order ID”.
    • Example: run\:id selects “run:id”.
  • Exclusion

    • Negative selection is typically handled via a separate flag (e.g., --exclude in select), but uses the same field syntax.

Numeric Parameter Conventions

ParameterDescriptionExample
--lines N / -nSpecify line count--lines 100
--fields N / -fSpecify fields--fields 1,2,3
--delimiterField delimiter--delimiter ','

Random and Sampling

ParameterDescription
--seed NSpecify random seed for reproducibility
--static-seedUse fixed default seed

Boolean Flags

Boolean flags use --flag to enable, without a value:

  • --header not --header true
  • --append / -a not --append true

Expr Syntax

The expr command supports a rich expression language for data transformation.

  • Column references: @1, @2 (1-based) or @name (when headers provided)
  • Whole row reference: @0 (original row data)
  • Variables: @var_name (bound by as, persists across rows)
  • Global variables: @__index, @__file, @__row (built-in)
  • Arithmetic: +, -, *, /, %, **
  • Comparison: ==, !=, <, <=, >, >=
  • String comparison: eq, ne, lt, le, gt, ge
  • Logical: and, or, not
  • String concatenation: ++
  • Functions: trim(), upper(), lower(), len(), abs(), round(), min(), max(), if(), default(), substr(), replace(), split(), join(), range(), map(), filter(), reduce()
  • Pipe operator: | for chaining functions (e.g., @name | trim() | upper())
  • Underscore placeholder: _ for piped values in multi-argument functions (e.g., @name | substr(_, 0, 3))
  • Lambda expressions: x => x + 1 or (x, y) => x + y
  • List literals: [1, 2, 3] or [@a, @b, @c]
  • Variable binding: as for intermediate results (e.g., @price * @qty as @total; @total * 0.9)
  • Method call syntax: @name.upper(), @num.abs()

Full expr syntax documentation is available at here.

Error Handling

All commands follow the same error output format:

tva <command>: <error message>

Serious errors return non-zero exit codes.