Core Concepts

This guide explains the fundamental concepts and design principles behind qldata.

Design Philosophy

qldata is built around three core principles:

  1. Fluent API - Chainable methods that read like natural language
  2. Sensible Defaults - Reasonable settings out of the box
  3. Explicit Over Implicit - Clear, predictable behavior

The Fluent API Pattern

qldata uses a fluent interface (also called method chaining). Each method returns a query object, so calls can be chained one after another:

# Traditional style (not qldata)
query = Query("BTCUSDT", "binance")
query.set_days(30)
query.set_resolution("1h")
df = query.execute()

# Fluent style (qldata)
df = qd.data("BTCUSDT", source="binance") \
    .last(30) \
    .resolution("1h") \
    .get()

Query vs Execution

The fluent chain builds a query object. No data is fetched until you call .get():

# This only creates a query configuration - no network calls
query = qd.data("BTCUSDT", source="binance") \
    .last(30) \
    .resolution("1h")

# This actually fetches the data
df = query.get()

# Queries are reusable
df1 = query.get()  # Fetches again
df2 = query.clean().get()  # Same query, with cleaning
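This build-then-execute pattern can be sketched with a small immutable builder. Everything below (`LazyQuery`, its fields, its methods) is illustrative, not qldata's real internals:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class LazyQuery:
    symbol: str
    source: str
    days: Optional[int] = None
    timeframe: Optional[str] = None

    def last(self, days: int) -> "LazyQuery":
        return replace(self, days=days)          # returns a new, updated query

    def resolution(self, timeframe: str) -> "LazyQuery":
        return replace(self, timeframe=timeframe)

    def get(self):
        # in a real client the network fetch would happen here;
        # this sketch just returns the accumulated configuration
        return (self.symbol, self.source, self.days, self.timeframe)

query = LazyQuery("BTCUSDT", "binance").last(30).resolution("1h")
query.get()   # ("BTCUSDT", "binance", 30, "1h")
```

Because each step returns a fresh object, a built query can be reused or branched without mutating the original, which is exactly what makes `query.get()` safely repeatable.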

Data Models

qldata uses strongly-typed data models for consistency across exchanges.

Bar (OHLCV)

The most common data type - candlestick/bar data:

from qldata import Bar

# Returned as pandas DataFrame with these columns:
# - open: float - Opening price
# - high: float - Highest price
# - low: float - Lowest price
# - close: float - Closing price
# - volume: float - Trading volume
# Index: timestamp (pandas Timestamp, UTC)
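A frame of this shape can be sanity-checked with plain pandas. The frame below is hand-built for illustration, not fetched:

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"], utc=True)
bars = pd.DataFrame({
    "open":   [100.0, 101.0],
    "high":   [102.0, 103.0],
    "low":    [ 99.0, 100.5],
    "close":  [101.0, 102.5],
    "volume": [ 12.3,   9.8],
}, index=idx)

# every bar should satisfy low <= open/close <= high
valid = (
    (bars["high"] >= bars[["open", "close"]].max(axis=1))
    & (bars["low"] <= bars[["open", "close"]].min(axis=1))
)
assert valid.all()
```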

Tick (Trade)

Individual trade/tick data:

from qldata import Tick

# Columns:
# - price: float - Trade price
# - quantity: float - Trade quantity
# - symbol: str - Trading pair
# - side: str - "buy" or "sell"
# Index: timestamp (pandas Timestamp, UTC)

OrderBook

Order book snapshots:

from qldata import OrderBook, OrderBookLevel

# OrderBook contains:
# - bids: List[OrderBookLevel] - Buy orders
# - asks: List[OrderBookLevel] - Sell orders
# - timestamp: datetime - Snapshot time
# - symbol: str - Trading pair

# OrderBookLevel contains:
# - price: Decimal
# - quantity: Decimal
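Computing a spread or mid-price from a snapshot works directly on these fields. The class below is a stand-in mirroring the documented shape; the real one lives in qldata and may differ in detail:

```python
from dataclasses import dataclass
from decimal import Decimal

# Illustrative stand-in for qldata's OrderBookLevel
@dataclass
class OrderBookLevel:
    price: Decimal
    quantity: Decimal

best_bid = OrderBookLevel(Decimal("42000.50"), Decimal("1.2"))
best_ask = OrderBookLevel(Decimal("42001.00"), Decimal("0.8"))

spread = best_ask.price - best_bid.price       # Decimal("0.50")
mid = (best_ask.price + best_bid.price) / 2    # Decimal("42000.75")
```

Using `Decimal` for prices avoids the rounding drift of binary floats, which matters for tick-size arithmetic.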

SymbolInfo

Trading pair metadata:

from qldata import SymbolInfo

info = qd.get_symbol_info("BTCUSDT", source="binance")

# Properties:
# - symbol: str - Trading pair name
# - base_asset: str - Base currency (BTC)
# - quote_asset: str - Quote currency (USDT)
# - status: str - Trading status
# - is_active: bool - Currently trading
# - is_spot: bool - Spot market
# - is_perpetual: bool - Perpetual contract
# - filters: TradingFilters - Price/quantity constraints

Timeframe

Time intervals for bar data:

from qldata import Timeframe

# Pre-defined timeframes:
Timeframe.MINUTE_1   # 1m
Timeframe.MINUTE_5   # 5m
Timeframe.MINUTE_15  # 15m
Timeframe.HOUR_1     # 1h
Timeframe.HOUR_4     # 4h
Timeframe.DAY_1      # 1d
Timeframe.WEEK_1     # 1w
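Interval strings like these are straightforward to convert into durations when you need bar counts or time math. This helper is illustrative, not part of qldata's API:

```python
# seconds per unit for the interval suffixes used above
_UNITS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}

def interval_seconds(tf: str) -> int:
    """Convert a timeframe string like '5m' or '4h' into seconds."""
    value, unit = int(tf[:-1]), tf[-1]
    return value * _UNITS[unit]

interval_seconds("5m")   # 300
interval_seconds("4h")   # 14400
```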

Exchange Categories

Different exchanges organize their markets differently:

Binance Categories

Category | Market Type                    | Symbol Example
---------|--------------------------------|---------------
"spot"   | Spot trading                   | BTCUSDT
"usdm"   | USD-Margined perpetual futures | BTCUSDT

# Binance spot
df = qd.data("BTCUSDT", source="binance", category="spot").last(7).resolution("1h").get()

# Binance USDM futures
df = qd.data("BTCUSDT", source="binance", category="usdm").last(7).resolution("1h").get()

Bybit Categories

Category | Market Type                | Symbol Example
---------|----------------------------|---------------
"spot"   | Spot trading               | BTCUSDT
"linear" | Linear perpetual contracts | BTCUSDT

# Bybit spot
df = qd.data("BTCUSDT", source="bybit", category="spot").last(7).resolution("1h").get()

# Bybit linear perpetuals
df = qd.data("BTCUSDT", source="bybit", category="linear").last(7).resolution("1h").get()

Transform Pipeline

qldata provides a composable system for cleaning and reshaping fetched data:

Built-in Transforms

# Cleaning (removes duplicates, sorts, validates)
.clean()

# Fill missing values
.fill_forward()      # Forward fill
.fill_backward()     # Backward fill
.interpolate()       # Linear interpolation

# Resample to different timeframe
.resample("1h")
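`.resample()` maps onto the standard OHLCV aggregation, which can be reproduced in plain pandas. This is a sketch of the semantics on synthetic data, not qldata's code:

```python
import pandas as pd

# one hour of synthetic 1-minute bars
idx = pd.date_range("2024-01-01", periods=60, freq="min", tz="UTC")
minute = pd.DataFrame({
    "open":   range(60),
    "high":   [v + 1 for v in range(60)],
    "low":    range(60),
    "close":  range(60),
    "volume": [1.0] * 60,
}, index=idx)

# the conventional OHLCV aggregation when combining bars upward:
# first open, max high, min low, last close, summed volume
hourly = minute.resample("1h").agg({
    "open": "first", "high": "max", "low": "min",
    "close": "last", "volume": "sum",
})
```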

Transform Order

Transforms are applied in the order you specify:

# Good: Clean first, then fill, then resample
df = qd.data("BTCUSDT", source="binance") \
    .last(7) \
    .resolution("1m") \
    .clean() \
    .fill_forward() \
    .resample("1h") \
    .get()

# Order matters! This produces different results
# because resampling happens before cleaning:
df = qd.data("BTCUSDT", source="binance") \
    .last(7) \
    .resolution("1m") \
    .resample("1h") \
    .clean() \
    .fill_forward() \
    .get()
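The difference is easy to reproduce in plain pandas with a duplicated row. The data is hand-built and `clean` here is a simplified stand-in for qldata's `.clean()`:

```python
import pandas as pd

idx = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:00",  # duplicated minute
    "2024-01-01 00:01",
], utc=True)
raw = pd.DataFrame({"volume": [10.0, 10.0, 5.0]}, index=idx)

def clean(df):
    # drop duplicate timestamps (keep the first) and sort the index
    return df[~df.index.duplicated(keep="first")].sort_index()

# clean first: the duplicate row is dropped before aggregation
v1 = clean(raw).resample("1h").sum()["volume"].iloc[0]   # 10 + 5 = 15
# resample first: the duplicate volume is already summed into the bar
v2 = clean(raw.resample("1h").sum())["volume"].iloc[0]   # 10 + 10 + 5 = 25
```

Once bad rows have been aggregated into a bar, no later cleaning step can undo it, which is why cleaning should come first.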

Custom Pipelines

For advanced use cases, create custom transform pipelines:

from qldata import TransformPipeline

pipeline = TransformPipeline() \
    .add(remove_duplicates) \
    .add(remove_outliers, sigma=3) \
    .add(fill_forward)

df = pipeline.apply(raw_df)
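The mechanics of such a pipeline are simple to sketch: store (function, kwargs) pairs and fold them over the frame. This is an illustrative reimplementation; qldata's actual `TransformPipeline` may differ:

```python
import pandas as pd

class TransformPipeline:
    def __init__(self):
        self._steps = []

    def add(self, fn, **kwargs):
        self._steps.append((fn, kwargs))
        return self                          # enables chaining

    def apply(self, df):
        for fn, kwargs in self._steps:       # run steps in insertion order
            df = fn(df, **kwargs)
        return df

def remove_duplicates(df):
    return df[~df.index.duplicated(keep="first")]

def fill_forward(df):
    return df.ffill()

idx = pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"], utc=True)
raw = pd.DataFrame({"close": [1.0, 1.0, float("nan")]}, index=idx)

out = TransformPipeline().add(remove_duplicates).add(fill_forward).apply(raw)
```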

Resilience Features

qldata is designed for production use with built-in resilience:

Auto-Reconnect

Streaming connections automatically reconnect on failure:

stream = qd.stream(["BTCUSDT"], source="binance") \
    .resolution("tick") \
    .on_data(handler) \
    .get(start=True)  # Auto-reconnect enabled by default

Rate Limiting

API calls respect exchange rate limits automatically:

# Even with many symbols, rate limits are handled
data = qd.data(["BTCUSDT", "ETHUSDT", "SOLUSDT", "DOGEUSDT", "XRPUSDT"], source="binance") \
    .last(30) \
    .resolution("1m") \
    .get(parallel=True, workers=4)  # Workers respect rate limits
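A common way to enforce such limits is a token bucket: requests spend tokens, and tokens refill at the permitted rate. A minimal single-threaded sketch (not qldata's implementation):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                     # tokens refilled per second
        self.capacity = capacity             # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

bucket = TokenBucket(rate=10.0, capacity=5)
for _ in range(5):
    bucket.acquire()   # the first 5 calls pass without sleeping
```

With workers sharing one bucket (guarded by a lock in real code), parallel fetches can never exceed the exchange's request budget.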

Sequence Tracking

Streaming detects missed messages:

from qldata.resilience import SequenceTracker

# Built into streaming sessions
# Automatically logs warnings for gaps
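Gap detection itself amounts to comparing each message's sequence number against the previous one. A sketch of the idea (illustrative; the real `SequenceTracker` interface may differ):

```python
class SequenceTracker:
    def __init__(self):
        self.last_seq = None
        self.gaps = []

    def observe(self, seq: int) -> None:
        if self.last_seq is not None and seq != self.last_seq + 1:
            # record the range of missed sequence numbers
            self.gaps.append((self.last_seq + 1, seq - 1))
        self.last_seq = seq

tracker = SequenceTracker()
for seq in [1, 2, 3, 6, 7]:     # messages 4 and 5 were missed
    tracker.observe(seq)

tracker.gaps   # [(4, 5)]
```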

Configuration

Global Configuration

import qldata as qd

# Configure at startup
qd.config(
    data_dir="./market_data",     # Where to cache data
    cache_enabled=True,           # Enable disk caching
    validation_enabled=True       # Validate data on fetch
)

Per-Query Configuration

# Override config for specific queries
df = qd.data("BTCUSDT", source="binance") \
    .last(30) \
    .resolution("1h") \
    .get(
        cache=False,      # Don't cache this query
        validate=False    # Skip validation
    )

Environment Variables

Variable                  | Description
--------------------------|-----------------------------------
QLDATA_DATA_DIR           | Data directory path
QLDATA_CACHE_ENABLED      | Enable caching (true/false)
QLDATA_VALIDATION_ENABLED | Enable validation (true/false)
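Boolean variables like these are conventionally parsed with a default fallback. A sketch of how that reading typically works (not qldata's code):

```python
import os

def env_flag(name: str, default: bool) -> bool:
    """Read a true/false environment variable, falling back to a default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")

os.environ["QLDATA_CACHE_ENABLED"] = "true"
os.environ.pop("QLDATA_VALIDATION_ENABLED", None)   # ensure it is unset

env_flag("QLDATA_CACHE_ENABLED", False)       # True  (explicitly set)
env_flag("QLDATA_VALIDATION_ENABLED", True)   # True  (falls back to default)
```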

Error Handling

qldata uses a consistent exception hierarchy:

from qldata.errors import (
    QldataError,           # Base exception
    ConnectionError,       # Network issues
    RateLimitError,        # Rate limit exceeded
    ValidationError,       # Data validation failed
    ConfigurationError,    # Invalid configuration
)

try:
    df = qd.data("INVALID", source="binance").last(1).get()
except ValidationError as e:
    print(f"Validation failed: {e}")
except ConnectionError as e:
    print(f"Network error: {e}")
except QldataError as e:
    print(f"qldata error: {e}")
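A dedicated `RateLimitError` makes retry-with-backoff wrappers straightforward. The sketch below uses a stand-in exception class so it runs standalone; in real code you would catch the one from `qldata.errors`:

```python
import time

class RateLimitError(Exception):   # stand-in for qldata.errors.RateLimitError
    pass

def with_retries(fetch, attempts: int = 3, base_delay: float = 0.01):
    """Call fetch(), retrying with exponential backoff on rate limits."""
    for attempt in range(attempts):
        try:
            return fetch()
        except RateLimitError:
            if attempt == attempts - 1:
                raise                              # out of attempts
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, ...

calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return "df"

with_retries(flaky_fetch)   # succeeds on the third attempt
```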

Best Practices

1. Use the Fluent API

# ✓ Good - Clear and readable
df = qd.data("BTCUSDT", source="binance") \
    .last(30) \
    .resolution("1h") \
    .clean() \
    .get()

# ✗ Avoid - Breaking the chain
query = qd.data("BTCUSDT", source="binance")
query = query.last(30)
query = query.resolution("1h")
df = query.get()

2. Always Clean Production Data

# ✓ Good - Data is deduplicated, sorted, and validated
df = qd.data(...).clean().get()

# ✗ Risky - Raw data may have issues
df = qd.data(...).get()

3. Handle Errors Gracefully

# ✓ Good - Explicit error handling
try:
    df = qd.data(...).get()
except QldataError as e:
    logger.error(f"Data fetch failed: {e}")
    df = fallback_data()

4. Use Appropriate Parallelism

# ✓ Good - Parallel for many symbols
data = qd.data(symbols, source="binance") \
    .get(parallel=True, workers=4)

# ✗ Wasteful - Parallel for single symbol
df = qd.data("BTCUSDT", source="binance") \
    .get(parallel=True, workers=4)

Next Steps