Core Concepts¶
This guide explains the fundamental concepts and design principles behind qldata.
Design Philosophy¶
qldata is built around three core principles:
- Fluent API - Chainable methods that read like natural language
- Sensible Defaults - Reasonable settings out of the box
- Explicit Over Implicit - Clear, predictable behavior
The Fluent API Pattern¶
qldata uses a fluent interface (also called method chaining): each method returns the query object, so the next call can be chained directly:
# Traditional style (not qldata)
query = Query("BTCUSDT", "binance")
query.set_days(30)
query.set_resolution("1h")
df = query.execute()
# Fluent style (qldata)
df = qd.data("BTCUSDT", source="binance") \
.last(30) \
.resolution("1h") \
.get()
Query vs Execution¶
The fluent chain builds a query object. No data is fetched until you call .get():
# This only creates a query configuration - no network calls
query = qd.data("BTCUSDT", source="binance") \
.last(30) \
.resolution("1h")
# This actually fetches the data
df = query.get()
# Queries are reusable
df1 = query.get() # Fetches again
df2 = query.clean().get() # Same query, with cleaning
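The deferred-execution pattern above can be sketched with a minimal builder. This is illustrative only — qldata's internal `Query` class will differ — but it shows why chaining works (each setter returns `self`) and why no data moves until `.get()`:

```python
# Minimal sketch of a deferred, chainable query builder (illustrative;
# not qldata's actual implementation).
class Query:
    def __init__(self, symbol):
        self.symbol = symbol
        self.days = None
        self.timeframe = None
        self.fetch_count = 0  # counts how many times data was "fetched"

    def last(self, days):
        self.days = days
        return self  # returning self is what enables chaining

    def resolution(self, timeframe):
        self.timeframe = timeframe
        return self

    def get(self):
        # In the real library this is where the exchange API is hit.
        self.fetch_count += 1
        return {"symbol": self.symbol, "days": self.days, "tf": self.timeframe}

query = Query("BTCUSDT").last(30).resolution("1h")
assert query.fetch_count == 0   # building the chain fetched nothing
df = query.get()                # the fetch happens here
df2 = query.get()               # queries are reusable; fetches again
```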
Data Models¶
qldata uses strongly-typed data models for consistency across exchanges.
Bar (OHLCV)¶
The most common data type - candlestick/bar data:
from qldata import Bar
# Returned as pandas DataFrame with these columns:
# - open: float - Opening price
# - high: float - Highest price
# - low: float - Lowest price
# - close: float - Closing price
# - volume: float - Trading volume
# Index: timestamp (pandas Timestamp, UTC)
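A hand-built frame with the layout described above (constructed locally here, not fetched) makes the expected shape concrete:

```python
import pandas as pd

# Hand-built example of the OHLCV layout: float columns
# open/high/low/close/volume over a UTC timestamp index.
idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"], utc=True)
bars = pd.DataFrame(
    {
        "open":   [42000.0, 42100.0],
        "high":   [42200.0, 42350.0],
        "low":    [41900.0, 42050.0],
        "close":  [42100.0, 42300.0],
        "volume": [12.5, 9.8],
    },
    index=idx,
)
```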
Tick (Trade)¶
Individual trade/tick data:
from qldata import Tick
# Columns:
# - price: float - Trade price
# - quantity: float - Trade quantity
# - symbol: str - Trading pair
# - side: str - "buy" or "sell"
# Index: timestamp (pandas Timestamp, UTC)
OrderBook¶
Order book snapshots:
from qldata import OrderBook, OrderBookLevel
# OrderBook contains:
# - bids: List[OrderBookLevel] - Buy orders
# - asks: List[OrderBookLevel] - Sell orders
# - timestamp: datetime - Snapshot time
# - symbol: str - Trading pair
# OrderBookLevel contains:
# - price: Decimal
# - quantity: Decimal
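As a rough stand-in for these models — field names follow the docs, but the classes below are a sketch, not qldata's actual code — an order book with one derived quantity might look like:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import List

# Illustrative stand-ins for the models described above (sketch only).
@dataclass
class OrderBookLevel:
    price: Decimal
    quantity: Decimal

@dataclass
class OrderBook:
    bids: List[OrderBookLevel]   # best (highest) bid first
    asks: List[OrderBookLevel]   # best (lowest) ask first
    timestamp: datetime
    symbol: str

    def mid_price(self) -> Decimal:
        # Midpoint between best bid and best ask; Decimal avoids
        # float rounding on prices.
        return (self.bids[0].price + self.asks[0].price) / 2

book = OrderBook(
    bids=[OrderBookLevel(Decimal("42000.5"), Decimal("1.2"))],
    asks=[OrderBookLevel(Decimal("42001.5"), Decimal("0.8"))],
    timestamp=datetime.now(timezone.utc),
    symbol="BTCUSDT",
)
```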
SymbolInfo¶
Trading pair metadata:
from qldata import SymbolInfo
info = qd.get_symbol_info("BTCUSDT", source="binance")
# Properties:
# - symbol: str - Trading pair name
# - base_asset: str - Base currency (BTC)
# - quote_asset: str - Quote currency (USDT)
# - status: str - Trading status
# - is_active: bool - Currently trading
# - is_spot: bool - Spot market
# - is_perpetual: bool - Perpetual contract
# - filters: TradingFilters - Price/quantity constraints
Timeframe¶
Time intervals for bar data:
from qldata import Timeframe
# Pre-defined timeframes:
Timeframe.MINUTE_1 # 1m
Timeframe.MINUTE_5 # 5m
Timeframe.MINUTE_15 # 15m
Timeframe.HOUR_1 # 1h
Timeframe.HOUR_4 # 4h
Timeframe.DAY_1 # 1d
Timeframe.WEEK_1 # 1w
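The codes above map to fixed durations, which is handy for sizing requests. The mapping below is an assumption about what the `Timeframe` enum encodes, written with plain `timedelta`s:

```python
from datetime import timedelta

# Illustrative mapping from the timeframe codes above to durations.
TIMEFRAMES = {
    "1m":  timedelta(minutes=1),
    "5m":  timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "1h":  timedelta(hours=1),
    "4h":  timedelta(hours=4),
    "1d":  timedelta(days=1),
    "1w":  timedelta(weeks=1),
}

def bars_per_day(code: str) -> float:
    # How many bars of this timeframe fit in one day.
    return timedelta(days=1) / TIMEFRAMES[code]
```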
Exchange Categories¶
Different exchanges organize their markets differently:
Binance Categories¶
| Category | Market Type | Symbol Example |
|---|---|---|
| "spot" | Spot trading | BTCUSDT |
| "usdm" | USD-Margined perpetual futures | BTCUSDT |
# Binance spot
df = qd.data("BTCUSDT", source="binance", category="spot").last(7).resolution("1h").get()
# Binance USDM futures
df = qd.data("BTCUSDT", source="binance", category="usdm").last(7).resolution("1h").get()
Bybit Categories¶
| Category | Market Type | Symbol Example |
|---|---|---|
| "spot" | Spot trading | BTCUSDT |
| "linear" | Linear perpetual contracts | BTCUSDT |
# Bybit spot
df = qd.data("BTCUSDT", source="bybit", category="spot").last(7).resolution("1h").get()
# Bybit linear perpetuals
df = qd.data("BTCUSDT", source="bybit", category="linear").last(7).resolution("1h").get()
Transform Pipeline¶
qldata provides a powerful data transformation system:
Built-in Transforms¶
# Cleaning (removes duplicates, sorts, validates)
.clean()
# Fill missing values
.fill_forward() # Forward fill
.fill_backward() # Backward fill
.interpolate() # Linear interpolation
# Resample to different timeframe
.resample("1h")
Transform Order¶
Transforms are applied in the order you specify:
# Good: Clean first, then fill, then resample
df = qd.data("BTCUSDT", source="binance") \
.last(7) \
.resolution("1m") \
.clean() \
.fill_forward() \
.resample("1h") \
.get()
# Order matters! This resamples before cleaning and produces different results:
df = qd.data("BTCUSDT", source="binance") \
.last(7) \
.resolution("1m") \
.resample("1h") \
.clean() \
.fill_forward() \
.get()
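That order-dependence is easy to demonstrate with hand-built data (a local sketch, nothing fetched): filling before resampling lets filled values participate in the aggregation, while resampling first aggregates only the raw observations and fills the empty bucket afterwards.

```python
import pandas as pd

# A sparse series with nothing between 00:30 and 02:00.
idx = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 00:30", "2024-01-01 02:00"], utc=True)
close = pd.Series([100.0, 102.0, 106.0], index=idx)

# Fill the 1-minute grid first, then take hourly means:
# the empty 01:00 hour is a full hour of carried-forward 102s.
fill_then_resample = close.resample("1min").ffill().resample("1h").mean()

# Resample to hourly means first, then fill:
# the empty 01:00 hour inherits hour 00's mean instead.
resample_then_fill = close.resample("1h").mean().ffill()
```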
Custom Pipelines¶
For advanced use cases, create custom transform pipelines:
from qldata import TransformPipeline
pipeline = TransformPipeline() \
.add(remove_duplicates) \
.add(remove_outliers, sigma=3) \
.add(fill_forward)
df = pipeline.apply(raw_df)
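A pipeline like this is just an ordered list of callables with pre-bound keyword arguments. The sketch below shows that shape with plain functions; the class, `remove_duplicates`, and the outlier helper are illustrative stand-ins, not qldata's actual implementations:

```python
import functools
import pandas as pd

# Minimal pipeline in the spirit of TransformPipeline (sketch only).
class Pipeline:
    def __init__(self):
        self._steps = []

    def add(self, fn, **kwargs):
        # Bind keyword arguments now; call the function later in apply().
        self._steps.append(functools.partial(fn, **kwargs))
        return self

    def apply(self, df):
        for step in self._steps:
            df = step(df)
        return df

def remove_duplicates(df):
    return df[~df.index.duplicated(keep="first")]

def clip_outliers(df, sigma=3):
    # Hypothetical helper: clip closes more than `sigma` stds from the mean.
    mean, std = df["close"].mean(), df["close"].std()
    return df.assign(close=df["close"].clip(mean - sigma * std, mean + sigma * std))

idx = pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"], utc=True)
raw = pd.DataFrame({"close": [1.0, 1.0, 2.0]}, index=idx)
out = Pipeline().add(remove_duplicates).add(clip_outliers, sigma=3).apply(raw)
```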
Resilience Features¶
qldata is designed for production use with built-in resilience:
Auto-Reconnect¶
Streaming connections automatically reconnect on failure:
stream = qd.stream(["BTCUSDT"], source="binance") \
.resolution("tick") \
.on_data(handler) \
.get(start=True) # Auto-reconnect enabled by default
Rate Limiting¶
API calls respect exchange rate limits automatically:
# Even with many symbols, rate limits are handled
data = qd.data(["BTC", "ETH", "SOL", "DOGE", "XRP"], source="binance") \
.last(30) \
.resolution("1m") \
.get(parallel=True, workers=4) # Workers respect rate limits
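A common way for clients to respect request weights is a token bucket: requests spend tokens, and tokens refill at the exchange's allowed rate. The sketch below shows the idea; whether qldata uses exactly this mechanism is an assumption.

```python
import time

# Sketch of a token-bucket rate limiter (illustrative, not qldata's code).
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def try_acquire(self, n=1):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# 5-request burst allowed; the 6th immediate request is rejected.
bucket = TokenBucket(rate=10, capacity=5)
granted = [bucket.try_acquire() for _ in range(6)]
```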
Sequence Tracking¶
Streaming detects missed messages:
from qldata.resilience import SequenceTracker
# Built into streaming sessions
# Automatically logs warnings for gaps
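Gap detection over stream sequence numbers boils down to comparing each incoming sequence against the last one seen. A minimal sketch (the real `SequenceTracker` interface may differ):

```python
# Sketch of sequence-gap detection (illustrative, not qldata's class).
class GapDetector:
    def __init__(self):
        self.last_seq = None
        self.gaps = []

    def observe(self, seq):
        # A jump larger than +1 means messages in between were missed;
        # record the inclusive range of missing sequence numbers.
        if self.last_seq is not None and seq > self.last_seq + 1:
            self.gaps.append((self.last_seq + 1, seq - 1))
        self.last_seq = seq

det = GapDetector()
for seq in [1, 2, 3, 7, 8]:   # messages 4-6 never arrived
    det.observe(seq)
```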
Configuration¶
Global Configuration¶
import qldata as qd
# Configure at startup
qd.config(
data_dir="./market_data", # Where to cache data
cache_enabled=True, # Enable disk caching
validation_enabled=True # Validate data on fetch
)
Per-Query Configuration¶
# Override config for specific queries
df = qd.data("BTCUSDT", source="binance") \
.last(30) \
.resolution("1h") \
.get(
cache=False, # Don't cache this query
validate=False # Skip validation
)
Environment Variables¶
| Variable | Description |
|---|---|
| QLDATA_DATA_DIR | Data directory path |
| QLDATA_CACHE_ENABLED | Enable caching (true/false) |
| QLDATA_VALIDATION_ENABLED | Enable validation (true/false) |
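Reading a boolean flag like these typically looks like the sketch below. The exact spellings qldata accepts beyond "true"/"false" are an assumption here:

```python
import os

# Sketch of boolean env-var parsing for flags like QLDATA_CACHE_ENABLED.
def env_flag(name, default):
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Accepted truthy spellings are an assumption, not qldata's spec.
    return raw.strip().lower() in ("1", "true", "yes", "on")

os.environ["QLDATA_CACHE_ENABLED"] = "true"
cache_enabled = env_flag("QLDATA_CACHE_ENABLED", default=False)   # -> True
validation = env_flag("QLDATA_VALIDATION_ENABLED", default=True)  # unset -> default
```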
Error Handling¶
qldata uses a consistent exception hierarchy:
from qldata.errors import (
QldataError, # Base exception
ConnectionError, # Network issues
RateLimitError, # Rate limit exceeded
ValidationError, # Data validation failed
ConfigurationError, # Invalid configuration
)
try:
df = qd.data("INVALID", source="binance").last(1).get()
except ValidationError as e:
print(f"Validation failed: {e}")
except ConnectionError as e:
print(f"Network error: {e}")
except QldataError as e:
print(f"qldata error: {e}")
Best Practices¶
1. Use the Fluent API¶
# ✓ Good - Clear and readable
df = qd.data("BTCUSDT", source="binance") \
.last(30) \
.resolution("1h") \
.clean() \
.get()
# ✗ Avoid - Breaking the chain
query = qd.data("BTCUSDT", source="binance")
query = query.last(30)
query = query.resolution("1h")
df = query.get()
2. Always Clean Production Data¶
# ✓ Good - Data is validated
df = qd.data(...).clean().get()
# ✗ Risky - Raw data may have issues
df = qd.data(...).get()
3. Handle Errors Gracefully¶
# ✓ Good - Explicit error handling
try:
df = qd.data(...).get()
except QldataError as e:
logger.error(f"Data fetch failed: {e}")
df = fallback_data()
4. Use Appropriate Parallelism¶
# ✓ Good - Parallel for many symbols
data = qd.data(symbols, source="binance") \
.get(parallel=True, workers=4)
# ✗ Wasteful - Parallel for single symbol
df = qd.data("BTCUSDT", source="binance") \
.get(parallel=True, workers=4)
Next Steps¶
- Historical Data API - Deep dive into qd.data()
- Streaming API - Learn about qd.stream()
- Resilience - Production resilience features