Case-study research deck / website version

Systematic Long-Only Equity Selection from Price Paths

Price-only LSTM ranking signal, execution-aware backtest, and long-only portfolio construction.

Price-only dataset | Long-only | Next-day close execution | 0.05% one-way cost | Top-18 / Buffer-30

Research Thesis and Workflow

Hypothesis: price paths contain cross-sectional ranking signals. This deck shows how the signal was built, tested, and converted into a portfolio.

Research Pipeline

Clean the raw price panel

Convert raw daily prices into a consistent research panel, remove invalid observations, control extreme returns, and define the eligible trading universe.

Data preparation layer

↓

Engineer price-path signals

Build momentum, reversal, trend, volatility, breakout, and path-quality features using only information available up to each signal date.

Feature and label layer

↓

Train and compare ranking models

Compare tabular baselines with sequence models, then retain the LSTM score once time-series input normalization has improved its stability.

Alpha model layer

↓

Translate scores into a portfolio

Apply Top-N selection, holding buffer, equal weighting, next-day close execution, transaction costs, benchmark comparison, and robustness checks.

Portfolio and evaluation layer
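The "information available up to each signal date" constraint in the feature layer can be illustrated with a small sketch. This is not the deck's feature set: the function name, the 20-day lookback, the 5-day reversal window, and the four feature definitions are assumptions chosen for illustration. The point is that every value is computed from closes[0..t] only.

```python
import math

def price_path_features(closes, t, lookback=20, skip=5):
    """Hypothetical price-path features for signal date t.

    Uses only closes[0..t]; nothing after t is read (no look-ahead).
    lookback and skip are illustrative defaults, not the deck's settings.
    """
    if t >= len(closes) or t < max(lookback, skip):
        raise ValueError("not enough history for the requested windows")
    window = closes[t - lookback : t + 1]  # lookback + 1 prices ending at t
    log_rets = [math.log(b / a) for a, b in zip(window, window[1:])]
    mean = sum(log_rets) / len(log_rets)
    vol = math.sqrt(sum((r - mean) ** 2 for r in log_rets) / (len(log_rets) - 1))
    prior_high = max(window[:-1])  # highest close before the signal date
    return {
        "momentum": closes[t] / closes[t - lookback] - 1.0,
        "reversal": -(closes[t] / closes[t - skip] - 1.0),
        "volatility": vol,
        "breakout": closes[t] / prior_high - 1.0,
    }
```

A quick leakage check is to overwrite every price after t and confirm the features do not change.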

System Design

Inputs

Cleaned price panel, tradable-universe mask, and precomputed model scores.

Price-only data

→

Research Engine

Feature construction, execution-aware labels, sequence modeling, and cross-sectional score generation.

Signal generation

→

Portfolio Layer

Top-18 selection, Buffer-30 retention rule, equal weighting, cost deduction, and benchmark-relative metrics.

Backtest output

Research Iterations

Research path from raw prices to the final Top-18 / Buffer-30 portfolio.

Performance Evolution: From Baseline Signal to Final Portfolio

Sharpe improved through three levers: better signal design, cleaner timing alignment, and tighter portfolio construction.

Sharpe Improvement Ladder

Documented report-ready iterations. Each bar shows the best candidate in that stage, not every experiment attempted during research.

Key driver 1: stronger price-path signal extraction
Key driver 2: execution-aware alignment
Key driver 3: concentrated top-bucket portfolio construction
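For reference, a minimal sketch of how a Sharpe figure of this kind is commonly computed from a series of period returns. The deck does not state its annualization convention, so the 252-period factor and the zero risk-free rate below are assumptions, and the function name is hypothetical.

```python
import math

def annualized_sharpe(period_returns, periods_per_year=252, risk_free=0.0):
    """Mean excess return over sample standard deviation, annualized.

    periods_per_year=252 (daily data) and risk_free=0.0 per period are
    assumptions; the deck does not document its convention.
    """
    excess = [r - risk_free for r in period_returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if var == 0:
        return float("nan")  # undefined when returns never vary
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

Net-of-cost returns should be passed in when the figure is meant to be comparable with the final portfolio numbers.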

Data and Benchmark Sanity Checks

With no external index provided, I used an internal equal-weight eligible-universe benchmark.

The equal-weight raw universe performed strongly during the evaluation period, which explains why the internal benchmark return is high.

The benchmark is an internal price-panel diagnostic, not an external market index.
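A minimal sketch of an equal-weight eligible-universe benchmark, assuming daily returns are stored as one dict per day and eligibility as one collection of names per day. Both data layouts and the function name are hypothetical; the deck's own code is not shown.

```python
def equal_weight_benchmark(returns_by_day, eligible_by_day):
    """Daily benchmark return as the simple average over eligible names.

    returns_by_day: list of {name: return} dicts, one per day (assumed layout).
    eligible_by_day: list of name collections, aligned by day. Eligibility
    should be fixed before the day's return is realised.
    """
    benchmark = []
    for rets, eligible in zip(returns_by_day, eligible_by_day):
        names = [n for n in eligible if n in rets]  # skip names with no return
        benchmark.append(sum(rets[n] for n in names) / len(names) if names else 0.0)
    return benchmark
```

Because every eligible name gets the same weight, a strong period for the raw universe shows up directly in the benchmark return.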

Portfolio Construction Explorer

The final improvement came from separating score generation from portfolio construction. Top-N and holding-buffer sweeps showed that LSTM alpha was concentrated in the highest-ranked names.

Top-N / Buffer Sharpe Heatmap

The final Top-18 / Buffer-30 point is highlighted.

Selected Configuration

Top 18 / Buffer 30
Sharpe 5.27. The strongest region is around Top-18 / Top-20, indicating that excessive diversification diluted the top-bucket signal.

Holding Buffer Rule

New positions must enter the Top-18 bucket. Existing holdings may remain if they are still ranked inside the broader Top-30 buffer. This reduces unnecessary churn around the selection boundary.
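The rule above can be sketched as follows. One detail is an assumption: the portfolio is capped at top_n names, with retained holdings taking priority and the remaining slots filled from the best-ranked Top-N names not already held. The function name and tie-breaking by name are also illustrative.

```python
def select_holdings(scores, current, top_n=18, buffer_n=30):
    """Top-N selection with a holding buffer.

    scores: {name: model score}, higher is better.
    current: names held before the rebalance.
    Assumption: the portfolio never exceeds top_n names.
    """
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    rank = {name: i + 1 for i, name in enumerate(ranked)}
    # Existing holdings survive if they are still inside the wider buffer.
    kept = sorted(
        (n for n in current if rank.get(n, buffer_n + 1) <= buffer_n),
        key=rank.get,
    )[:top_n]
    holdings = list(kept)
    # New entrants must come from the Top-N bucket itself.
    for name in ranked[:top_n]:
        if len(holdings) >= top_n:
            break
        if name not in holdings:
            holdings.append(name)
    return holdings
```

With a scaled-down Top-2 / Buffer-4 setting, a holding ranked fourth is retained while a holding ranked fifth is replaced by the best unheld Top-2 name, which is the churn-reduction effect described above.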

Final Performance vs Internal Benchmark

The final strategy is evaluated net of 0.05% one-way transaction costs. It is compared with an internal equal-weight benchmark constructed from the same score-available eligible universe.
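As an illustration of the cost deduction, the sketch below charges 0.05% on every unit of weight traded, so a full swap of one position costs the sell leg plus the buy leg. Treating turnover as the sum of absolute weight changes, and ignoring intra-period weight drift, are simplifying assumptions; the function and constant names are hypothetical.

```python
ONE_WAY_COST = 0.0005  # 0.05% per unit of weight traded, as stated in the deck

def net_period_return(old_weights, new_weights, asset_returns, cost=ONE_WAY_COST):
    """Gross return of the new portfolio minus turnover-based cost.

    Turnover is taken as sum(|new - old|) over all names (an assumption);
    weight drift between rebalances is ignored for simplicity.
    """
    names = set(old_weights) | set(new_weights)
    traded = sum(
        abs(new_weights.get(n, 0.0) - old_weights.get(n, 0.0)) for n in names
    )
    gross = sum(w * asset_returns.get(n, 0.0) for n, w in new_weights.items())
    return gross - traded * cost
```

For example, replacing one of two equal-weight names trades 0.5 out and 0.5 in, so the period is charged 1.0 × 0.05% = 0.05% of portfolio value.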

The strategy generated higher return and higher risk-adjusted return than the internal benchmark, while maintaining comparable drawdown.

The strategy initially moved close to the benchmark, then accumulated selection alpha through repeated weekly re-ranking and portfolio refresh.

Final strategy versus benchmark drawdown

The higher return did not come at the cost of a materially worse maximum drawdown relative to the internal benchmark.
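A minimal sketch of the maximum-drawdown measure used in this kind of comparison, assuming simple (not log) period returns compounded from an initial equity of 1.0. The function name is hypothetical.

```python
def max_drawdown(period_returns):
    """Worst peak-to-trough decline of the compounded equity curve.

    Returns a non-positive number, e.g. -0.25 for a 25% drawdown.
    """
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)               # running high-water mark
        worst = min(worst, equity / peak - 1.0)
    return worst
```

Applying the same function to the strategy's net returns and to the benchmark's returns gives the two figures being compared.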

Backtest Integrity Controls

The backtest explicitly addresses execution timing, transaction costs, benchmark alignment, and label construction to reduce common implementation errors.

✓

Signal after close
No same-close trading assumption.

✓

Next-day close execution
Returns start only after execution.

✓

Execution-aware label
Future return starts from Price[t+1].

✓

Cost deducted
Turnover × 0.05% one-way transaction cost.

✓

Long-only and cash allowed
Non-negative weights; uninvested cash earns 0%.

✓

Independent benchmark
The benchmark uses the pre-threshold eligible universe, so strategy filters cannot contaminate it.
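The timing controls above can be sketched as a label function: the signal is formed after the close on day t, the trade executes at the close on day t+1, and the forward return is measured from Price[t+1]. The 5-day horizon is an assumption (the deck mentions weekly re-ranking but does not state the label horizon), and the function name is hypothetical.

```python
def execution_aware_label(closes, t, horizon=5):
    """Forward return that starts at the execution price, not the signal price.

    Signal date: t (after the close). Entry: close of t + 1.
    Exit: close of t + 1 + horizon. horizon=5 is an illustrative default.
    Returns None when the exit date falls outside the available history.
    """
    entry, exit_ = t + 1, t + 1 + horizon
    if exit_ >= len(closes):
        return None
    return closes[exit_] / closes[entry] - 1.0
```

A label measured from closes[t] instead would credit the strategy with the move between signal and execution, which it could not have captured.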

Robustness and Rejected Variants

Robustness checks were used to test whether the final portfolio choice was stable and whether more complex alternatives genuinely improved the signal. The conclusion from the report and code is that the simple single-LSTM portfolio remained the strongest and most interpretable choice.

Local Edge Check

Legend: color scale runs from lower Sharpe to higher Sharpe; outline = final selected point.

Top 18 / Buffer 30
Sharpe 5.27. This is the selected final portfolio. Nearby cells stay around or above Sharpe 5, which suggests the result is not a single isolated optimum.

Rows represent the concentration level (Top-N holdings) and columns represent the holding buffer used to reduce churn. The chosen Top-18 / Buffer-30 point sits inside a stable high-Sharpe neighborhood.

Seed Bagging Check

Five-seed rank averaging reduced performance because weaker seeds diluted the strongest single-LSTM signal. The final selection therefore uses the best single LSTM rather than a broad seed ensemble.
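A minimal sketch of rank averaging across seeds, assuming each seed produces one score per name over the same set of names. The function name and the tie-break by name are illustrative; the deck does not show its ensembling code.

```python
def rank_average(seed_scores):
    """Average cross-sectional rank across seeds (1 = best, lower is better).

    seed_scores: list of {name: score} dicts, one per seed, sharing the
    same names (assumed).
    """
    names = sorted(seed_scores[0])
    totals = dict.fromkeys(names, 0.0)
    for scores in seed_scores:
        ordered = sorted(names, key=lambda n: (-scores[n], n))
        for position, name in enumerate(ordered, start=1):
            totals[name] += position
    return {name: totals[name] / len(seed_scores) for name in names}
```

If one seed ranks a name first and another ranks it last, its average rank falls to the middle of the pack, which is how weaker seeds can dilute the top bucket of a stronger one.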

Discussion Topics

Benchmark interpretation

The internal benchmark is an equal-weight eligible-universe portfolio, not a public index. The key comparison is selection alpha versus the same eligible universe.

Final portfolio rationale

The LSTM signal is most concentrated in the highest-ranked names. Top-18 improves signal purity, while Buffer-30 reduces unnecessary churn near the selection cutoff.

Rejected variants

Inverse-volatility weighting, broad regime filters, and seed bagging were tested. They were useful diagnostics, but they generally reduced return or diluted the strongest single-LSTM top-bucket signal.

Production extension path

Next step: add liquidity, sector, and market-cap data, a risk model, and constrained optimization.

Limitations and Future Extensions

Data Limitations

The case-study dataset contains daily prices only. No volume, liquidity, market cap, sector, or fundamental data is available.

Risk Model Layer

A production version should add Barra-style or ML factor-risk controls for style, sector, market, liquidity, and concentration exposures.

Validation

Longer OOS periods, more market regimes, capacity analysis, and externally defined investable benchmarks would be needed before real-money deployment.