Release 70

View: project_forecast_202409

Enhancements:

  • Inclusion of the following fields from ifs.DIM_PROJ_FORECAST_OL:
    • Forecast.FORECAST_VERSION_ID
    • Forecast.VERSION_DESCRIPTION
    • Forecast.FORECAST_STATUS

Spark-SQL Compatibility and Safe-Code Updates

Changes Implemented

To improve compatibility with Spark SQL and prepare views for future transition into Fabric or Lakehouse environments, the following adjustments have been made:

  • Square Brackets Removed
    Removed the use of [square_brackets] from object references. Spark SQL does not support square brackets and instead uses backticks (`) for quoting, though best practice is to avoid reserved keywords entirely where possible.
  • Dollar Signs Removed from Table References (Bronze)
    Applied a consistent naming convention across all Bronze-layer objects to ensure compatibility with Spark-based engines and allow SQL parsing to succeed in Fabric or Delta Lake. Replaced $ suffixes in legacy table names with a standardised Q prefix to maintain source traceability while avoiding Spark SQL syntax errors. For example:
    • DIM_PROJECT_OL$ → QDIM_PROJECT_OL
    • FACT_CUSTOMER_ORDER_LINE_OL$ → QFACT_CUSTOMER_ORDER_LINE_OL
    • FACT_BUSINESS_OPP_OL$ → QFACT_BUSINESS_OPP_OL
    • DIM_PROC_TEMPLATE_STAGE_OL$ → QDIM_PROC_TEMPLATE_STAGE_OL
    • Removed $ suffixes from ALL other QFACT and QDIM tables
  • ISNULL() Replaced with COALESCE()
    Standardised all null-handling logic to use COALESCE(), which is ANSI-compliant and supported by both SQL Server and Spark SQL.
  • Quote-Safe Descriptions Implemented
    Introduced REPLACE(text, "'", "^") logic within the SQL code generator to escape single quotes within string literals, particularly in DESCRIPTION fields, avoiding SQL injection risks and parsing errors. (A combined before/after sketch of these conventions follows this list.)
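
A minimal before/after sketch of the conventions above; the table and column names are illustrative rather than actual view definitions:

-- Before (SQL Server only): square brackets, $-suffixed table name, ISNULL()
SELECT ISNULL([DESCRIPTION], 'N/A') AS [PROJECT_DESCRIPTION]
FROM DIM_PROJECT_OL$

-- After (Spark-safe): backticks where quoting is needed, Q-prefixed table, COALESCE()
SELECT COALESCE(`DESCRIPTION`, 'N/A') AS PROJECT_DESCRIPTION
FROM QDIM_PROJECT_OL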

Outstanding Compatibility Issues


1. IIF() in object CODE should be CASE

  • Issue: IIF() is a SQL Server–specific shorthand for simple conditional logic. Spark SQL (and ANSI SQL) does not support it.
  • Insight: Convert IIF(condition, true_value, false_value) to a CASE WHEN block.
  • Suggested Action: As part of your metadata-driven SQL generator, introduce a conversion step:
IIF(A = B, 1, 0) → CASE WHEN A = B THEN 1 ELSE 0 END
  • Risk: Negligible, assuming generator logic can apply consistent rewriting.
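
A hedged before/after sketch of that conversion; the table name assumes the Q-prefix convention above and the status value is illustrative:

-- SQL Server shorthand (fails to parse in Spark SQL)
SELECT IIF(FORECAST_STATUS = 'Approved', 1, 0) AS IS_APPROVED
FROM QDIM_PROJ_FORECAST_OL

-- Portable CASE WHEN equivalent
SELECT CASE WHEN FORECAST_STATUS = 'Approved' THEN 1 ELSE 0 END AS IS_APPROVED
FROM QDIM_PROJ_FORECAST_OL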

2. Square brackets on reserved words (e.g. [identifier])

  • Issue: Square brackets ([]) are not valid in Spark SQL. Spark uses backticks (`) for escaping reserved keywords.
  • Insight: Reserved words such as identifier, timestamp, group, etc., should be escaped using backticks if necessary.
  • Suggested Action: Replace [identifier] with `identifier`, or better, alias to avoid escaping.
  • Note: Consider adding a reserved-word dictionary and applying escaping or aliasing systematically.
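
A short sketch of both options; the column names are the reserved words cited above and the table is illustrative:

-- Escaped with backticks (valid Spark SQL)
SELECT `identifier`, `timestamp`
FROM QDIM_PROJECT_OL

-- Preferred: alias once in the view so downstream SQL needs no escaping
SELECT `identifier` AS project_identifier, `timestamp` AS event_timestamp
FROM QDIM_PROJECT_OL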

3. CF$ field prefixes remain

  • Issue: Spark interprets $ in column names literally and often fails to parse such fields, especially in Delta or Hive environments.
  • Insight: Either rename in source metadata using AS CF_FORECAST_VALUE or access via backticks `CF$FIELDNAME`.
  • Suggested Action: Add an alias rule in the view layer to strip or rename CF$ columns.
  • Long-term: If feasible, clean naming at ingestion stage.
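
A minimal sketch of the view-layer alias rule, assuming a CF$FORECAST_VALUE source column and the Q-prefixed forecast table:

-- Expose CF$ source columns under Spark-friendly names in the view layer
SELECT `CF$FORECAST_VALUE` AS CF_FORECAST_VALUE
FROM QDIM_PROJ_FORECAST_OL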

4. Use of CROSS APPLY for unpivot logic

  • Issue: CROSS APPLY is a T-SQL construct and unsupported in Spark SQL.
  • Insight: Often used to simulate row expansion (e.g. JSON or table-valued functions).
  • Alternatives:
    • Use LATERAL VIEW with explode() for arrays.
    • Use inline SELECT with UNION ALL for fixed-column unpivoting.
  • Challenge: If your unpivot relies on UDFs or metadata-based expressions, a full rewrite may be required.
  • Recommendation: Flag CROSS APPLY usage as a structural blocker and isolate in a compatibility audit.
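
For the fixed-column case, a hedged sketch of the UNION ALL rewrite; the table and metric columns are illustrative:

-- T-SQL pattern (not valid in Spark SQL):
--   SELECT f.project_id, v.metric_name, v.metric_value
--   FROM QFACT_PROJECT_OL f
--   CROSS APPLY (VALUES ('budget', f.budget), ('actual', f.actual)) v(metric_name, metric_value)

-- Spark-safe rewrite: inline SELECTs combined with UNION ALL
SELECT project_id, 'budget' AS metric_name, budget AS metric_value FROM QFACT_PROJECT_OL
UNION ALL
SELECT project_id, 'actual' AS metric_name, actual AS metric_value FROM QFACT_PROJECT_OL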

5. Use of CHARINDEX()

  • Issue: CHARINDEX() is not supported in Spark; use instr() instead.
  • Example: CHARINDEX('ABC', col) → instr(col, 'ABC')
  • Caveat: The two behave differently in some edge cases (e.g. null-handling).
  • Suggested Action: Add a search function shim to auto-convert when emitting Spark SQL.
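
A minimal sketch of the substitution; the column and table are illustrative:

-- SQL Server: CHARINDEX('ABC', DESCRIPTION)
-- Spark SQL:  instr(DESCRIPTION, 'ABC')  -- note the swapped argument order
SELECT instr(DESCRIPTION, 'ABC') AS abc_position
FROM QDIM_PROJECT_OL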

6. STRING_AGG() unsupported until Spark 4.0

  • Issue: No native aggregation for string concatenation in Spark < v4.0.
  • Workaround Options:
    • Use collect_list() + array_join(): array_join(collect_list(col), ', ')
    • Use Python or Scala UDFs in notebooks, though not portable to SQL-only contexts.
  • Recommendation: Isolate views needing this and defer porting until your environment upgrades to Spark 4.
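
A hedged sketch of the first workaround in Spark SQL; the grouping column and table are illustrative, and note that collect_list() does not guarantee element order:

-- Spark SQL: per-group string concatenation without STRING_AGG()
SELECT project_id,
       array_join(collect_list(DESCRIPTION), ', ') AS descriptions
FROM QDIM_PROJECT_OL
GROUP BY project_id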
🛠 Example (Future-Proofing Idea)

To make this portable and repeatable:

-- CONCAT() is supported by both SQL Server (2012+) and Spark SQL; the T-SQL '+' operator is not portable
SELECT
    CONCAT(
        '{',
        CASE WHEN code IS NOT NULL
             THEN CONCAT('"', attribute, '":', COALESCE(value_metadata, '{}'))
             ELSE '' END,
        '}'
    ) AS meta_data
FROM metadata_template

Run that with fixed inputs and combine the rows with UNION ALL, all without STRING_AGG.


7. Use of GETDATE() as a runtime substitution

  • Issue: GETDATE() is not Spark SQL compliant. Equivalent: current_timestamp().
  • Insight: If GETDATE() is used as a dynamic substitution placeholder in generator logic (e.g. for snapshotting), ensure it’s replaced before SQL compilation.
  • Suggested Action: Replace all GETDATE() with current_timestamp() in Spark-safe code paths.

8. REPLACE() required for quotes in CODE fields

  • Issue: Embedded quotes (') or double quotes (") in data can cause SQL syntax errors when constructing literal strings.
  • Insight: Spark SQL does support REPLACE(), so this is more of a hygiene step than a compatibility issue.
  • Best Practice: Always escape embedded quotes using: REPLACE(col, "'", "''")
  • Recommendation: Keep this as a pre-step in your code generator and encode as part of the literal quoting logic.
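
A minimal sketch following the REPLACE pattern above; the column and table are illustrative, and in SQL Server (where double quotes delimit identifiers) the equivalent single-quoted form is REPLACE(col, '''', ''''''):

-- Double up embedded single quotes before the value is used inside a generated literal
SELECT REPLACE(DESCRIPTION, "'", "''") AS description_quote_safe
FROM QDIM_PROJECT_OL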
