View: project_forecast_202409
Enhancements:
- Inclusion of the following fields from `ifs.DIM_PROJ_FORECAST_OL`:
  - `Forecast.FORECAST_VERSION_ID`
  - `Forecast.VERSION_DESCRIPTION`
  - `Forecast.FORECAST_STATUS`
Spark-SQL Compatibility and Safe-Code Updates
Changes Implemented
To improve compatibility with Spark SQL and prepare views for future transition into Fabric or Lakehouse environments, the following adjustments have been made:
- ✅ Square Brackets Removed
  Removed the use of `[square_brackets]` from object references. Spark SQL does not support square brackets and instead uses backticks (`` ` ``) for quoting, though best practice is to avoid reserved keywords entirely where possible.
- ✅ Dollar Signs Removed from Table References (Bronze)
  Applied a consistent naming convention across all Bronze-layer objects to ensure compatibility with Spark-based engines and allow SQL parsing to succeed in Fabric or Delta Lake. Replaced `$` suffixes in legacy table names with a standardised `Q` prefix to maintain source traceability while avoiding Spark SQL syntax errors. For example:
  - `DIM_PROJECT_OL$` → `QDIM_PROJECT_OL`
  - `FACT_CUSTOMER_ORDER_LINE_OL$` → `QFACT_CUSTOMER_ORDER_LINE_OL`
  - `FACT_BUSINESS_OPP_OL$` → `QFACT_BUSINESS_OPP_OL`
  - `DIM_PROC_TEMPLATE_STAGE_OL$` → `QDIM_PROC_TEMPLATE_STAGE_OL`
  - Removed `$` suffixes from all other QFACT and QDIM tables.
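As a sketch, the renaming convention above can be captured as a single rewrite rule in a generator. The function name and regex below are illustrative, not the actual implementation, and assume upper-case identifiers as shown in the examples:

```python
import re

def rename_bronze_tables(sql: str) -> str:
    """Rewrite any legacy 'NAME$' table reference as 'QNAME':
    drop the '$' suffix and add the 'Q' prefix."""
    return re.sub(r"\b([A-Z_][A-Z0-9_]*)\$", r"Q\1", sql)
```

For example, `rename_bronze_tables("SELECT * FROM DIM_PROJECT_OL$")` yields `"SELECT * FROM QDIM_PROJECT_OL"`.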
- ✅ `ISNULL()` Replaced with `COALESCE()`
  Standardised all null-handling logic to use `COALESCE()`, which is ANSI-compliant and supported by both SQL Server and Spark SQL.
- ✅ Quote-Safe Descriptions Implemented
  Introduced `REPLACE(text, "'", "^")` logic within the SQL code generator to escape single quotes within string literals, particularly in `DESCRIPTION` fields, to avoid SQL injection risks and parsing errors.
Outstanding Compatibility Issues
1. `IIF()` in object CODE should be `CASE`
   - Issue: `IIF()` is a SQL Server-specific shorthand for simple conditional logic. Spark SQL (and ANSI SQL) does not support it.
   - Insight: Convert `IIF(condition, true_value, false_value)` to a `CASE WHEN` block.
   - Suggested Action: As part of your metadata-driven SQL generator, introduce a conversion step:
     `IIF(A = B, 1, 0)` → `CASE WHEN A = B THEN 1 ELSE 0 END`
   - Risk: Negligible, assuming generator logic can apply consistent rewriting.
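A minimal sketch of such a conversion step, assuming the three arguments contain no nested commas or parentheses (a production generator would need a real argument parser for nested expressions); the function name is illustrative:

```python
import re

# Matches IIF(cond, a, b) only when no argument nests commas or parens.
IIF_PATTERN = re.compile(r"IIF\(([^,()]+),([^,()]+),([^,()]+)\)", re.IGNORECASE)

def iif_to_case(sql: str) -> str:
    """Rewrite IIF(cond, a, b) as CASE WHEN cond THEN a ELSE b END."""
    return IIF_PATTERN.sub(
        lambda m: f"CASE WHEN {m.group(1).strip()} "
                  f"THEN {m.group(2).strip()} ELSE {m.group(3).strip()} END",
        sql,
    )
```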
2. Square brackets on reserved words (e.g. `[identifier]`)
   - Issue: Square brackets (`[]`) are not valid in Spark SQL. Spark uses backticks (`` ` ``) for escaping reserved keywords.
   - Insight: Reserved words such as `identifier`, `timestamp`, `group`, etc., should be escaped using backticks if necessary.
   - Suggested Action: Replace `[identifier]` with `` `identifier` ``, or better, alias to avoid escaping.
   - Note: Consider adding a reserved word dictionary and applying escaping or aliasing systematically.
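One way the reserved-word dictionary could drive the rewrite; the word set below is illustrative and deliberately tiny, not the real Spark keyword list:

```python
import re

# Illustrative, non-exhaustive set; extend from the Spark SQL keyword
# list for real use.
RESERVED = {"identifier", "timestamp", "group", "order"}

def unbracket(sql: str) -> str:
    """Replace [name] with `name` when the bare name clashes with a
    reserved word; otherwise drop the brackets entirely."""
    def fix(m: re.Match) -> str:
        name = m.group(1)
        return f"`{name}`" if name.lower() in RESERVED else name
    return re.sub(r"\[([^\]]+)\]", fix, sql)
```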
3. `CF$` field prefixes remain
   - Issue: Spark interprets `$` in column names literally and often fails to parse such fields, especially in Delta or Hive environments.
   - Insight: Either rename in source metadata using `AS CF_FORECAST_VALUE` or access via backticks: `` `CF$FIELDNAME` ``.
   - Suggested Action: Add an alias rule in the view layer to strip or rename `CF$` columns.
   - Long-term: If feasible, clean naming at the ingestion stage.
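The alias rule could be as simple as the sketch below, which emits a select-list entry per column; the naming pattern (`$` replaced by `_`) is an assumption mirroring the `AS CF_FORECAST_VALUE` example above:

```python
def alias_cf_column(col: str) -> str:
    """Emit a Spark-safe select-list entry for a CF$ column: the raw
    name is backtick-quoted and aliased with the '$' replaced by '_'."""
    if col.startswith("CF$"):
        return f"`{col}` AS CF_{col[3:]}"
    return col
```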
4. Use of `CROSS APPLY` for unpivot logic
   - Issue: `CROSS APPLY` is a T-SQL construct and unsupported in Spark SQL.
   - Insight: Often used to simulate row expansion (e.g. JSON or table-valued functions).
   - Alternatives:
     - Use `LATERAL VIEW` with `explode()` for arrays.
     - Use inline `SELECT` with `UNION ALL` for fixed-column unpivoting.
   - Challenge: If your unpivot relies on UDFs or metadata-based expressions, a full rewrite may be required.
   - Recommendation: Flag `CROSS APPLY` usage as a structural blocker and isolate it in a compatibility audit.
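For the fixed-column case, the generator could emit the `UNION ALL` form directly. A sketch, with table and column names as placeholders:

```python
def unpivot_union_all(table: str, key: str, value_cols: list[str]) -> str:
    """Build a portable unpivot: one SELECT per value column,
    stitched together with UNION ALL."""
    selects = [
        f"SELECT {key}, '{col}' AS attribute, {col} AS value FROM {table}"
        for col in value_cols
    ]
    return "\nUNION ALL\n".join(selects)
```

This trades a single `CROSS APPLY` for N scans of the table, which is usually acceptable for narrow unpivots but worth checking on large facts.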
5. Use of `CHARINDEX()`
   - Issue: `CHARINDEX()` is not supported in Spark; use `instr()` instead.
   - Example: `CHARINDEX('ABC', col)` → `instr(col, 'ABC')`
   - Caveat: The two behave differently in some edge cases (e.g. null handling), and note the swapped argument order.
   - Suggested Action: Add a search function shim to auto-convert when emitting Spark SQL.
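A sketch of such a shim, assuming simple (non-nested) arguments; as with the `IIF()` rewrite, a real generator would need a proper argument parser:

```python
import re

CHARINDEX_PATTERN = re.compile(r"CHARINDEX\(([^,()]+),([^,()]+)\)", re.IGNORECASE)

def charindex_to_instr(sql: str) -> str:
    """Rewrite CHARINDEX(substr, str) as instr(str, substr).
    Note the swapped argument order between the two functions."""
    return CHARINDEX_PATTERN.sub(
        lambda m: f"instr({m.group(2).strip()}, {m.group(1).strip()})", sql
    )
```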
6. `STRING_AGG()` unsupported until Spark 4.0
   - Issue: No native aggregate for string concatenation in Spark < 4.0.
   - Workaround Options:
     - Use `collect_list()` + `array_join()`: `array_join(collect_list(col), ', ')`
     - Use Python or Scala UDFs in notebooks, though these are not portable to SQL-only contexts.
   - Recommendation: Isolate views needing this and defer porting until your environment upgrades to Spark 4.
🛠 Example (Future-Proofing Idea)
To make this portable and repeatable (using `CONCAT()` rather than T-SQL `+` concatenation, so the same statement parses in both engines):

```sql
SELECT CONCAT(
    '{',
    CASE WHEN code IS NOT NULL
         THEN CONCAT('"', attribute, '":', COALESCE(value_metadata, '{}'))
         ELSE ''
    END,
    '}'
) AS meta_data
FROM metadata_template
```

Run that with fixed inputs and UNION the rows, all without `STRING_AGG()`.
7. Use of `GETDATE()` as a runtime substitution
   - Issue: `GETDATE()` is not Spark SQL compliant. Equivalent: `current_timestamp()`.
   - Insight: If `GETDATE()` is used as a dynamic substitution placeholder in generator logic (e.g. for snapshotting), ensure it is replaced before SQL compilation.
   - Suggested Action: Replace all `GETDATE()` with `current_timestamp()` in Spark-safe code paths.
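This one is a straightforward token swap; a sketch (the function name is illustrative):

```python
import re

def getdate_to_current_timestamp(sql: str) -> str:
    """Case-insensitive swap of the T-SQL call for the Spark equivalent."""
    return re.sub(r"\bGETDATE\s*\(\s*\)", "current_timestamp()", sql,
                  flags=re.IGNORECASE)
```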
8. `REPLACE()` required for quotes in CODE fields
   - Issue: Embedded single quotes (`'`) or double quotes (`"`) in data can cause SQL syntax errors when constructing literal strings.
   - Insight: Spark SQL does support `REPLACE()`, so this is more of a hygiene step than a compatibility issue.
   - Best Practice: Always escape embedded single quotes using: `REPLACE(col, "'", "''")`
   - Recommendation: Keep this as a pre-step in your code generator and encode it as part of the literal quoting logic.
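On the generator side, that pre-step amounts to a small helper when emitting literals; a sketch (the helper name is an assumption, not an existing function):

```python
def sql_literal(value: str) -> str:
    """Escape embedded single quotes by doubling them, then wrap the
    value in single quotes so it is safe to embed in generated SQL."""
    return "'" + value.replace("'", "''") + "'"
```

For example, `sql_literal("O'Brien")` produces `'O''Brien'`, which both SQL Server and Spark SQL parse as the literal `O'Brien`.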