This is a quick tip about syntax for handling row headers in U-SQL, the data processing language of Azure Data Lake Analytics. There are two components: handling row headers on the source data which is being queried, and row headers on the dataset being generated by ADLA.
Detecting that row headers are present on the first row of the source data:
USING Extractors.Csv(skipFirstNRows:1)
Outputting row headers on row 1 of the dataset being generated:
USING Outputters.Csv(outputHeader:true);
Here is a full U-SQL example which includes both:
DECLARE @inputPath string = "/RawData/{date:yyyy}/{date:MM}/{filename:*}.csv"; DECLARE @outputPath string = "/CuratedData/POC.csv"; @data = EXTRACT InstanceID string ,TransactionID string ,TimestampUtc string ,ClassID string ,ClassName string ,CurrentValue string ,date DateTime //virtual column ,filename string //virtual column FROM @inputPath USING Extractors.Csv(skipFirstNRows:1); @result = SELECT InstanceID AS InstanceID ,TransactionID AS TransactionID ,TimestampUtc AS TimestampUTC ,ClassID + "-" + ClassName AS Class ,CurrentValue AS CurrentValue ,date AS TransactionDate ,filename AS SourceFileName ,1 AS NbrOfTransactions FROM @data WHERE ClassName == "EX1"; OUTPUT @result TO @outputPath USING Outputters.Csv(outputHeader:true,quoting:true);
As a reminder: U-SQL is a batch-oriented language which requires its output to be written to a destination file. It's not intended to be an ad hoc query language at the time of this writing.
Want to Know More?
My next all-day workshop on Architecting a Data Lake is in Raleigh, NC on April 13, 2018.
You Might Also Like...
Data Lake Use Cases and Planning Considerations
Querying Multi-Structured JSON Files with U-SQL in Azure Data Lake