Nov
11
2008

Add logging to the Data Profiling Task

Yesterday I was on a call where we looked at a few usability requests for the Data Profiling Task that didn't make it into RTM. One such request was task-specific logging. Surprise! The attentive developer at Microsoft Research had added it at the last moment, and few people had noticed. So this feature is not mentioned in BOL.

Enabling task-specific logging for the Data Profiling task

There's a DataProfilingTaskTrace logging event that's unique to this task that you want to enable, to capture task progress.

Sample logging output from the Data Profiling Task

Here's some sample output (truncated to 120 characters to prevent messy line wrapping) from a Quick Profile of the Person.Address table in AdventureWorks, using only the 1st 4 of the available Profile Requests. You can see how each request is processed one after another for each applicable column - for example, you see 8 ColumnNullRatioProfileRequests (1 for each of the 8 columns in the table), but only 3 ColumnStatisticsProfileRequests (since there are only 3 numeric or datetime columns). You can see how temp tables are used, see the queries that are issued, you see when each Profile Request is finished, and which type of request requires the most database activity.

Start Processing Requests: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Query Start: select min([AddressID]), max([AddressID]), avg(cast([AddressID] as float)), stdevp(cast([AddressID] as floa
Query End: select min([AddressID]), max([AddressID]), avg(cast([AddressID] as float)), stdevp(cast([AddressID] as float)
Finish Computing Request: ColumnNullRatioProfileRequest(RequestID=NullRatioReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-D9
Finish Computing Request: ColumnStatisticsProfileRequest(RequestID=StatisticsReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-
Finish Computing Request: ColumnStatisticsProfileRequest(RequestID=StatisticsReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-
Finish Computing Request: ColumnStatisticsProfileRequest(RequestID=StatisticsReq, DataSourceID={4E9A4BBB-6D27-4C83-BCFE-
Query Start: select LEN([AddressLine1]) as _COL0, LEN([AddressLine2]) as _COL1, LEN([City]) as _COL2, LEN([PostalCode])
Query End: select LEN([AddressLine1]) as _COL0, LEN([AddressLine2]) as _COL1, LEN([City]) as _COL2, LEN([PostalCode]) as
Query Start: select _COL0, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL0
Query End: select _COL0, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL0
Finish Computing Request: ColumnLengthDistributionProfileRequest(RequestID=LengthDistReq, DataSourceID={4E9A4BBB-6D27-4C
Query Start: select _COL1, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL1
Query End: select _COL1, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL1
Finish Computing Request: ColumnLengthDistributionProfileRequest(RequestID=LengthDistReq, DataSourceID={4E9A4BBB-6D27-4C
Query Start: select _COL2, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL2
Query End: select _COL2, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL2
Finish Computing Request: ColumnLengthDistributionProfileRequest(RequestID=LengthDistReq, DataSourceID={4E9A4BBB-6D27-4C
Query Start: select _COL3, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL3
Query End: select _COL3, count(*) from #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981 group by _COL3
Finish Computing Request: ColumnLengthDistributionProfileRequest(RequestID=LengthDistReq, DataSourceID={4E9A4BBB-6D27-4C
Query Start: drop table #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981
Query End: drop table #DP_TEMP_2008_11_11_13_50_09_605a97f7fb0047c58c0a981638557981
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_d90025ab96264190b60e32b3e38fba51] from (select [Addre
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_d90025ab96264190b60e32b3e38fba51] from (select [Address
Query Start: select [AddressID], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_2
Query End: select [AddressID], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_200
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_d90025ab96264190b60e32b3e38fba51]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_d90025ab96264190b60e32b3e38fba51]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_3db7dbecb76f4d4490f5ceb6400d550a] from (select [Addre
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_3db7dbecb76f4d4490f5ceb6400d550a] from (select [Address
Query Start: select [AddressLine1], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEM
Query End: select [AddressLine1], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_3db7dbecb76f4d4490f5ceb6400d550a]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_3db7dbecb76f4d4490f5ceb6400d550a]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_50feb58be00e4222ac4552689fc009c8] from (select [Addre
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_50feb58be00e4222ac4552689fc009c8] from (select [Address
Query Start: select [AddressLine2], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEM
Query End: select [AddressLine2], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_50feb58be00e4222ac4552689fc009c8]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_50feb58be00e4222ac4552689fc009c8]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_1d184c4a525b45c296941a829cef7dfd] from (select [City]
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_1d184c4a525b45c296941a829cef7dfd] from (select [City],
Query Start: select [City], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_2008_1
Query End: select [City], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_2008_11_
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_1d184c4a525b45c296941a829cef7dfd]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_1d184c4a525b45c296941a829cef7dfd]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_e16a6b96ef8d4c6594615b1a72506f4d] from (select [State
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_e16a6b96ef8d4c6594615b1a72506f4d] from (select [StatePr
Query Start: select [StateProvinceID], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_
Query End: select [StateProvinceID], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TE
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_e16a6b96ef8d4c6594615b1a72506f4d]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_e16a6b96ef8d4c6594615b1a72506f4d]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_c353c0c91f7b41fdbc72a731269e4635] from (select [Posta
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_c353c0c91f7b41fdbc72a731269e4635] from (select [PostalC
Query Start: select [PostalCode], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_
Query End: select [PostalCode], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_20
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_c353c0c91f7b41fdbc72a731269e4635]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_c353c0c91f7b41fdbc72a731269e4635]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83
Query Start: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_604d40dbee234105a1bb409918961a0d] from (select [Modif
Query End: select * into [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_604d40dbee234105a1bb409918961a0d] from (select [Modifie
Query Start: select [ModifiedDate], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEM
Query End: select [ModifiedDate], [Count_2008_11_11_13_50_09_ed9e9a19aa0d4dbea24c9ff46296fe3a] from [tempdb]..[#DP_TEMP_
Query Start: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_604d40dbee234105a1bb409918961a0d]
Query End: drop table [tempdb]..[#DP_TEMP_2008_11_11_13_50_09_604d40dbee234105a1bb409918961a0d]
Finish Computing Request: ColumnValueDistributionProfileRequest(RequestID=ValueDistReq, DataSourceID={4E9A4BBB-6D27-4C83

 I hope you find this feature useful, or at least fascinating. We'll be adding this information to the next release of Books Online.

-Doug

 

Pingbacks and trackbacks (1)+