Locate duplicate values (or the opposite) in a table

The way X++ handles SQL statement often lacks a bit compared to what you can do in standard TSQL.

One of the areas is if you want to get a set of records from a table where a specific value in a field only occurs once in the table. You could traverse through the records and compare values as you go along. That could pull down performance significantly depending on the record count. Another way of doing it is to group by the field while counting the records in e.g. RecId. Then traversing this result set and look at the value in RecId could give you either the unique or not unique value depending on what you need.

A better way would be to let the SQL server do some of the work and consequently return as few records as possible. Here is a job that illustrates this and the group by version mentioned above.

AX 2012 queries support the HAVING clause. That can in some scenarios do the same and a bit more elegant than this.

 

static void DEMO_NoDuplicates(Args _args)
{
    CustGroup custGroup;
    CustGroup custGroup2;
    PaymTermId lastPaymTermId;

    info("TRAVERSE COUNT");

    while select count(recid) from custGroup
        group by PaymTermId
    {
        if (custGroup.RecId == 1)
        {
            while select custGroup2
                where custGroup2.PaymTermId == custGroup.PaymTermId
            {
                info(custGroup2.PaymTermId);
            }
        }
    }

    info("USE JOIN");
    
    while select custGroup
        order by PaymTermId
        notexists join custGroup2
            where custGroup.RecId != custGroup2.RecId
               && custGroup.PaymTermId == custGroup2.PaymTermId
    {
        info(custGroup.PaymTermId);
    }
}


Advertisements

Joins, indexes and an example of when they don’t match…

We experienced some data locks that gave a customer some performance issues. Going through the motions we found this statement (scrambled) being the problem:

update_recordSet adjustment
    setting Voucher = voucher
    where ! adjustment.voucher
        join trans
            where trans.Status == Status::Open
               && trans.DateTime < perDateTime
               && trans.RecId == adjustment.Trans;

The table Adjustment is connected to the table Trans through a relation like this: Adjustment.Trans = Trans.RecId. And Adjustment has – among others – an unclustered index like this: Trans, Voucher and a couple of other fields.

So you might think that the SQL server was capable of utilising this index since both Trans and Voucher are in play in the attempt to limit the records involved.

Looking at it from the SQL server it ends up like this:

(@P1 NVARCHAR(21), @P2 INT, @P3 BIGINT, @P4 NVARCHAR(5), @P5 NVARCHAR(21), @P6 BIGINT, @P7 NVARCHAR(5), @P8 INT, @P9 DATETIME2) UPDATE T1
SET VOUCHER = @P1, RECVERSION = @P2
FROM ADJUSTMENT T1
CROSS JOIN TRANS T2
WHERE (((T1.PARTITION = @P3)
AND (T1.DATAAREAID = @P4))
AND ((T1.VOUCHER = @P5)))
AND (((T2.PARTITION = @P6)
AND (T2.DATAAREAID = @P7))
AND (((T2.STATUS = @P8)
AND (T2.DATETIME < @P9))
AND (T2.RECID = T1.TRANS)))

Now, when executing this ended up giving an index scan resulting in heavy locking of data. The reason for this – and the reason why the index could not be used – is that the SQL server sees this as two statements selecting adjustment records with the Voucher field as only range and the trans records with the specified ranges except the relation range and then returns the intersection of these two result sets.

Adding an index with Voucher as first field solves the problem and the data locking stops.

Using HAVING in a query

With AX 2012 we now have the option of using HAVING in a query. This allows us to limit a result set based on aggregated fields.

The advantage is that we let the SQL server do some filtering and receive less records compared to the old-school version were we had to receive all records and then use an IF or the likes to filter away the records that did not match the criterias.

In this simple example we want to get all sales ids on orders having lines with a line amount total of 100000 and beyond.

 

static void SimpleQueryWithHavingExample(Args _args)
{
    Query query;
    QueryRun queryRun;
    QueryBuildDataSource qBDS_SalesTable;
    QueryBuildDataSource qBDS_SalesLine;
    SalesTable salesTable;
    SalesLine salesLine;
 
    // Init query
    query = new Query();
 
    // Add datasources and use standard relations
    qBDS_SalesTable = query.addDataSource(tableNum(SalesTable));
    qBDS_SalesLine = qBDS_SalesTable.addDataSource(tableNum(SalesLine));
    qBDS_SalesLine.relations(true);
 
    // Add a group by on SalesTable
    qBDS_SalesTable.addGroupByField(fieldNum(SalesTable, SalesId));

    // Add aggregation on LineAmount
    qBDS_SalesLine.addSelectionField(fieldNum(SalesLine, LineAmount), SelectionField::Sum);
 
    // Add the having filter
    query.addHavingFilter(qBDS_SalesLine, fieldStr(SalesLine, LineAmount), AggregateFunction::Sum).value(SysQuery::range(100000, ''));
 
    // Create and run the queryRun object
    queryRun = new QueryRun(query);

    while (queryRun.next())
    {
        salesTable = queryRun.get(tablenum(SalesTable));
        salesLine = queryRun.get(tableNum(salesLine));
        info(strFmt("%1 %2", salesTable.SalesId, salesLine.LineAmount));
    }
}

 

I am sure that this will come in handy. 🙂

 

EXISTS JOIN/NOT EXISTS JOIN versus NOT EXISTS JOIN/EXISTS JOIN

Not the prettiest headline so far. I do admit that. Nevertheless it is actually quite relevant. Here is the question to be answered: If I have multiple exists joins in my select statement and one of them is a not exists how can I be certain that I will get the expected result set back from the SQL server.

Let us start off by lining up 3 tables:

Trans

GroupId SubGroupId
A 123
B 456
C 789
D 987

 

GroupTable

GroupId
A
D
G

 

SubGroupTable

SubGroupId
123
456

 

The structure should be fairly obvious. Trans has 2 fields that refers to the sub tables. What I would like is to write a select statement that gives me all records from Trans where the GroupId isn’t present in GroupTable and where the SubGroupId is found in SubGroupTable.

This could be done like this:

select trans
    exists join subGroupTable.SubGroupId
        where subGroupTable.SubGroupId == trans.SubGroupId
            not exists join grouptable
                where grouptable.GroupId == trans.GroupId

Another way could be by turning it a bit upside down like this:

select trans
    not exists join grouptable
        where grouptable.GroupId == trans.GroupId
            exists join subGroupTable.SubGroupId
                where subGroupTable.SubGroupId == trans.SubGroupId

What happens when we look at what the SQL server receives as the TSQL-statement is a bit different than expected:

select GroupId, SubGroupId
    from Trans as t
        where EXISTS (SELECT 'x'
            FROM GroupTable as g
                where g.GroupId = t.GroupId
                    and not (exists (select 'x'
                    from SubGroupTable as s
                        where s.SubGroupId = t.SubGroupId)))

and

select GroupId, SubGroupId
    from Trans as t
        where not (EXISTS (SELECT 'x'
            FROM SubGroupTable as s
                where s.SubGroupId = t.SubGroupId
                    and (exists (select 'x'
                    from GroupTable as g
                        where g.GroupId = t.GroupId))))

 

What to notice is that in the first statement the EXISTS and NOT EXISTS are considered as one and in the second statement we now have 2 EXISTS that is   encapsulated by a NOT giving it a bit of a twist compared to what we might have expected.

The result set for the first statement is

D 987

since

A is existing in SubGroupTable which was ruled out by our NOT EXISTS

B is in SubGroupTable and not in GroupTable which both goes against our EXISTS/NOT EXISTS

C is in SubGroupTable A was

D is present in GroupTable and not in SubGroupTable

 

and for the second the result set is

B 456

C 789

D 987

since

A is in both GroupTable and SubGroupTable and we specified by having two EXISTS’s and adding these to a NOT that this was a no go.

B is returned although being in SubGroupTable it was not in GroupTable making the double EXISTS false and thereby pleasing the NOT

C is not present in neither GroupTable nor SubGroupTable and therefore matching the NOT

D is – like B – returned since it only exists in GroupTable and not i SubGroupTable and the EXISTS’s then returning false

 

I realise that this can be a bit tricky to get the head around but nevertheless very important to keep in mind when doing those large selects with lots of tables and EXISTS/NOT EXISTS joins.

The effect of unnecessary tables in a query

The other day I was asked to look at a performance issue with a form opening slow. After going through the normal motions I ended up with a method being called hundreds of time. It did not take that long time to execute but all in all it was the reason for the form slowing down.

What the method did was to call a view and summarise a quantity field. Not something extra ordinary but I gave the view a look anyhow. And that is what triggered me to do this small post on a simple but often effective question: Do I really need that table? 🙂

Let me show you what I found with a simple example. We want to have listed all purchase order ids, item ids, the line status and quantity sorted by purchase order id, line status and item id.

This is a simplified example of the real life example and the demo view looks like this:

SELECT TOP (100) PERCENT dbo.PURCHTABLE.PURCHID
	, dbo.PURCHLINE.ITEMID
	, dbo.PURCHLINE.PURCHSTATUS
	, dbo.PURCHLINE.QTYORDERED
FROM dbo.PURCHTABLE
INNER JOIN dbo.PURCHLINE
	ON dbo.PURCHTABLE.PURCHID = dbo.PURCHLINE.PURCHID
		AND dbo.PURCHTABLE.DATAAREAID = dbo.PURCHLINE.DATAAREAID
		AND dbo.PURCHTABLE.PARTITION = dbo.PURCHLINE.PARTITION
ORDER BY dbo.PURCHTABLE.PARTITION
	, dbo.PURCHTABLE.DATAAREAID
	, dbo.PURCHTABLE.PURCHID
	, dbo.PURCHLINE.PURCHSTATUS
	, dbo.PURCHLINE.ITEMID

It returns the data as expected, it executes in a few ms and everybody is happy. Well, until you execute it 500 times in a row and get tired of waiting. Looking at what data we actually need returned points us towards the PurchTable. The only thing we get from this table is the PurchId which is present at the PurchLine table as well and fully indexed there as well.

So if we change the view to look like this instead we get a significant effect:

SELECT TOP (100) PERCENT PURCHID
	, ITEMID
	, PURCHSTATUS
	, QTYORDERED
FROM dbo.PURCHLINE
ORDER BY PARTITION
	, DATAAREAID
	, PURCHID
	, PURCHSTATUS
	, ITEMID

The result set is the same but the performance has increased. Let us have a quick look at the statistics.

First what happens when executing the first view:

View1 - statistics

And then what happens with the updated view:

View2 - statistics

Well, several things indicates that we made a score. Packages received from the server has been reduced from 6 to 4 (33%) and bytes received reduced from 13921 to 5885 (58%). With this example only returning very few records (9) we will not see an effect in the execution times being dramatic, however we see an improvement.

All we did was to simplify and still got the exact same result at a much lower cost. This is not only relevant for views but any query made against the SQL server.

Selecting data with noFetch

Sometimes you would like to do different where clauses or sortings in while select depending on the circumstances and no matter what execute the same piece of business logic inside the while select.

One way of doing that is creating a method and use that within the loop. Another approach could be to use the noFetch keyword in the SELECT statement. The result is that the query is prepared and ready to be executed but the actual data load is not performed at that point. To traverse through the records you use the NEXT keyword. Below is an example listing the customer groups. Since the NEXT only refers to the first record in the SELECT you could get the idea that it would give some problems when joining tables in the SELECT. However, AX handles that as you would expect and returns the full data set.

    CustGroup   custGroup;
    TaxGroupHeading taxGroupHeading;    
    while select custGroup
        where custGroup.CustGroup like 'ic*'
        join taxGroupHeading
            where taxGroupHeading.taxgroup == custGroup.TaxGroupId
    {
        info(custGroup.CustGroup + " - " + taxGroupHeading.TaxGroupName);
    }    
    info('---');
    select noFetch custGroup
        where custGroup.CustGroup like 'ic*'
        join taxGroupHeading
            where taxGroupHeading.taxgroup == custGroup.TaxGroupId;
    next custGroup;
    while (custGroup.RecId)
    {
        info(custGroup.CustGroup + " - " + taxGroupHeading.TaxGroupName);
        next custGroup;
    }