Avoid Reporting on Duplicate Records in Birst After Importing

If you bring your data into Birst from a data source that contains multiple versions or variations of one record, you should be aware that Birst may be duplicating or triplicating the records from the source. A Birst Admin may only see one record after importing the source and processing the space because the records are identical, except for having typically one field that has different values. Depending on the data source, this field could be labeled "last modified", "version", or "variation number". After the data is processed in Birst, these rows can aggregate their measure values together into one record, which can lead to duplicates or triplicated measure (Fact) values.  

untitled.png

As consultants at Eigen X and a Birst Partner, we have faced this issue during implementations with our team and needed to comb through Infor M3A Documentation and pre-built Birst provisioned scripts to find the answer. This was most recently done in a project that involved switching a client from their SAP ERP and BW solutions to Infor Products. If you are facing duplication issues or inaccurate measure data, you are not alone!  

 Terms Used in the Blog: 

BQL: Birst’s version of SQL Language  

Primary Key: The field or fields whose value uniquely identifies a single record 

Raw Data Source: New table loaded into Birst (Could be from a Data Lake, Data Warehouse, csv file, etc) 

Scripted Source: New table in Birst created from one or more raw source tables within Birst where table joins, BQL logic, and other transformations exist (The “Select Statement” Area and the “Script” Area Make up a “Scripted Source”) 

Select Statement: Area in the scripted source where data is pulled/queried from a raw data source (s) or another scripted source (Table joins can also be created in this area of the Birst Scripted Source) 

Script: Area in the scripted source where BQL logic and transformations occur. This area uses the Select Statement’s queried results and executes the logic and transformations to create a Scripted Source 

 Problem Scenario: 

  • Three identical Item records exist in a Data Lake because they are a different version of the same record (Same Primary Key Values for All Versions). The "versioning" field is he only field that allows you to view the most current version of the Item record 

  • That Item Number is edited in an ERP system three times and the Data Lake has all three edited versions of that "Unique" Item Number because of a Real time or batch integration 

  • The Data Lake stores the Last Modified (date/time) and variation number field, to create different versions of the same record (Not an Upsert or Update to the existing record in the Data Lake) 

  • Birst's Connect tool can not always handle complex SQL queries to select/extract only the last modified Item record, so a Birst Admin will typically have to pull in all of the records into Birst from a certain date or use a variable.  

  • This would result in bringing more data in to Birst than is needed 

  • The Data is then brought into Birst for Modeling and front-end user consumption. 

  • This Raw Data Source now has all versions of that record in Birst 

  • Once the scripted source is modeled and processed, Birst aggregates the measure field, Quantity, of each of those records and creates one single Item record for the front-end user in Visualizer or Designer Reporting areas 

  • The single Item Number record will now have the Quantity measure field value triplicated 

  • This aggregation of this measure data occurs because of the specific Primary Key Value defined and used when targeting the grain and Hierarchy in the Birst Relate area 

  • Birst assumes that all the records are the same because they have the same Primary Key values and groups the sums of each Quantity value together 

  • This creates inaccurate Item measure data!!!! 

Proposed Solution: 

  • Use a Scripted Source to transform the original Raw Data Source table, which holds all versions of the Item record. 

  • This new Scripted Source would only contain the last modified/updated Item Number record 

  • This provides the right Quantity measure data and most up to date record details 

 Solution to Account for Record Versioning: 

  • Create a Scripted Source from the Raw Data Source that was initially imported/uploaded into Birst 

 

  • In the Select Statement area of the script, order the primary key fields that are descending to provide the last modified or desired version of the record (Done using "Order By" Statement): 

  • We will use the ItemMaster Raw Data Source for an example to show how the Scripted Source pulls in data when using the Select Statement. 

  • It is assumed that this ItemMaster table is stored in a Data Lake (Source) and then brought into Birst to become a Raw Data Source 

 

Select Statement Birst Screen Shot and Syntax: 

 SELECT [ItemMaster.ItemNumber], [ItemMaster.ItemName], [ItemMaster. CompanyNumber], [ItemMaster.ItemDescription], [ItemMaster.DateAdded] [ItemMaster.variationNumber] FROM [ItemMaster] 

 

// Order only the primary Key Values descending  

ORDER BY [ItemMaster.ItemNumber], [ItemMaster.ItemName], [ItemMaster.varaiationNumber] DESC 

 

  • This “Order By” will sort the numerical records in descending order, so that we can begin to reference the most recent record coming from the Raw Data Source (Most Business Use Cases tend to care about the most recent version and not “snapshot” data) 

 

  • In the Script, define the primary key values of that table to be referenced in the Order By section of your Select Statement above: 

 

Script Screen Shot and Script Syntax: 

 

// Variable (s) to hold the primary key value (s) 

// Keys: ItemNumber, ItemName 

Dim [vItemNumber] AS integer = NULL 

Dim [vItemName] AS integer = NULL 

 

  • Creating a Dimension (Dim) and setting each equal to “Null”, will make sure that no value is placed in that Dimension until we need it to be placed later in the script 

 

  • In the Script, insert only a single record using an If statement: 

 

Script Screen Shot and Script Syntax: 

 

// Insert the record with the highest variation number, lastmodified record, or version # If [vItemNumber]<> [ItemMaster.ItemNumber] Or [vItemName]<> [ItemMaster.ItemName] 

Then 

[ItemNumber]= [ItemMaster.ItemNumber], [ItemName]= [ItemMaster.ItemName], 

[varaiationNumber]=[ItemMaster.varaiationNumber] WRITERECORD 

 

  • Using the If statement logic, we will now fill those previously created dimensions with only a single record from the Select Statement  

  • This will fill the dimension with the most recent Item record version because we sorted the key fields in descending fashion in the Select Statement 

  • The “Then” part of the script will then allow the user to set the Select Statement field names equal to the Column names that will be used by Front end users and used going forward. (Example: “Item Number” should show as “Item ID” for business users in Visualizer) 

  • Since data is not automatically written to the scripted source that we are creating, we need to use the command “WRITERECORD” to update the scripted source with our changes. 

  • When we started the script, we declared/initialized our variable names as dimensions 

  • These declared variables hold their values throughout the execution of the script 

  • We update the dimension values as the Select Statement input records are processed 

 

  • Detailed Example: 

  • If we have the lines below in our data source. We only want to keep the unique combination of ItemNumber and ItemName with the highest VariationNumber (Lines 1 & 3) 

 

 

  • In the above example, the script first looks at line 1 and applies the logic of the IF statement: 

  • vItemNumber = Null (That is what we initially defined it as) 

  • ItemMaster.ItemNumber = W1 

  • ItemMaster.ItemName = WidgetBlue 

  • Since Null <> W1, Or Null <> WidgetBlue, the If statement is True 

 

  • It then writes the record:  

 

  • Next, the script looks at line 2 

  • vItemNumber now = W1 since we wrote the record above 

  • ItemMaster. ItemNumber = W1 

  • Since W1 = W1, the first part of the If statement is False  

  • vItemName = WidgetBlue since we wrote the record above 

  • ItemMaster.ItemName = WidgetBlue 

  • Since WidgetBlue = WidgetBlue, the second part of the If statement is also False 

  • Since the If statement is False, the record is not written and the variables remain the same as they are in Line 1 

  • When the script looks at line 3, the result will be the same as for line 2 and the record will not be written 

  • When the script looks at line 4, the vItemName (WidgetBlue) will not match the ItemMaster.ItemName (WidgetRed). This means the If statement will be true and the record will be written: 

 

  • vItemNumber will still = W1, but now vItemName will = WidgetRed 

  • When the script looks at the next line (line 5), the If statement will be False, and the record will not be written 

 

 

  • After writing the records in the script, we then set the variables to the records that we inserted after we wrote the script: 

 

Script Screen Shot and Script Syntax: 

 

// Set the Primary key variables to the inserted record [vItemNumber]=[ItemMaster.ItemNumber] [vItemName]=[ItemMaster.ItemName] End If 

 

  • Process the script 

 

 

InConclusion: 

  • Visualizer access should not show duplicated or triplicated amounts after working through steps 1-6 above. 

  • Example withoutVariation syntax in the scripted source 

 

  • Example withVariation syntax in the scripted source 

 

 

Please Reach Out with Any Questions 

Writer/Blog Contributor: 

Jason Norat – Birst SME 

Technical Consultant at Eigen X, LLC 

Email:jnorat@eigenx.com 

 

Key Blog Contributors: 

Rachel Cinelli – Analytics Expert  

John Swann – Data Architect