Announcing Synergy/DE 12.2.1.1003 Feature Release for OpenVMS
December 17, 2022YAML Your Way to Rapid Deployment
December 19, 2022Overview
The Select class in Synergy DBL allows developers to access data from Synergy DBMS files using syntax and concepts that are closer to SQL than to the traditional file-based approach. Since its introduction, the Select class has grown to include more SQL-like features. With Synergy/DE 12, we’ve taken another step in that direction by introducing a new class that can be used with Select: GroupBy. This feature was initially introduced in Synergy/DE 12.1 but has been refined in the recent 12.2.1.1003 feature release.
The GroupBy class implements some of the functionality found in the GROUP BY and DISTINCT keywords in SQL. GroupBy enables you to filter results from a Select query so that similar records are treated as groups instead of individual records. It does this by letting you specify one or more fields, which it will use to sort the result set. Any set of records with the same value in those fields will be treated as a single group, and only one record from the group will be returned as a representative of the whole.
Basic Syntax The Synergy GroupBy class is derived from the OrderBy class, but the syntax is a bit different. Instead of using the Ascending and Descending methods, you can use a constructor and specify one or more fields to group by, similar to the Sparse class. For example
new GroupBy(firstname)
new GroupBy(lastname, firstname)
If you want to specify multiple fields, you can combine multiple GroupBy objects with the .AND. or && operators or just list multiple fields in the same call to Ascending() or Descending(). You can even combine both forms. For example:
grpobj1 = new GroupBy(firstname,lastname)
grpobj2 = new GroupBy(middleinitial)
grpobj3 = grpobj1.and.grpobj2
grpobj4 = new GroupBy(firstname,lastname) && (new GroupBy(middleinitial))
In any case, the leftmost field has the highest priority, and the rightmost field has the lowest priority. For grpobj3 and grpobj4 in the example above, records would be sorted first by firstname, then by lastname, and finally by middleinitial.
Ordering is an inherent concept in GroupBy: the returned groups will always be sorted by the specified fields in ascending order.
The constructor generates a GroupBy object, which can either be used inline in a Select constructor or assigned to a variable for later use. For instance:
selobj = new select(sprsobj, fromobj, new GroupBy(firstname,lastname))
or
grpobj = new GroupBy(firstname,lastname)
selobj = new select(sprsobj, fromobj, grpobj)
Whenever you use a GroupBy object in a Select query, you must also use a Sparse object. The reasons for this are described in the Sparse section below. For now, the examples will use “Sparse(“*”)”, which selects all fields. For example:
selobj = new select(new Sparse("*"), fromobj, new GroupBy(lastname))
Basic Example
To see this in action, consider the following example:
structure employee
key, d5
, a1
firstname, a15
, a1
lastname, a15
, a1
department, a15
, a1
salary, d7
endstructure
import Synergex.SynergyDE.Select
.main
record
sprsobj, @Sparse
fromobj, @From
selobj, @Select
rec1, employee
record
count, i4
salarysum, d8
salaryavg, d7
.define FILE "DAT:employees.ism"
.proc
xcall flags(7000000,1)
open(1, o, "TT:")
xcall isamc(FILE, ^size(rec1), 1,
& "START=1, LENGTH=5, NAME=""Key1"", DUPS, ASCEND, ATEND")
open(2, u:I, FILE)
store(2, "00001 Andrew McDonald Development 0025010")
store(2, "00002 Bob Brown HR 0030020")
store(2, "00003 Beth Jones HR 0035030")
store(2, "00004 Cathy Brown Sales 0040040")
store(2, "00005 David SMITH HR 0045050")
store(2, "00006 Fred Smith QA 0050060")
store(2, "00007 George Cartwright Shipping 0055070")
store(2, "00008 Henry Smith Development 0060080")
store(2, "00009 Jennifer Jones QA 0065090")
store(2, "00010 James Mcdonald Development 0070000")
store(2, "00011 John Cartwright Development 0075010")
store(2, "00012 Laura Jones QA 0080020")
store(2, "00013 Matthew Cartwright Shipping 0085030")
store(2, "00014 Phillip Brown Sales 0090040")
store(2, "00015 Rose Jones Sales 0095050")
close 2
open(2, I:I, FILE)
fromobj = new from(2, rec1)
writes(1, "")
writes(1, "OrderBy Department:")
writes(1, "KEY FIRSTNAME LASTNAME DEPARTMENT SALARY")
selobj = new select(fromobj, OrderBy.Ascending(rec1.department))
foreach rec1 in selobj
begin
writes(1,rec1)
end
writes(1, "")
writes(1, "GroupBy Department:")
writes(1, "KEY FIRSTNAME LASTNAME DEPARTMENT SALARY")
sprsobj = new Sparse("*")
selobj = new select(sprsobj, fromobj, new GroupBy(rec1.department))
foreach rec1 in selobj
begin
writes(1,rec1)
end
writes(1, "")
close 2
close 1
stop
.end
This program creates an ISAM file containing an example table of employees. It runs a Select query with OrderBy to retrieve the full table sorted by department:
OrderBy Department:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
00001 Andrew McDonald Development 0025010
00008 Henry Smith Development 0060080
00010 James McDonald Development 0070000
00011 John Cartwright Development 0075010
00002 Bob Brown HR 0030020
00003 Beth Jones HR 0035030
00005 David Smith HR 0045050
00006 Fred Smith QA 0050060
00009 Jennifer Jones QA 0065090
00012 Laura Jones QA 0080020
00004 Cathy Brown Sales 0040040
00014 Phillip Brown Sales 0090040
00015 Rose Jones Sales 0095050
00007 George Cartwright Shipping 0055070
00013 Matthew Cartwright Shipping 0085030
It then runs the same query again, using GroupBy instead of OrderBy:
GroupBy Department:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
00001 Andrew McDonald Development 0025010
00002 Bob Brown HR 0030020
00006 Fred Smith QA 0050060
00004 Cathy Brown Sales 0040040
00007 George Cartwright Shipping 0055070
In the second query, all of the records with a certain value for department are treated as a single group, and only one record is returned for each group.
Suppose we changed the OrderBy and GroupBy objects to use both rec1.department and rec1.lastname fields instead of just rec1.department:
selobj = new select(fromobj, OrderBy.Ascending(rec1.department, rec1.lastname))
selobj = new select(sprsobj, fromobj, new GroupBy(rec1.department, rec1.lastname))
The OrderBy query would return the same results as before, but in a slightly different order:
OrderBy Department and Lastname:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
00011 John Cartwright Development 0075010
00001 Andrew McDonald Development 0025010
00010 James Mcdonald Development 0070000
00008 Henry Smith Development 0060080
00002 Bob Brown HR 0030020
00003 Beth Jones HR 0035030
00005 David SMITH HR 0045050
00009 Jennifer Jones QA 0065090
00012 Laura Jones QA 0080020
00006 Fred Smith QA 0050060
00004 Cathy Brown Sales 0040040
00014 Phillip Brown Sales 0090040
00015 Rose Jones Sales 0095050
00007 George Cartwright Shipping 0055070
00013 Matthew Cartwright Shipping 0085030
But the GroupBy query would return additional records that weren’t present in the original:
GroupBy Department and Lastname:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
00011 John Cartwright Development 0075010
00001 Andrew McDonald Development 0025010
00010 James Mcdonald Development 0070000
00008 Henry Smith Development 0060080
00002 Bob Brown HR 0030020
00003 Beth Jones HR 0035030
00005 David SMITH HR 0045050
00009 Jennifer Jones QA 0065090
00006 Fred Smith QA 0050060
00004 Cathy Brown Sales 0040040
00015 Rose Jones Sales 0095050
00007 George Cartwright Shipping 0055070
This is because the GroupBy query is now using both fields to determine groups. Records in the new query will only be considered as part of the same group if they have identical values for both department and lastname, instead of just department. If multiple employee records in the same department have the same lastname value, they’ll still be part of the same group, but different last names will result in separate groups.
For the sake of simplicity, the examples here select all records from the data file. But GroupBy will also work with more complicated queries that include a Where expression.
Case Sensitivity
You may have noticed from the previous example that there appear to be two groups with a department value of “Development” and a lastname value of “McDonald”. This is because the default behavior for OrderBy and GroupBy is to use case-sensitive comparisons when sorting records. So “McDonald” and “Mcdonald” are distinct values and result in separate groups in GroupBy. This is easy to see if we group by lastname only:
GroupBy Lastname:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
00002 Bob Brown HR 0030020
00007 George Cartwright Shipping 0055070
00003 Beth Jones HR 0035030
00001 Andrew McDonald Development 0025010
00010 James Mcdonald Development 0070000
00005 David SMITH HR 0045050
00006 Fred Smith QA 0050060
If you want to ignore differences in case, you can use the NoCaseGroupBy class instead of the regular GroupBy:
NoCaseGroupBy Lastname:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY\
00002 Bob Brown HR 0030020
00007 George Cartwright Shipping 0055070
00003 Beth Jones HR 0035030
00001 Andrew McDonald Development 0025010
00005 David SMITH HR 0045050
We’ve also added the same functionality to OrderBy in the new NoCaseOrderBy class. Note that NoCaseGroupBy and NoCaseOrderBy are only intended to be used with alpha fields and may cause problems if used with any other data type. If you want to use case-insensitive comparisons for some fields and not others, you can create multiple objects and combine them. For example:
new NoCaseGroupBy(alpha_field) && (new GroupBy(int_field))
Sparse
A Select query using GroupBy will return a record from the data file to represent each group as a whole. Because this is a regular record, we can expect it to contain data in all of its fields, not just the fields used to determine the group. But these fields will probably be meaningless for use in representing the group. We can see this by going back to the “GroupBy(rec1.department)” example:
GroupBy Department:
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
00001 Andrew McDonald Development 0025010
00002 Bob Brown HR 0030020
00006 Fred Smith QA 0050060
00004 Cathy Brown Sales 0040040
00007 George Cartwright Shipping 0055070
The only meaningful field in that data is the one used to determine the groups: department. The fact that the employee record returned for the Development department has a name of “Andrew McDonald” doesn’t necessarily tell us anything about the Development department. And the fact that the records for the HR and Sales departments both have a lastname value of “Brown” doesn’t necessarily indicate any similarity between those departments.
It would be helpful to limit the returned data to fields that actually matter. In a SQL query, this could be done by writing a query to only return the fields you care about. In Synergy’s Select class, the fields used by the returned data are determined in advance by the record definition. But you can use the Sparse class to specify the fields that you want returned, leaving all other fields empty. For example:
selobj = new select(new Sparse(rec1.department), fromobj,
& new GroupBy (rec1.department))
which results in
GroupBy Department (Sparse):
KEY FIRSTNAME LASTNAME DEPARTMENT SALARY
Development
HR
QA
Sales
Shipping
The primary use case for Sparse is to improve performance when accessing data over xfServer by reducing the amount of data sent over the network. Our documentation warns against using Sparse when you aren’t also using xfServer, as it may not improve performance. But in the case of GroupBy, Sparse can still be useful regardless of any performance impact.
Additionally, because GroupBy now uses a static cursor to sort and return results, Sparse may improve performance even for local files. A static cursor reads all matching records from a data file into memory before processing (e.g., sorting and grouping) and returning them. Using Sparse to reduce the number of returned fields will reduce the amount of data that needs to be kept in memory, which can reduce memory usage and improve performance.
For these reasons, a Sparse object is required for all queries that use a static cursor, such as all GroupBy queries. We strongly recommend using Sparse to limit a GroupBy query to fields that are actually meaningful, but it is still possible to include additional fields, or even to specify “Sparse(“*”)” and include all fields regardless of relevance.
Aggregate Values
While the non-group fields of any given GroupBy record may not be meaningful individually, it may be helpful to use these fields to describe a group as a whole. For instance, it may not be useful to see the salary of the employee record returned for the QA department, because the record might not be representative of the department. But if you could see the average salary for everyone in the department, or the total amount of money the department spends on payroll, that information could be useful.
This brings us to a primary use case of GROUP BY in SQL: aggregate values. In addition to the actual fields from a table, SQL allows you to select aggregate functions to be performed for all fields in a group, such as Sum, Count, Average, Minimum, and Maximum.
At present, the Synergy GroupBy class doesn’t have aggregate functions built in. But it’s possible to calculate aggregate values in code using multiple Select objects. The first Select object can use GroupBy to retrieve distinct values for specific fields. Then a second Select object can use those values to drive a Where expression and iterate over all records in that group. The code can use those records to calculate any aggregate values.
Advanced Example
To see an example of aggregate values, you can add this code after the existing Select objects in the test program:
writes(1, "GroupBy Department with aggregate values:")
writes(1, "DEPARTMENT COUNT AVG(SALARY) SUM(SALARY)")
selobj = new select(new Sparse(rec1.department), fromobj,
& new GroupBy(rec1.department))
foreach rec1 in selobj
begin
data selobj2, @Select
data rec2, employee
open(3, I:I, FILE)
count = 0
salarysum = 0
salaryavg = 0
selobj2 = new Select(new From(3,rec2),
& (Where)(rec2.department == rec1.department))
foreach rec2 in selobj2
begin
count += 1
salarysum += rec2.salary
end
salaryavg = salarysum/count
writes(1,rec1.department + " " + %string(count,"XX") +
& " " + %string(salaryavg,"$$$,$$$,$$$") +
& " " + %string(salarysum,"$$,$$$,$$$"))
close 3
end
After running this program, you should get output like this:
GroupBy Department with aggregate values:
DEPARTMENT COUNT AVG(SALARY) SUM(SALARY)
Development 04 $57,525 $230,100
HR 03 $36,700 $110,100
QA 03 $65,056 $195,170
Sales 03 $75,043 $225,130
Shipping 02 $70,050 $140,100
This code uses GroupBy to return individual records for each department. Within the outer FOREACH loop, another Select object retrieves all records that match each group and keeps a running total of the number of records and the salary for all records in the department. After it’s done reading the records, the code calculates the average salary for the department and displays the data to the screen.
A downside to this method is that there’s no built-in way to sort groups based on an aggregate value. The order of groups is based on the field used in the GroupBy, and any ordering based on aggregate values would need to be done after the initial Select is finished and all aggregate values are calculated.
Limitations
At the time of this writing, the current syntax for GroupBy (introduced in 12.2.1.1003) can only be used when targeting the latest runtime. If you’re building a Synergy program from the command line, this means you must include an option like “-qrntcompat=120301” in the compiler and linker commands. If you’re building from Visual Studio, you can select a Target Synergy runtime of “Latest” in the Build properties for the project. (The original syntax introduced in 12.1 that uses Ascending() and Descending() methods, similar to the OrderBy class, can be used when targeting the default 120101 runtime level, but that syntax is no longer recommended.)
GroupBy requires compiler and runtime features that aren’t present in earlier versions of Synergy. You won’t be able to compile code that uses GroupBy if you target an earlier Synergy version. (In other words, if you use the -qrntcompat compiler and linker option, you can’t specify a version number prior to 12010100.) Likewise, if you’re accessing data over xfServer, the server must be running at least version 12 for GroupBy to work. By default, an xfServer client of version 12.2.1.1003 or later can only connect to a server of version 12.2.1.1003 or later, although you can use the SRV_COMPAT environment variable to override this and connect to a server with an earlier version.
At present, GroupBy is not supported with a JoinSelect. Queries using GroupBy must operate on a single file only. Likewise, GroupBy and OrderBy are not supported on the same query, although results returned by GroupBy will be sorted in ascending order.
GroupBy uses a static cursor and so operates on a snapshot of the data file at the time when the enumerator is created. As a result, it’s possible for the underlying data to change between the time when the Select statement starts working and the time when it retrieves any given record. This shouldn’t change the order of any groups, but it could result in new groups being skipped if records belonging to that group are only added after the Select has started working.
Because each result returned by a GroupBy query represents a group of records instead of a single record in a file, several Select features that require individual records won’t work. For instance, the RestrictedAlphaEnumerator methods CurrentRFA and DeleteCurrent are incompatible with GroupBy. A GroupBy result will have no change tracking information, so the GetCTInfo property is also incompatible. Record locking is incompatible with a GroupBy query, and a GroupBy query won’t execute if locks are enabled in the From object, so the IsLocked property is incompatible as well. The SparseUpdate method is also incompatible, because it can’t be used on a query with a Sparse object, and Sparse objects are required when using GroupBy.
In some circumstances (e.g., if it’s sorting by a single key that’s already in the correct order in the ISAM file), OrderBy can optimize a query by bypassing the process of sorting records and returning them in the correct order directly from the file. GroupBy doesn’t support this but will always sort the result set regardless of what fields are specified. This may result in slower performance than would otherwise be possible for some queries.
As mentioned above, aggregate functions are not currently supported with GroupBy.
Future Plans
While future plans have not been finalized, we’re considering addressing many of the above limitations in future releases of Synergy/DE. One of the biggest improvements we’d like to make in the near future is to support GroupBy with JoinSelect, similar to the support we recently added to use OrderBy with JoinSelect. We’d also like to support OrderBy and GroupBy on the same queries.
In the longer term, we’d like to support aggregate functions with GroupBy (Sum, Count, Average, Minimum, and Maximum), bringing feature parity closer to SQL. Ideally, this feature would allow arbitrary mapping of grouped and aggregated fields from a source record into a destination record, with arbitrary sorting of the results. This mapping could also be useful in other contexts besides GroupBy.
Another potential change is to optimize GroupBy, perhaps by returning records directly from memory after sorting them, instead of reading records from the ISAM file when retrieving them. We might also have GroupBy imply Sparse, so results aren’t cluttered by meaningless records and performance over xfServer is improved, without requiring an explicit definition of a Sparse object.
These changes will likely require new versions of the compiler, runtime, and xfServer, so backwards compatibility is one limitation we wouldn’t be addressing.
Conclusion
I hope this article has given you a good introduction to the GroupBy class. You should have some idea of how the class can be used, both theoretically and with practical code examples. You should also have a basic understand of the differences between the Synergy GroupBy class and SQL, as well as the current limitations of the class, but also some insight into how the class may develop in the future.