Eliminating duplicate records while capturing the discarded duplicates can be done efficiently with the SUM FIELDS=NONE,XSUM feature of SyncSort; DFSORT provides the equivalent function through the ICETOOL SELECT operator, described later in this article. The SUM control statement is designed to add up numeric fields when records share a common sort key. With FIELDS=NONE, no summation is performed, and only one record is retained for each key. The XSUM parameter writes all discarded duplicates to the data set defined by the SORTXSUM DD statement.
SUM FIELDS=NONE: Eliminates all but one record for each unique sort key.
XSUM: Writes the removed duplicate records to the SORTXSUM DD data set.
Internally, the SORT utility performs the following steps:
1. Sorts records based on the defined keys.
2. Keeps the first occurrence of each key in SORTOUT.
3. Sends all subsequent duplicate occurrences to SORTXSUM.
If the EQUALS option is in effect, the first record of summed records is kept.
If the NOEQUALS option is in effect, the retained record is unpredictable.
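The ZDPRINT option converts positive summed ZD values to printable numbers (the sign of the last digit becomes X'F' instead of X'C').
The NZDPRINT option leaves positive summed ZD values unconverted, so the last digit does not print as a number.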
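The way DFSORT processes short SUM fields depends on whether the VLSHRT or NOVLSHRT option is in effect. A short field is one that extends beyond the end of a variable-length record, so the record is too short to contain the entire field. With VLSHRT, DFSORT leaves records with short SUM fields unsummed; with NOVLSHRT, DFSORT terminates when it finds the first short SUM field.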
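As a rough illustration of how these options are set, the sketch below puts EQUALS and ZDPRINT on an OPTION statement ahead of a summing SORT. The key (1,4,CH) and the summed field (10,5,ZD) are hypothetical positions chosen only for the example.

```jcl
//SYSIN DD *
* Hypothetical layout: 4-byte key at position 1, 5-byte ZD amount at 10
  OPTION EQUALS,ZDPRINT
  SORT FIELDS=(1,4,CH,A)
  SUM FIELDS=(10,5,ZD)
/*
```

With EQUALS in effect, the first record of each key group is the one that is kept and carries the total; ZDPRINT makes a positive total printable.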
SORTXSUM: Output file for a SORT or MERGE function. It contains the records eliminated during SUM processing.
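Putting these pieces together, a complete step might look like the sketch below. It assumes SyncSort (for the XSUM operand); the step name DEDUP and the 3-byte key at position 1 are placeholders, and the data set names are left elided.

```jcl
//DEDUP    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=...   input data set
//SORTOUT  DD DSN=...   first record with each key
//SORTXSUM DD DSN=...   discarded duplicate records
//SYSIN    DD *
* Assumed key: 3 bytes of character data starting at position 1
  SORT FIELDS=(1,3,CH,A)
  SUM FIELDS=NONE,XSUM
/*
```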
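The following three SUM statements are equivalent; FORMAT=f supplies the format for any field that does not specify its own:
SUM FIELDS=(5,5,ZD,12,6,PD,21,3,PD,35,7,ZD)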
SUM FORMAT=ZD,FIELDS=(5,5,12,6,PD,21,3,PD,35,7)
SUM FIELDS=(5,5,ZD,12,6,21,3,35,7,ZD),FORMAT=PD
Summary Field Formats
Format  Length             Description
BI      2, 4, or 8 bytes   Unsigned binary
FI      2, 4, or 8 bytes   Signed fixed-point
FL      4, 8, or 16 bytes  Signed hexadecimal floating-point
PD      1 to 16 bytes      Signed packed decimal
ZD      1 to 31 bytes      Signed zoned decimal
XSUM Examples in JCL
In the example below, the SORTXSUM file will contain the duplicate records based on the 4-byte key starting at position 5. The first record for each key will be written to the SORTOUT file.
In this modified example, the INCLUDE statement first limits processing to records that have 'XYZ' in the 3 bytes starting at position 45; all other records are dropped and appear in neither output. Of the records that remain, the first one for each 4-byte key starting at position 5 is written to SORTOUT, and the duplicates are written to SORTXSUM.
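A minimal sketch of the control statements for this basic case, inferred from the modified example that follows by leaving out its INCLUDE statement (XSUM assumes SyncSort):

```jcl
//SYSIN DD *
* Key: 4 bytes of character data starting at position 5
  SORT FIELDS=(5,4,CH,A)
  SUM FIELDS=NONE,XSUM
/*
```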
//SYSIN DD *
  SORT FIELDS=(5,4,CH,A)
  INCLUDE COND=(45,3,CH,EQ,C'XYZ')
  SUM FIELDS=NONE,XSUM
/*
Using Multiple Fields as Key:
//SYSIN DD *
  SORT FIELDS=(1,2,CH,A,4,3,CH,A,8,3,CH,A)
  SUM FIELDS=NONE,XSUM
/*
This technique helps isolate duplicate records using composite keys spread across multiple fields and can be useful in reporting, deduplication, and validation tasks in mainframe environments.
With SUM FIELDS=NONE and EQUALS in effect, DFSORT eliminates “duplicate records” by writing the first record with each key to the SORTOUT data set and deleting subsequent records with each key. A competitive product offers an XSUM operand that allows the deleted duplicate records to be written to a SORTXSUM dataset. While DFSORT does not support the XSUM operand, DFSORT does provide the equivalent function and a lot more with the SELECT operator of ICETOOL. SELECT lets you put the records that are selected in the TO data set and the records that are not selected in the DISCARD data set. So an ICETOOL SELECT job to do the XSUM function might look like this:
//XSUM JOB ...
//DOIT EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=... input data set
//OUT DD DSN=... first record with each key
//SORTXSUM DD DSN=... subsequent records with each key
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,3,CH) FIRST DISCARD(SORTXSUM)
/*
This will put the first occurrence of each ON field (sort key) in the OUT data set and the rest of the records in the SORTXSUM data set.
If IN contained the following records:
J03 RECORD 1
M72 RECORD 1
M72 RECORD 2
J03 RECORD 2
A52 RECORD 1
M72 RECORD 3
OUT would contain the following records:
A52 RECORD 1
J03 RECORD 1
M72 RECORD 1
SORTXSUM would contain the following records:
J03 RECORD 2
M72 RECORD 2
M72 RECORD 3
SELECT also allows you to use multiple ON fields (that is, multiple keys), and DFSORT control statements (for example, INCLUDE or OMIT), such as in this SELECT statement:
//TOOLIN DD *
SELECT FROM(IN) TO(OUT1) ON(1,3,CH) ON(25,3,PD) FIRST -
DISCARD(XSUM1) USING(CTL1)
//CTL1CNTL DD *
  INCLUDE COND=(11,7,CH,EQ,C'PET-RAT')
/*
And SELECT can do much more than that. Besides FIRST, it also lets you use:
FIRST(n)
FIRSTDUP
FIRSTDUP(n)
LAST
LASTDUP
ALLDUPS
NODUPS
HIGHER(x)
LOWER(y)
EQUAL(v)
You can use TO(outdd) alone, DISCARD(savedd) alone, or both, to manage selected and non-selected records for various use cases.
* Put duplicates in DUPS and non-duplicates in NODUPS:
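A minimal sketch of such a statement, assuming the DD names IN, DUPS, and NODUPS are defined in the job and reusing the 3-byte key from the earlier example:

```jcl
//TOOLIN DD *
* ALLDUPS selects every record whose key occurs more than once;
* records whose key occurs only once go to the DISCARD data set
  SELECT FROM(IN) TO(DUPS) ON(1,3,CH) ALLDUPS DISCARD(NODUPS)
/*
```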
Best Practices
Define Keys Precisely: Ensure your SORT FIELDS cover all bytes that constitute uniqueness.
Manage DISP & DCB: Let SORT supply DCB attributes when possible; override only if needed.
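Validate Results: Always verify both SORTOUT and SORTXSUM for expected counts. Together, the two record counts should equal the number of input records that passed any INCLUDE or OMIT filtering.
Performance: If you only need to check whether a data set is empty, an IDCAMS PRINT with COUNT(1) in a prior step is enough. This is not related to XSUM but is useful in broader workflows.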
Conclusion
Using SUM FIELDS=NONE,XSUM (SyncSort) or ICETOOL SELECT…FIRST DISCARD (DFSORT) provides a straightforward, single-pass method to both remove duplicates from your main output and capture all dropped records separately. This dual-output approach simplifies auditing, downstream processing, and data quality checks in your batch JCL workflows.