Cheatography
https://cheatography.com
Sorting data is a resource-intensive operation, so using PROC SORT efficiently can save both time and computing resources.
A number of PROC SORT options control not only the performance and capabilities of the procedure but also the contents of the resulting data set.
This is a draft cheat sheet. It is a work in progress and is not finished yet.
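NODUPRECS
Removes duplicate observations, but only those that are adjacent after sorting: each observation is compared, across all variables, with the previous observation written to the output data set. Alias: NODUP.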
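Because only adjacent duplicates are removed, a common pattern is to sort by every variable so that identical observations end up next to each other. A minimal sketch, assuming a small hypothetical data set named visits:

```sas
/* Hypothetical data: the first and third rows are exact duplicates */
data visits;
   input id $ visit;
   datalines;
A 1
B 2
A 1
;
run;

/* BY _ALL_ sorts on every variable, so identical rows become adjacent */
/* and NODUPRECS can remove the repeat                                 */
proc sort data=visits out=visits_nodup noduprecs;
   by _all_;
run;
```

Sorting on only a subset of the variables can leave identical rows non-adjacent, in which case some duplicates may survive.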
NODUPKEY
The NODUPKEY option checks for and eliminates observations with duplicate BY values, keeping only the first occurrence in each BY group.
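A short sketch of NODUPKEY on sashelp.class. The output data set names are hypothetical, and DUPOUT= is an additional PROC SORT option, not covered above, that captures the observations NODUPKEY discards:

```sas
/* Keep the first observation for each AGE value; */
/* send the discarded observations to class_dups  */
proc sort data=sashelp.class
          out=class_one_per_age
          nodupkey
          dupout=class_dups;
   by age;
run;
```

Using OUT= leaves the input data set untouched, which makes it easy to compare the kept and discarded observations.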
NOUNIQUEKEY and UNIQUEOUT=
The NOUNIQUEKEY option checks for and eliminates from the output data set observations that have a unique sort key.
The UNIQUEOUT= option can be used with the NOUNIQUEKEY option. UNIQUEOUT=SAS-data-set specifies the output data set for the observations eliminated by NOUNIQUEKEY, that is, those with a unique sort key.
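As an illustration, the two options can split a data set into observations whose key is shared and observations whose key occurs only once. This is a sketch with hypothetical output data set names:

```sas
/* class_shared_age: observations whose AGE appears more than once */
/* class_single_age: observations whose AGE appears exactly once   */
proc sort data=sashelp.class
          out=class_shared_age
          nouniquekey
          uniqueout=class_single_age;
   by age;
run;
```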
OVERWRITE
The OVERWRITE option enables the input data set to be deleted before the replacement output data set of the same name is populated with observations.
Example
data class;
set sashelp.class;
run;
proc sort data=class overwrite;
by age;
run;
The OVERWRITE option has no effect when an OUT= data set is specified.
PRESORTED
The PRESORTED option checks the input data set before sorting to determine whether its observations are already in order. If they are, the data set is not sorted again.
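A minimal sketch of PRESORTED, assuming a hypothetical data set named ordered that is created already in ascending order of id:

```sas
/* Build a data set that is already in ascending order of ID */
data ordered;
   do id = 1 to 5;
      output;
   end;
run;

/* PRESORTED asks PROC SORT to verify the sequence first; */
/* if the check passes, the sort itself is skipped        */
proc sort data=ordered out=ordered_copy presorted;
   by id;
run;
```

The SAS log should indicate whether the sequence check succeeded or a sort was performed.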
Relative Order of Observations in Each BY Group
EQUALS
The EQUALS option maintains the relative order of observations from the input data set in the output data set for observations with identical BY variable values.
NOEQUALS
The NOEQUALS option does not necessarily preserve this order in the output data set.
The SORTEQUALS system option specifies that observations with identical BY variable values retain the same relative positions in the output data set as in the input data set.
The NOSORTEQUALS system option specifies that no resources are used to control the order of observations in the output data set that have the same BY variable values.
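To make the effect of EQUALS concrete, here is a small sketch using a hypothetical data set named scores, where seq records the original input order:

```sas
/* SEQ records the position of each row in the input */
data scores;
   input team $ seq;
   datalines;
B 1
A 2
B 3
A 4
;
run;

/* With EQUALS, rows that share a TEAM value keep their input order, */
/* so SEQ stays ascending within each team in the output             */
proc sort data=scores out=scores_stable equals;
   by team;
run;
```

With NOEQUALS in place of EQUALS, the order of seq within each team is not guaranteed.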
Sorting Orders
Numeric Variables
For numeric variables, the smallest-to-largest comparison sequence is:
1. SAS missing values (shown as a period or special missing value)
2. negative numeric values
3. zero
4. positive numeric values.
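The sequence above can be checked with a small sketch. The data set names are hypothetical:

```sas
/* One missing, one negative, one zero, and one positive value */
data nums;
   x = 5;  output;
   x = .;  output;
   x = -3; output;
   x = 0;  output;
run;

/* Default ascending sort: missing first, then -3, 0, 5 */
proc sort data=nums out=nums_sorted;
   by x;
run;

proc print data=nums_sorted;
run;
```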