quietann

just got through a gnarly data-selection and sorting problem... I had two data sets from the same people. Data set 1 had one observation per person, while dataset 2 had anywhere from one to three observations. I was trying to retain the correct observation (the one with all the data) in a new dataset and get rid of duplicates... but SAS is dumb (or I am dumb, or both) and it kept getting screwed up.

Finally I realized that I needed to deal with the people who had duplicates separately. So here are the steps:

Merge the two original datasets, but the merged data go to one temporary dataset if there is a single observation for an ID, but another temporary dataset if there are >1 observations per ID. (Also get rid of any observations for people who are only in the second original data set.)
In the dataset with >1 observation, retain those where some index variable matches for both original datasets. (in this case, "visit=dietvisit")

Append the new new dataset with the single-observation temporary dataset.

Code:

/* now add dietary iron values where they are available */
data new dups;
	merge fefin (in=g) diet;
	by alive_id;
	if not g then delete;		/* delete people who 
                                        only have diet data */
	if first.alive_id=last.alive_id then output new;	

	else output dups;		/* this is where people 
                                        w/ >1 observation go */
run;

/* if >1 visit then we want the one where dietary iron was collected */

data nodups;
	set dups;
	if (visit ^= '' & dietvisit ^='' & visit=dietvisit);		
run;

/* append the correct duplicate observations to the new data set */

proc append base=new data=nodups;
run;

S	M	T	W	T	F	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

my brain hurts!

my brain hurts!

Profile

March 2026

Most Popular Tags

Style Credit

Expand Cut Tags