Thursday, October 1, 2009

Viva la Famiglia!

My wife comes from a large Italian family, and there’s a family reunion every year. With all those relatives, it’s hard to remember how who is related to whom. A woman will strike up a conversation with another woman and ask, “How are you related to me again?” And the second woman will say, “I’m not sure,” but what she's really thinking is, “I hope the risotto that you brought is better than it was last year.”

I’m German, so I want facts and details, and I want things organized just so! I used to know my wife’s family tree quite well, but the knowledge has faded over the years.

So why not put SQL Server to use in providing a tool to tell each person how they are related to every other person in the family? Crazy, right? Yeah, it is… but it was a fun exercise.

How people are related can be quite complicated when you go down through the generations. All those cousins (and second cousins and cousins once removed) and great nephews and great great grandmothers, to name a few, can be quite confusing. Here’s a chart that makes all the relationships a little more clear (click on it to see an enlarged view):

Family Relationship Chart

And adding spouses to the mix makes it twice as complicated!

But I was up for the challenge.

The code I wrote assumes that each person has only one spouse (no divorces or ex-spouses) and their spouse is always the opposite sex, and there are no stepchildren… hey, this is an Italian Catholic family, after all.

Let’s create and populate a table (in tempdb) with a sample family. You can see (below the code) a representation of how the family is laid out. For example, John (who married Mary) had 3 children (Fred, Linda, and Charles). They all got married and had children, and those children got married and had children, etc.

use tempdb;
go

if object_id('OurFamily') is not null
drop table OurFamily;
go

create table OurFamily
(
ID int primary key
,FirstName varchar(20)
,Gender char(1)
,ParentID int
,SpouseName varchar(20)
);
insert OurFamily
select 101,'John' ,'M',null,'Mary'
union all select 102,'Fred' ,'M', 101,'Wilma'
union all select 103,'Linda' ,'F', 101,'Mike'
union all select 104,'Charles' ,'M', 101,'Diana'
union all select 105,'Rebecca' ,'F', 102,'Marvin'
union all select 106,'Sam' ,'M', 102,'Tammy'
union all select 107,'Ben' ,'M', 102,'Abigail'
union all select 108,'Jack' ,'M', 102,'Jill'
union all select 109,'George' ,'M', 103,'Martha'
union all select 110,'Dorothy' ,'F', 104,'Paul'
union all select 111,'Tom' ,'M', 104,'Judy'
union all select 112,'Susan' ,'F', 104,'Andrew'
union all select 113,'Patrick' ,'M', 105,null
union all select 114,'Sharon' ,'F', 105,'Mark'
union all select 115,'Brian' ,'M', 106,null
union all select 116,'Jean' ,'F', 107,'Bob'
union all select 117,'Calvin' ,'M', 107,'Kate'
union all select 118,'Frank' ,'M', 109,'Jennifer'
union all select 119,'Joan' ,'F', 109,null
union all select 120,'Eric' ,'M', 109,'Elaine'
union all select 121,'Lisa' ,'F', 110,null
union all select 122,'James' ,'M', 111,'Cynthia'
union all select 123,'Isabel' ,'F', 111,'Greg'
union all select 124,'David' ,'M', 112,null
union all select 125,'Cindy' ,'F', 114,'Nigel'
union all select 126,'Alvin' ,'M', 114,'Lucy'
union all select 127,'Julie' ,'F', 116,null
union all select 128,'Tim' ,'M', 118,null
union all select 129,'Michelle','F', 120,'Ryan'
union all select 130,'Peter' ,'M', 120,null
union all select 131,'Ken' ,'M', 123,null
union all select 132,'Harry' ,'M', 125,null
union all select 133,'Nancy' ,'F', 129,null
;
/*
John+Mary
|--Fred+Wilma
| |--Rebecca+Marvin
| | |--Patrick
| | |--Sharon+Mark
| | |--Cindy+Nigel
| | |--Harry
| | |--Alvin+Lucy
| |--Sam+Tammy
| | |--Brian
| |--Ben+Abigail
| | |--Jean+Bob
| | | |--Julie
| | |--Calvin+Kate
| |--Jack+Jill
|--Linda+Mike
| |--George+Martha
| |--Frank+Jennifer
| | |--Tim
| |--Joan
| |--Eric+Elaine
| |--Michelle+Ryan
| |--Nancy
| |--Peter
|--Charles+Diana
|--Dorothy+Paul
| |--Lisa
|--Tom+Judy
| |--James+Cynthia
| |--Isabel+Greg
| |--Ken
|--Susan+Andrew
|--David
*/
The stored procedure that I created accepts a person’s name (for clarity’s sake, all the names of family members and spouses in the table are unique) and prints an alphabetical list of all people (spouses included) so they can have a cheat sheet reference list when they are at the reunion.

So, for example, let’s say that Eric (who is a family member) wants a list, which is generated below. He can meet Paul at the reunion and say, “Hey, you’re the Husband of my 1st Cousin Once Removed, Dorothy!”

exec usp_GetListOfRelatives 'Eric';
/*
RelationName Relation
------------ ----------------------------------------------
Abigail Wife of my 1st Cousin Once Removed, Ben
Alvin my 2nd Cousin Once Removed
Andrew Husband of my 1st Cousin Once Removed, Susan
Ben my 1st Cousin Once Removed
Bob Husband of my 2nd Cousin, Jean
Brian my 2nd Cousin
Calvin my 2nd Cousin
Charles my Great Uncle
Cindy my 2nd Cousin Once Removed
Cynthia Wife of my 2nd Cousin, James
David my 2nd Cousin
Diana my Great Aunt
Dorothy my 1st Cousin Once Removed
Elaine my Wife
Eric Myself
Frank my Brother
Fred my Great Uncle
George my Father
Greg Husband of my 2nd Cousin, Isabel
Harry my 2nd Cousin Twice Removed
Isabel my 2nd Cousin
Jack my 1st Cousin Once Removed
James my 2nd Cousin
Jean my 2nd Cousin
Jennifer my Sister-In-Law, married to my Brother, Frank
Jill Wife of my 1st Cousin Once Removed, Jack
Joan my Sister
John my Great Grandfather
Judy Wife of my 1st Cousin Once Removed, Tom
Julie my 2nd Cousin Once Removed
Kate Wife of my 2nd Cousin, Calvin
Ken my 2nd Cousin Once Removed
Linda my Grandmother
Lisa my 2nd Cousin
Lucy Wife of my 2nd Cousin Once Removed, Alvin
Mark Husband of my 2nd Cousin, Sharon
Martha my Mother
Marvin Husband of my 1st Cousin Once Removed, Rebecca
Mary my Great Grandmother
Michelle my Daughter
Mike my Grandfather
Nancy my Granddaughter
Nigel Husband of my 2nd Cousin Once Removed, Cindy
Patrick my 2nd Cousin
Paul Husband of my 1st Cousin Once Removed, Dorothy
Peter my Son
Rebecca my 1st Cousin Once Removed
Ryan my Son-In-Law, married to Michelle
Sam my 1st Cousin Once Removed
Sharon my 2nd Cousin
Susan my 1st Cousin Once Removed
Tammy Wife of my 1st Cousin Once Removed, Sam
Tim my Nephew
Tom my 1st Cousin Once Removed
Wilma my Great Aunt
*/
And Lucy, who is not a blood relative, but who is married to Alvin, would get this list for herself:

exec usp_GetListOfRelatives 'Lucy';
/*
RelationName Relation
------------ ---------------------------------------------------------
Abigail my Husband's Great Aunt
Alvin my Husband
Andrew Husband of my Husband's 1st Cousin Twice Removed, Susan
Ben my Husband's Great Uncle
Bob Husband of my Husband's 1st Cousin Once Removed, Jean
Brian my Husband's 1st Cousin Once Removed
Calvin my Husband's 1st Cousin Once Removed
Charles my Husband's Great Great Uncle
Cindy my Husband's Sister
Cynthia Wife of my Husband's 2nd Cousin Once Removed, James
David my Husband's 2nd Cousin Once Removed
Diana my Husband's Great Great Aunt
Dorothy my Husband's 1st Cousin Twice Removed
Elaine Wife of my Husband's 2nd Cousin Once Removed, Eric
Eric my Husband's 2nd Cousin Once Removed
Frank my Husband's 2nd Cousin Once Removed
Fred my Husband's Great Grandfather
George my Husband's 1st Cousin Twice Removed
Greg Husband of my Husband's 2nd Cousin Once Removed, Isabel
Harry my Husband's Nephew
Isabel my Husband's 2nd Cousin Once Removed
Jack my Husband's Great Uncle
James my Husband's 2nd Cousin Once Removed
Jean my Husband's 1st Cousin Once Removed
Jennifer Wife of my Husband's 2nd Cousin Once Removed, Frank
Jill my Husband's Great Aunt
Joan my Husband's 2nd Cousin Once Removed
John my Husband's Great Great Grandfather
Judy Wife of my Husband's 1st Cousin Twice Removed, Tom
Julie my Husband's 2nd Cousin
Kate Wife of my Husband's 1st Cousin Once Removed, Calvin
Ken my Husband's 3rd Cousin
Linda my Husband's Great Great Aunt
Lisa my Husband's 2nd Cousin Once Removed
Lucy Myself
Mark my Father-In-Law
Martha Wife of my Husband's 1st Cousin Twice Removed, George
Marvin my Husband's Grandfather
Mary my Husband's Great Great Grandmother
Michelle my Husband's 3rd Cousin
Mike my Husband's Great Great Uncle
Nancy my Husband's 3rd Cousin Once Removed
Nigel Husband of my Husband's Sister, Cindy
Patrick my Husband's Uncle
Paul Husband of my Husband's 1st Cousin Twice Removed, Dorothy
Peter my Husband's 3rd Cousin
Rebecca my Husband's Grandmother
Ryan Husband of my Husband's 3rd Cousin, Michelle
Sam my Husband's Great Uncle
Sharon my Mother-In-Law
Susan my Husband's 1st Cousin Twice Removed
Tammy my Husband's Great Aunt
Tim my Husband's 3rd Cousin
Tom my Husband's 1st Cousin Twice Removed
Wilma my Husband's Great Grandmother
*/
The code for the stored procedure is at the end of this article. I tried to do it in one gigantic query, but I couldn’t because SQL came back and told me that it was too complicated. So I split it into two queries, with the first one putting some intermediate data into a temporary table (#FamilyRelations) and the second query working off of that.

I’m not going to go into any great detail as to how I wrote the query. But the main steps in the first query are (1) Build a FamTree recursive CTE to include an Ancestry path, (2) For each person in the FamTree, go through his Ancestry path, ancestor-by-ancestor (via the JOIN to a Numbers table), (3) For each of those ancestors, link them to everyone else in FamTree with common ancestors (via the JOIN again to FamTree), and (4) Calculate relative positioning values (N and N2) between each person and those relatives.

The difference between N and N2 determines how a person is related, and corresponds to the relationship chart at the beginning of this article. For example, if N-N2 is zero, where someone and a relative share a common parent or grandparent or great grandparent, etc, then that has to do with the line going across the middle part of the chart (Myself, Brother/Sister, First Cousin, Second Cousin, etc). Part of what makes the query lengthy is using the Gender column to determine gender-specific relations, like Mother/Father, Brother/Sister, Uncle/Aunt, Nephew/Niece, etc.

All of this data (for the person in question) is put into the #FamilyRelations temp table.

Finally, the second query brings spouses into the mix, which is another can of worms, with In-Laws, etc.

Anyway, the code is done and ready to use. I’m looking forward to using it next year at the family reunion. I’ll have my laptop ready, and I’ll ask individual people if they want a list of everyone at the reunion and how they are related. They’ll say, “Sure! That sounds great!” But I’m sure they’ll really be thinking, “Sheesh, what a geek!” … (In Italian, of course).

if object_id('usp_GetListOfRelatives') is not null
drop procedure usp_GetListOfRelatives;
go

create procedure usp_GetListOfRelatives
(
@Who varchar(20)
)
as

declare
@IsBloodRelative bit;
set @IsBloodRelative=case
when exists (select * from OurFamily
where FirstName=@Who)
then 1 else 0 end;

if object_id('tempdb..#FamilyRelations') is not null
drop table #FamilyRelations;

with FamTree(ID,FirstName,Gender,SpouseName,Ancestry) as
(
select ID
,FirstName
,Gender
,SpouseName
,cast(str(ID,5) as varchar(max))
from OurFamily
where ParentID is null
union all
select t.ID
,t.FirstName
,t.Gender
,t.SpouseName
,cast(str(t.ID,5)+f.Ancestry as varchar(max))
from FamTree f
join OurFamily t on f.ID=t.ParentID
)
,
Relationships as
(
select F1.FirstName
,F1.Gender
,F1.SpouseName
,N,N2
,RelFirstName=F2.FirstName
,RelGender=F2.Gender
,RelSpouseName=F2.SpouseName
from FamTree F1
join (select N=Number
from master..spt_values
where Type='P') Numbers on N between 1 and len(F1.Ancestry)/5
cross apply (select AncestryList=substring(Ancestry,N*5-4,9999)
,PrevList=case
when N=1
then '*'
else substring(Ancestry,N*5-9,9999)
end) X1
join FamTree F2 on charindex(AncestryList,F2.Ancestry)>0
and charindex(PrevList,F2.Ancestry)=0
cross apply (select N2=charindex(AncestryList,F2.Ancestry)/5+1) X2
)
select
FirstName
,Gender
,SpouseName
,RelFirstName
,RelGender
,RelSpouseName
,FamilyRelationName=RelFirstName
,FamilyRelation='my '+
case
when N2-N<-3
then case N2
when 1 then replicate('Great ',N-N2-2)+'Grand'
+case when RelGender='M'
then 'father' else 'mother' end
when 2 then replicate('Great ',N-N2-1)
+case when RelGender='M'
then 'Uncle' else 'Aunt' end
when 3 then '1st Cousin '+convert(varchar,N-N2)+'x Removed'
when 4 then '2nd Cousin '+convert(varchar,N-N2)+'x Removed'
when 5 then '3rd Cousin '+convert(varchar,N-N2)+'x Removed'
else convert(varchar,N2-2)+'th Cousin '
+convert(varchar,N-N2)+'x Removed'
end
when N2-N=-3
then case N2
when 1 then 'Great Grand'
+case when RelGender='M'
then 'father' else 'mother' end
when 2 then 'Great Great '
+case when RelGender='M'
then 'Uncle' else 'Aunt' end
when 3 then '1st Cousin 3x Removed'
when 4 then '2nd Cousin 3x Removed'
when 5 then '3rd Cousin 3x Removed'
when 6 then convert(varchar,N2-2)+'th Cousin 3x Removed'
end
when N2-N=-2
then case N2
when 1 then 'Grand'
+case when RelGender='M'
then 'father' else 'mother' end
when 2 then 'Great '
+case when RelGender='M'
then 'Uncle' else 'Aunt' end
when 3 then '1st Cousin Twice Removed'
when 4 then '2nd Cousin Twice Removed'
when 5 then '3rd Cousin Twice Removed'
when 6 then convert(varchar,N2-2)+'th Cousin Twice Removed'
end
when N2-N=-1
then case N2
when 1 then case when RelGender='M'
then 'Father' else 'Mother' end
when 2 then case when RelGender='M'
then 'Uncle' else 'Aunt' end
when 3 then '1st Cousin Once Removed'
when 4 then '2nd Cousin Once Removed'
when 5 then '3rd Cousin Once Removed'
when 6 then convert(varchar,N2-2)+'th Cousin Once Removed'
end
when N2-N=0
then case N
when 1 then 'Myself'
when 2 then case when RelGender='M'
then 'Brother' else 'Sister' end
when 3 then '1st Cousin'
when 4 then '2nd Cousin'
when 5 then '3rd Cousin'
else convert(varchar,N-2)+'th Cousin'
end
when N2-N=1
then case N
when 1 then case when RelGender='M'
then 'Son' else 'Daughter' end
when 2 then case when RelGender='M'
then 'Nephew' else 'Niece' end
when 3 then '1st Cousin Once Removed'
when 4 then '2nd Cousin Once Removed'
when 5 then '3rd Cousin Once Removed'
else convert(varchar,N-2)+'th Cousin Once Removed'
end
when N2-N=2
then case N
when 1 then 'Grand'
+case when RelGender='M'
then 'son' else 'daughter' end
when 2 then 'Great '
+case when RelGender='M'
then 'Nephew' else 'Niece' end
when 3 then '1st Cousin Twice Removed'
when 4 then '2nd Cousin Twice Removed'
when 5 then '3rd Cousin Twice Removed'
else convert(varchar,N-2)+'th Cousin Twice Removed'
end
when N2-N=3
then case N
when 1 then 'Great Grand'
+case when RelGender='M'
then 'son' else 'daughter' end
when 2 then 'Great Great '
+case when RelGender='M'
then 'Nephew' else 'Niece' end
when 3 then '1st Cousin 3x Removed'
when 4 then '2nd Cousin 3x Removed'
when 5 then '3rd Cousin 3x Removed'
else convert(varchar,N-2)+'th Cousin 3x Removed'
end
when N2-N>3
then case N
when 1 then replicate('Great ',N2-N-2)+'Grand'
+case when RelGender='M'
then 'son' else 'daughter' end
when 2 then replicate('Great ',N2-N-1)
+case when RelGender='M'
then 'Nephew' else 'Niece' end
when 3 then '1st Cousin '+convert(varchar,N2-N)+'x Removed'
when 4 then '2nd Cousin '+convert(varchar,N2-N)+'x Removed'
when 5 then '3rd Cousin '+convert(varchar,N2-N)+'x Removed'
else convert(varchar,N-2)+'th Cousin '
+convert(varchar,N2-N)+'x Removed'
end
end
into
#FamilyRelations
from Relationships
where (FirstName=@Who or SpouseName=@Who)
;

with SpouseRelations as
(
select
FirstName
,Gender
,SpouseName
,RelFirstName
,RelGender
,RelSpouseName
,FamilyRelationName
,FamilyRelation=
case when FamilyRelation='my Myself'
then 'Myself' else FamilyRelation end
,SpouseRelationName=RelSpouseName
,SpouseRelation=
case
when FamilyRelation='my Myself'
then case when RelGender='M'
then 'my Wife' else 'my Husband' end
when charindex('Grandfather',FamilyRelation)>0
then replace(FamilyRelation,'father','mother')
when charindex('Grandmother',FamilyRelation)>0
then replace(FamilyRelation,'mother','father')
when FamilyRelation='my Father'
then 'my Mother'
when FamilyRelation='my Mother'
then 'my Father'
when charindex('Uncle',FamilyRelation)>0
then replace(FamilyRelation,'Uncle','Aunt')
when charindex('Aunt',FamilyRelation)>0
then replace(FamilyRelation,'Aunt','Uncle')
when FamilyRelation='my Son'
then 'my Daughter-In-Law, married to '+RelFirstName
when FamilyRelation='my Daughter'
then 'my Son-In-Law, married to '+RelFirstName
else case
when RelGender='M'
then 'Wife'
else 'Husband'
end+' of '+FamilyRelation+', '+RelFirstName
end
from #FamilyRelations
)
,SpouseMultiple as
(
select Seq=1 union all select 2
)
,
AllRelations as
(
select FirstName
,Gender
,SpouseName
,RelationName=case when Seq=1
then FamilyRelationName
else SpouseRelationName
end
,TempRelation=case when Seq=1
then FamilyRelation
else SpouseRelation
end
from SpouseRelations,SpouseMultiple
)
,
RelationPerspective as
(
select
RelationName
,Relation=
case
when @IsBloodRelative=1
then case
when TempRelation like 'Husband of my Sister%'
then replace(TempRelation,'Husband of ','my Brother-In-Law, married to ')
when TempRelation like 'Wife of my Brother%'
then replace(TempRelation,'Wife of ','my Sister-In-Law, married to ')
when TempRelation like 'Husband of my Daughter%'
then replace(TempRelation,'Husband of ','my Son-In-Law, married to ')
when TempRelation like 'Wife of my Son%'
then replace(TempRelation,'Wife of ','my Daughter-In-Law, married to ')
else TempRelation
end
else case
when TempRelation='Myself'
then 'my '+SpouseType
when TempRelation in ('my Husband','my Wife')
then 'Myself'
when TempRelation in ('my Father','my Mother')
then TempRelation+'-In-Law'
when TempRelation like 'my Son%' or TempRelation like 'my Daughter%'
then TempRelation
when charindex('Grandson',TempRelation)>0
then TempRelation
when charindex('Granddaughter',TempRelation)>0
then TempRelation
else case
when charindex(' of my ',TempRelation)>0
then replace(TempRelation,' of my ',' of my '+SpouseType+'''s ')
else 'my '+SpouseType+'''s '+replace(TempRelation,'my ','')
end
end
end
from AllRelations
cross apply (select SpouseType=case when Gender='M'
then 'Husband' else 'Wife' end) X
where RelationName is not null
)

select RelationName,Relation
from RelationPerspective
order by RelationName
;

drop table #FamilyRelations;
Until next time…

6 comments:

  1. Interesting approach.

    I would leave spouse in a separate TABLE though, to allow for changes. Children would refer to mother and father as separate COLUMNs.

    ReplyDelete
  2. Great blog and very fun to read, thanks.

    ReplyDelete
  3. Hi Brad. On Plamen Ratchev's blog back in May you commented on the pivoting of multiple columns. Your example was this:
    select *
    from
    (select product, pivotkey='v'+convert(char,market_year), pivotamt=value
    from Products
    union all
    select product, pivotkey='q'+convert(char,market_year), pivotamt=quantity
    from Products) I
    pivot
    (sum(pivotamt) for pivotkey in (v2003,q2003,v2004,q2004,v2005,q2005,v2006,q2006)) P.

    I like this approach and tried to use it by replacing the values in the pivot statement (v2003,q2003,v2004,q2004,v2005,q2005,v2006,q2006for you) with a variable but SQL didn't like that. Do you have a suggestion on how this may be done?

    Thanks,
    Keith

    ReplyDelete
  4. @Keith:

    Perhaps you need a dynamic pivot. Check out Adam Haines' blog about this: http://jahaines.blogspot.com/2009/08/pivoting-data-using-sql-2005-part-2.html

    See if that helps.

    ReplyDelete
  5. Looks promising, thank you! And sorry about posting here, didn't realize until afterwards that your blog was broken down into sections.

    Thanks again,
    Keith

    ReplyDelete
  6. Hello Brad,

    Great Blog this was near exactly what I was looking for but I could use some help! I am new to creating a DB and don't understand much of this. Could you post/provide me a diagram of all table names, their columns and linked primary keys? Or perhaps a file showing this detail. This would help me learn!

    Thanks,

    Shawn

    ReplyDelete