Tuesday, January 26, 2010

Handy String Functions

Strings!Visual FoxPro (VFP) is very good at handling strings, and there are many useful string functions built into its language. In making the move to T-SQL, I have often wished some of those handy string functions were available.

But since they are not, I went ahead and created User-Defined Functions (UDFs) to emulate those VFP string functions. Perhaps you will find them to be useful also.

In this blog entry, I’ll introduce these UDFs (providing links to the VFP Books Online documentation that they're based upon) and give some examples of how to use them. The actual code to create these UDFs will be at the end of this article.

Stay tuned for my next blog entry, in which I’ll put together a zany project that makes use of some of these UDFs.

In order for many of these functions to do their work, they need the actual true length of a string. The LEN() function in T-SQL has the unfortunate feature of excluding trailing blanks in calculating string length. So the first function that I created is what I call a TRUELEN() function, which returns the full length of a string regardless of any trailing blanks.

select TrueLength=dbo.TrueLen('abcde     ')
/*
TrueLength
-----------
10
*/
The AT() Function is just like T-SQL’s CHARINDEX() function in that it returns the position of a search string within another string expression. The difference is that its third argument indicates the occurrence rather than a start position. Here are a couple of examples:

select FirstLetterE=dbo.At('e','Here is a sentence',1)
,ThirdLetterE=dbo.At('e','Here is a sentence',3)
,FifthLetterE=dbo.At('e','Here is a sentence',5)
/*
FirstLetterE ThirdLetterE FifthLetterE
------------ ------------ ------------
2 12 18
*/

select SecondTheOccurrence=dbo.At('the','The end of the story',2)
/*
SecondTheOccurrence
-------------------
12
*/
The RAT() Function is just like AT(), except that it finds the position of the rightmost occurrence of a search string within a string expression. This can come in handy, for example, if you have a fully-qualified filename containing lots of backslashes and you want to find the position at which the true filename starts:

select LastLetterE=dbo.RAt('e','Here is a sentence',1)
,SecondToLastLetterE=dbo.RAt('e','Here is a sentence',2)
,FifthToLastLetterE=dbo.RAt('e','Here is a sentence',5)
/*
LastLetterE SecondToLastLetterE FifthToLastLetterE
----------- ------------------- ------------------
18 15 2
*/

declare @FullyQualifiedName varchar(50)
set @FullyQualifiedName='C:\Windows\System32\Config\System.Log'
select PathNameOnly=left(@FullyQualifiedName
,dbo.RAt('\',@FullyQualifiedName,1))
,FileNameOnly=substring(@FullyQualifiedName
,dbo.RAt('\',@FullyQualifiedName,1)+1
,len(@FullyQualifiedName))
/*
PathNameOnly FileNameOnly
--------------------------- ------------
C:\Windows\System32\Config\ System.Log
*/
The OCCURS() Function will tell you the number of times a string expression occurs in another string expression:

select LetterEOccurrences=dbo.Occurs('e','Here is a sentence')
,WordTheOccurrences=dbo.Occurs('the','The end of the story')
/*
LetterEOccurrences WordTheOccurrences
------------------ ------------------
5 2
*/
The CHRTRAN() Function is kind of similar to T-SQL’s REPLACE() function, except it will do multiple individual character translations. Its first argument is a string that you want to perform the translations upon. The second argument is a string of characters that you want to individually translate. And the third argument is a string of characters that are the replacements/translations of the characters in the second argument. Here are some examples to illustrate:

select Translate1=dbo.ChrTran('abcdefghi','aei','XYZ')
,Translate2=dbo.ChrTran('123456789','316','***')
/*
Translate1 Translate2
---------- ----------
XbcdYfghZ *2*45*789
*/
The length of the third argument does not have to match the length of the second argument. Note what happens when I pass an empty string as the third argument… the characters in the second argument are removed (or just replaced with a zero-length character):

select VowelsRemoved=dbo.ChrTran('abcdefghi','aei','')
/*
VowelsRemoved
-------------
bcdfgh
*/
The above illustrates a handy way to remove multiple characters from a string. But here’s a clever way to use CHRTRAN() to remove all the characters from a string except certain ones. This example removes all non-numeric characters from the string:

declare @PhoneNumber varchar(30)
set @PhoneNumber='(650) 555-1212'
select NumericDigitsOnly=dbo.ChrTran(@PhoneNumber
,dbo.ChrTran(@PhoneNumber,'0123456789','')
,'')
/*
NumericDigitsOnly
-----------------
6505551212
*/
Note that the inner CHRTRAN() removed all the numeric characters from the string, and then the outer CHRTRAN() made use of that result to remove those characters from the original string. Cute, huh?

The STREXTRACT() Function extracts data from a string between two delimiters. I believe this was originally introduced to the VFP language to aid in shredding XML. The first argument is the string to search. The second and third arguments are the beginning and ending delimiters. The fourth argument specifies at which occurrence of the beginning delimiter you want to start the extraction.

And finally a fifth argument is a flag in which you can specify additive options… a 1 indicates a case-insensitive search (I do not make use of this value in my UDF definition and will let the collation of the database determine whether the search is case-insensitive or not), a 2 indicates that the end delimiter is not required to be found in order to do the extraction, and a 4 indicates that you wish to include the delimiters in the returned expression. Note that I said that these options are additive… for example, you can pass the value 6, which is the sum of 2+4, so both of those options will be honored.

Here are some examples to illustrate:

declare @XMLString varchar(max)
set @XMLString='
<stooge><id>1</id><name>Moe</name></stooge>
<stooge><id>2</id><name>Larry</name></stooge>
<stooge><id>3</id><name>Curly</name></stooge>'
select SecondStooge=dbo.StrExtract(@XMLString,'<name>','</name>',2,0)
,SecondStoogeWithDelimiters=dbo.StrExtract(@XMLString,'<name>','</name>',2,4)
/*
SecondStooge SecondStoogeWithDelimiters
------------ --------------------------
Larry <name>Larry</name>
*/


declare @WordList varchar(max)
set @WordList='one;two;three;four;five'
select FourthWord=dbo.StrExtract(';'+@WordList,';',';',4,0)
,FifthWord=dbo.StrExtract(';'+@WordList,';',';',5,0) /* Oops: No End Delimiter */
,FifthWordEndDelimNotReqd=dbo.StrExtract(';'+@WordList,';',';',5,2)
,FifthWordEndDelimNotReqdIncludeDelims=dbo.StrExtract(';'+@WordList,';',';',5,2+4)
/*
FourthWord FifthWord FifthWordEndDelimNotReqd FifthWordEndDelimNotReqdIncludeDelims
---------- --------- ------------------------ -------------------------------------
four five ,five
*/
VFP offers a PROPER() Function, which will capitalize a string “as appropriate” for proper names. So, for example, it will take the string BRAD SCHULZ and will return Brad Schulz. That’s fine, but the unfortunate thing is that PROPER() is very limiting… it will set the whole string to lower case and will capitalize any letters that follow a space (or are at the beginning of the string)… and that’s it. It will not handle letters that come after hyphens or parentheses or quotation marks or any other special characters correctly. So, if you pass it ITZIK BEN-GAN (T-SQL “GURU”), it would end up returning Itzik Ben-gan (t-sql “guru”)… only two letters would get capitalized… the ‘I’ and the ‘B’.

So I came up with a function that I call PROPERIZE(). It will correctly capitalize any letters that follow various special characters (like hyphen, ampersand, parenthesis, etc). It also takes a second argument… if it is equal to 1, then the string will be converted to lower case first before the capitalization takes place… if it is equal to 0, then the string is capitalized “as is”. The following examples illustrate this:

select Example1=dbo.Properize('itzik ben-gan (T-SQL "guru")',1)
,Example2=dbo.Properize('itzik ben-gan (T-SQL "guru")',0)
/*
Example1 Example2
---------------------------- ----------------------------
Itzik Ben-Gan (T-Sql "Guru") Itzik Ben-Gan (T-SQL "Guru")
*/

select Example3=dbo.Properize('ITZIK BEN-GAN (T-SQL "GURU")',1)
,Example4=dbo.Properize('ITZIK BEN-GAN (T-SQL "GURU")',0)
/*
Example3 Example4
---------------------------- ----------------------------
Itzik Ben-Gan (T-Sql "Guru") ITZIK BEN-GAN (T-SQL "GURU")
*/
Finally, we have the PADL() and PADR() and PADC() Functions, which pad a string to a specified length with a specific character on the left or right sides, or both:

select PadLeft=dbo.PadL('Title',15,'*')
,PadRight=dbo.PadR('Title',15,'*')
,PadCenter=dbo.PadC('Title',15,'*')
/*
PadLeft PadRight PadCenter
--------------- --------------- ---------------
**********Title Title********** *****Title*****
*/
The code to create all these functions is below. Many of these functions are dependent upon each other, so you should run the code to create them all at once. Instead of doing a copy/paste of the code below, you can go here to download the code.

Just a reminder… Be sure to tune in again next time, when I put together a wacky project that makes use of these string functions. Until then…

use TempDB  /* Change to whatever database you wish */
go

/*----------------------------------------------------------------------------*/

if object_id('TrueLen') is not null drop function TrueLen
go
create function TrueLen
(
@Expr nvarchar(max)
)
returns int
as

begin
declare @TrueLen int
set @TrueLen=len(@Expr+'*')-1 /* or datalength(@Expr)/2 */
return @TrueLen
end
go

/*----------------------------------------------------------------------------*/

if object_id('At') is not null drop function At
go
create function At
(
@Expr1 nvarchar(max)
,@Expr2 nvarchar(max)
,@Occurrence int = 1
)
returns int
as
begin
declare @Position int
if @Expr1 is not null and
@Expr2
is not null and
@Occurrence
is not null
begin
declare @Counter int
select @Position=0, @Counter=0
while @Counter<@Occurrence
begin
set @Counter=@Counter+1
set @Position=charindex(@Expr1,@Expr2,@Position+1)
if @Position=0 or @Position is null break
end
end
return @Position
end
go

/*----------------------------------------------------------------------------*/

if object_id('RAt') is not null drop function RAt
go
create function RAt
(
@Expr1 nvarchar(max)
,@Expr2 nvarchar(max)
,@Occurrence int = 1
)
returns int
as
begin
declare @Position int
if @Expr1 is not null and
@Expr2
is not null and
@Occurrence
is not null
begin
declare @Expr1Len int, @Expr2Len int, @AtPos int
select @Expr1Len=dbo.TrueLen(@Expr1)
,@Expr2Len=dbo.TrueLen(@Expr2)
,@AtPos=dbo.At(reverse(@Expr1),reverse(@Expr2),@Occurrence)
set @Position=@Expr2Len-(@Expr1Len+@AtPos-1)+1
end
return @Position
end
go

/*----------------------------------------------------------------------------*/

if object_id('Occurs') is not null drop function Occurs
go
create function Occurs
(
@Expr1 nvarchar(max)
,@Expr2 nvarchar(max)
)
returns int
as
begin
declare @Result int
if @Expr1 is not null and
@Expr2
is not null
begin
declare @Expr1Len int
,@Expr2Len int
,@NewExpr2 nvarchar(max)
,@NewExpr2Len int
select @Expr1Len=dbo.TrueLen(@Expr1)
,@Expr2Len=dbo.TrueLen(@Expr2)
,@NewExpr2=replace(@Expr2,@Expr1,'')
set @NewExpr2Len=dbo.TrueLen(@NewExpr2)
set @Result=case
when @Expr1Len=0
then 0
else (@Expr2Len-@NewExpr2Len)/@Expr1Len
end

end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('ChrTran') is not null drop function ChrTran
go
create function ChrTran
(
@Expr nvarchar(max)
,@SearchChars nvarchar(max)
,@ReplaceChars nvarchar(max)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@SearchChars
is not null and
@ReplaceChars
is not null
begin
declare @Counter int
select @Result=@Expr, @Counter=0
while @Counter<dbo.TrueLen(@SearchChars)
begin
set @Counter=@Counter+1
set @Result=replace(@Result
,substring(@SearchChars,@Counter,1)
,substring(@ReplaceChars,@Counter,1))
end
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('StrExtract') is not null drop function StrExtract
go
create function StrExtract
(
@Expr nvarchar(max)
,@BeginDelim nvarchar(max)
,@EndDelim nvarchar(max)
,@Occurrence int
,@Flags int /* 2=EndDelim not required, 4=Include Delims in result */
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@BeginDelim is not null and
@EndDelim
is not null and
@Occurrence
is not null and
@Flags is not null
begin
declare @Exprlen int
,@BeginDelimLen int
,@EndDelimLen int
,@BeginDelimPos int
,@EndDelimPos int
select @Result=''
,@Exprlen=dbo.TrueLen(@Expr)
,@BeginDelimLen=dbo.TrueLen(@BeginDelim)
,@EndDelimLen=dbo.TrueLen(@EndDelim)
,@BeginDelimPos=dbo.At(@BeginDelim
,@Expr
,@Occurrence)
if @BeginDelimPos>0
begin
set @EndDelimPos=charindex(@EndDelim
,@Expr
,@BeginDelimPos+@BeginDelimLen)
if @EndDelimPos=0 and @Flags&2=2
set @EndDelimPos=@Exprlen+1
if @EndDelimPos>0
set @Result=case
when @Flags&4=4 /* Include Delimiters in Result */
then substring(@Expr
,@BeginDelimPos
,@EndDelimPos-@BeginDelimPos+@EndDelimLen)
else substring(@Expr
,@BeginDelimPos+@BeginDelimLen
,@EndDelimPos-@BeginDelimPos-@BeginDelimLen)
end
end
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('Properize') is not null drop function Properize
go
create function Properize
(
@Expr nvarchar(max)
,@SetToLowerCaseFirst bit = 0
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null
begin
declare @Position int
,@Capitalize bit
,@Char nchar(1)
select @Result=case
when @SetToLowerCaseFirst=1
then lower(@Expr)
else @Expr
end
,@Position=0
,@Capitalize=1
while @Position<len(@Result)
begin
select @Position=@Position+1
,@Char=substring(@Result,@Position,1)
if @Capitalize=1
select @Capitalize=0
,@Result=stuff(@Result
,@Position
,1
,upper(@Char))
if charindex(@Char,' #%&*()-_=+[]{}":./')>0 set @Capitalize=1
end
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('PadL') is not null drop function PadL
go
create function PadL
(
@Expr nvarchar(max)
,@Size int
,@PadChar char(1)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@Size is not null and
@PadChar is not null
begin
set @Result=right(replicate(@PadChar,@Size)+@Expr,@Size)
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('PadR') is not null drop function PadR
go
create function PadR
(
@Expr nvarchar(max)
,@Size int
,@PadChar char(1)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@Size is not null and
@PadChar is not null
begin
set @Result=left(@Expr+replicate(@PadChar,@Size),@Size)
end
return @Result
end
go

/*----------------------------------------------------------------------------*/

if object_id('PadC') is not null drop function PadC
go
create function PadC
(
@Expr nvarchar(max)
,@Size int
,@PadChar char(1)
)
returns nvarchar(max)
as
begin
declare @Result nvarchar(max)
if @Expr is not null and
@Size is not null and
@PadChar is not null
begin
declare @Exprlen int
,@LeftSize int
set @Exprlen=dbo.TrueLen(@Expr)
set @LeftSize=case when @Size<@Exprlen then 0 else (@Size-@Exprlen)/2 end
set @Result=replicate(@PadChar,@LeftSize)+dbo.PadR(@Expr,@Size-@LeftSize,@PadChar)
end
return @Result
end
go

8 comments: