Problem with RegEx Replace

SyntaxEditor for Windows Forms Forum

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp
Version: 14.1.0321
Platform: .NET 4.5
Environment: Windows 8 (64-bit)

The following is a simplified example of a RegEx replace that one of my customers is trying to perform.

Original text:

 1 x

 2 x

 3 x

 4 x

 5 x

 6 x

 7 x

 8 x

 9 x

i.e. (space)#(space)x

Find Expression: \b+([^ ]+).+

Replace with: $1

The output is:

1

 

3

 

5

 

7

 

9

while it should be:

1

2

3

4

5

6

7

8

9

I tried the same expression in Visual Studio and it results in the correct output.

We tried various versions of the find expression, such as adding $ to the end or ^ to the start.
We even tried replacing the . with [\s\w], and specifically tried to catch the line end with [\r\n] to ensure that it did not 'match' the line end.
The only thing that worked was changing the \b+ to \s+, which works for the specific situation where the lines start with a space, but in some cases they don't. (Using \s* at the start produces the same incorrect results as \b+.)

Could you please take a look at the RegEx engine you are using?

Thanks
Mike

Comments (7)

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Mike,

This sort of regex seems to work for me:

\b+([^ \n]+).+

Note that I added the \n in the character class. 


Actipro Software Support

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

Yes, that works, so I have forwarded it to my customer.

However, I don't understand why adding \n to the 'not a space' class would matter in this specific example. Since we are looking for the first space after the first word, this should only be required if the first word is immediately followed by a 'new line' character, which it never is in the example.

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Mike,

A "[^ ]+" will consume all non-space characters, including newlines. That's why the match can continue past the end of a line.
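A quick way to see this outside the editor is the small Python sketch below. It uses Python's re module rather than the SyntaxEditor engine, so it only illustrates how a negated character class treats newlines; the variable names are made up for the example.

```python
import re

sample = " 1 x\n 2 x\n 3 x"

# "[^ ]+" excludes only the space character, so a newline still counts
# as part of a run of "non-space" characters.
with_newline = re.findall(r"[^ ]+", sample)
print(with_newline)     # ['1', 'x\n', '2', 'x\n', '3', 'x']

# Adding \n to the class stops each run at the end of its line.
without_newline = re.findall(r"[^ \n]+", sample)
print(without_newline)  # ['1', 'x', '2', 'x', '3', 'x']
```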


Actipro Software Support

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

OK. That's what I thought, but since the first word on every line is followed by a space character it should never reach those line end characters in the part of the pattern that is within parens.

Surely the part that should exclude the line end is the final .+, which is why we first tried adding a $ to the end so that it would match the line end. The original pattern seems to stop correctly at the line end, but then it replaces everything on the second line except for the line end character itself, and it does the same on every second line after that.

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

Also, someone at Actipro had told me that your RegEx Find/Replace was designed to work the same as the one in Visual Studio, and our original pattern works correctly in VS.

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Mike,

This sort of pattern also works (line start, consume spaces, match non-space sequence, consume rest of line to end):

^ [ ]* ([^ ])+ .+ $
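As a rough illustration, here is the same idea in Python's re module (not the SyntaxEditor engine). It assumes the spaces in the pattern above are only for readability, so re.VERBOSE is used, and that ^ and $ are meant to work per line, so re.MULTILINE is used. It also moves the + inside the group so that a first word longer than one character would be captured whole; that change is an assumption for the sketch.

```python
import re

text = " 1 x\n 2 x\n 3 x"

# re.VERBOSE ignores the spaces between the pattern parts (spaces inside
# a character class are still literal); re.MULTILINE makes ^ and $ match
# at the start and end of every line.
anchored = re.compile(r"^ [ ]* ([^ ]+) .+ $", re.MULTILINE | re.VERBOSE)

result = anchored.sub(r"\1", text)
print(repr(result))  # '1\n2\n3'
```

Because every match has to begin at a line start, the newline can never be picked up as the captured "word".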

Let me explain in more detail how the \b was causing problems. After the first replace, the pointer is just after the "1", immediately before the \n. The \b is a zero-width assertion, and it succeeds there because there is a word boundary at that position. Then [^ ]+ matches a sequence of non-space characters, which here is just the \n character. The next character is a space, so matching moves on to the .+ part, which consumes the rest of the second line.
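The step described above can be reproduced in a small Python sketch. This is Python's re module, not the SyntaxEditor engine, so it only shows the mechanism and is not expected to reproduce the editor's exact output. Plain \b is used in place of \b+ because Python's re does not allow a quantifier on \b.

```python
import re

text = " 1 x\n 2 x\n 3 x"

# Stand-in for the original find expression (\b instead of \b+).
original = re.compile(r"\b([^ ]+).+")

# Index 4 is the newline that ends the first line. Start matching there,
# which is where the search resumes after the first line is handled.
m = original.match(text, 4)

print(repr(m.group(1)))  # '\n'      the newline is captured as the "word"
print(repr(m.group(0)))  # '\n 2 x'  so the whole second line is consumed
```

The word boundary between the "x" and the newline lets the match start at the end of line 1, and the replacement then swaps all of "\n 2 x" for just the captured newline.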

Our engine uses a large subset of the .NET regex engine (not VS) syntax, so it should generally be on par with results from that engine. There may be a couple of very minor differences though, depending on the modes used in the .NET regex engine.


Actipro Software Support

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

OK, I guess the response I got to an earlier question, saying that VS was your model, was incorrect.

I pointed out to the customer that \b was not something I would normally expect someone to use, and they admitted that they had used \b thinking 'blank', i.e. they meant to use \s.

Unfortunately they have run into enough unexpected results that they now perform RegEx replaces by copying the code to Notepad++, making the change there, and then copying it back.
(Notepad++ always seems to work the same as VS in the 'problem' scenarios I have checked.)
