
Hi
I am trying to implement a non-mergable lexer after using a mergable one for a while. We hope this will speed up the lexing and parsing process. In my ILexer.Parse I call the ANTLR lexer (code below). This often causes an error (null object) in Actipro's parse process, so something must be wrong.
TextRange ILexer.Parse(TextSnapshotRange snapshotRange, ILexerTarget parseTarget) {
    // 1. Prepare TextRange if needed
    int offsetStart = snapshotRange.StartOffset;
    int offsetEnd = snapshotRange.EndOffset;
    Debug.WriteLine("Parse text at " + offsetStart + " till " + offsetEnd);
    // 2. Tell the lexer target where we would like to begin lexing
    ILexerContext lexerContext = parseTarget.OnPreParse(ref offsetStart);
    Debug.WriteLine("Parse new OffsetStart: " + offsetStart);
    ILexicalScopeStateNode lexicalScope = new BlaiseLexicalScopeStateNode();
    // 3. Start lexing
    // 3.1 Prepare lexing
    Antlr.Runtime.IToken token = null;
    int pos = offsetStart;
    ITextBufferReader rdr = snapshotRange.Snapshot.GetReader(offsetStart).BufferReader;
    try {
        ParseInfo pi = _requestParseInfo();
        BlaiseLexer lexer = new Meta.Parsing.BlaiseLexer();
        // Search paths
        foreach (string s in pi.IncludeSearchPath.Split(';')) { lexer.SearchPaths.Add(s); }
        // Get the text
        string text = rdr.GetSubstring(offsetStart, offsetEnd - offsetStart);
        lexer.IncludeFile += new EventHandler<IncludeFileEventArgs>(DoIncludeFile);
        lexer.LoadText(text, pi.Filename);
        // 3.2 Start lexing
        token = lexer.NextToken();
        bool resume = true;
        while (resume && token != null && token.Type != Mediator.EOF && pos <= offsetEnd) {
            Antlr.Runtime.CommonToken at = token as Antlr.Runtime.CommonToken;
            TextPosition endPos = snapshotRange.Snapshot.OffsetToPosition(at.Text.Length - 1 + pos);
            TextPosition startPos = snapshotRange.Snapshot.OffsetToPosition(pos);
            try {
                if (token.Type == Mediator.ML_COMMENT)
                    ;
                if (token.Type != Mediator.WS) { // Skip whitespace tokens
                    resume = parseTarget.OnTokenParsed(new BlaiseToken(token.Type, pos, token.Text.Length, startPos, endPos), lexicalScope);
                }
            } catch (Exception ex) {
                Debug.WriteLine("Lexing: token lex failed " + ex.Message);
            }
            pos += token.Text.Length;
            token = lexer.NextToken();
        }
        lexer.IncludeFile -= new EventHandler<IncludeFileEventArgs>(DoIncludeFile);
    } catch (Exception ex) {
        token = null;
        Debug.WriteLine("Lexing: complete lex failed " + ex.Message);
    }
    // 4. Send EndDocument token if at end of file
    if (rdr.Length <= pos) {
        try {
            BlaiseToken et = BlaiseToken.EndDocument(offsetEnd, 0, snapshotRange.Snapshot.OffsetToPosition(pos), snapshotRange.Snapshot.OffsetToPosition(pos));
            parseTarget.OnTokenParsed(et, lexicalScope);
        } catch { }
    }
    // 5. Notify end of lexing
    parseTarget.OnPostParse(pos);
    // 6. Return
    return new TextRange(offsetStart, pos);
}
Maybe this is related to the fact that some text is scanned per line, which could cause a problem for multi-line comments or line comments, for example?
My questions:
1. Do you see any (obvious) mistakes in my code?
2. When I debug, I see that the text is often lexed line by line. Is this a setting? Can I adjust the text range, and why does this happen at all?
3. Creating an EndOfDocument token succeeds, but which positions and length should the token contain?
4. It looks like the text is lexed more than once. What are the reasons for this, and can I prevent it?
5. In ILexerContext lexerContext = parseTarget.OnPreParse(ref offsetStart); the offset is often moved back to zero or to the beginning of the line, even at the end of the document. Can I discard these adjustments?
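Stripped of the Actipro and ANTLR types, the offset bookkeeping in the code above amounts to roughly the following standalone sketch. Span and SpanBuilder are hypothetical names used only for illustration, and the zero-length end-of-document span placed at the document length is an assumption (it is exactly what question 3 asks about), not confirmed behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a token span; not an Actipro or ANTLR type.
public struct Span {
    public readonly int Start;   // absolute offset in the document
    public readonly int Length;  // number of characters covered
    public Span(int start, int length) { Start = start; Length = length; }
}

public static class SpanBuilder {
    // Walks the token texts in order, starting at the absolute offset
    // offsetStart, and records a span for every non-whitespace token.
    public static List<Span> Build(int offsetStart, IEnumerable<string> tokenTexts, int documentLength) {
        var spans = new List<Span>();
        int pos = offsetStart;
        foreach (string text in tokenTexts) {
            // Whitespace tokens are not reported, but they still advance pos.
            if (!string.IsNullOrWhiteSpace(text)) {
                spans.Add(new Span(pos, text.Length));
            }
            pos += text.Length;
        }
        // Assumption: the end-of-document marker is a zero-length span
        // located at the document length.
        if (pos >= documentLength) {
            spans.Add(new Span(documentLength, 0));
        }
        return spans;
    }
}
```

The point of the sketch is that every reported offset is absolute: if OnPreParse moves offsetStart backwards, the text handed to the lexer and the starting value of pos both have to follow the adjusted value.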
Regards
Martin
[Modified at 07/19/2010 08:50 AM]