Posted 8 months ago by Tobias Lingemann - Software Development Engineer, Vector Informatik GmbH
Version: 22.1.1
Platform: .NET 4.8
Environment: Windows 10 (64-bit)

Hi,

we have a cache that stores the parse data for documents that are not open in an editor. We use the parse data to resolve type definitions like this:

private ITypeDefinition ResolveTypeDefinition(
  ITypeReference typeReference,
  ISourceFileLocation location,
  IProjectAssembly projectAssembly,
  IDotNetParseData parseData)
{
  var request = new ResolverRequest(typeReference.QualifiedName)
  {
    Context = mContextFactory.CreateContext(new TextSnapshotOffset(parseData.Snapshot, location.NavigationOffset ?? -1))
  };

  return projectAssembly.Resolver.Resolve(request).Results.FirstOrDefault()?.Type as ITypeDefinition;
}

Each of these calls takes only a couple of milliseconds, but this happens a few thousand times and adds up to seconds:

100,00 %   ResolveTypeDefinition  •  4.991 ms  •  Vector.ITE.Languages.CS.ExportTables.CsExportTableGenerator.ResolveTypeDefinition(ITypeReference, ISourceFileLocation, IProjectAssembly, IDotNetParseData)
  100,00 %   CreateContext  •  4.991 ms  •  ActiproSoftware.Text.Languages.DotNet.Implementation.DotNetContextFactoryBase.CreateContext(TextSnapshotOffset)
    100,00 %   CreateContext  •  4.991 ms  •  ActiproSoftware.Text.Languages.CSharp.Implementation.CSharpContextFactory.CreateContext(TextSnapshotOffset, DotNetContextKind)
      100,00 %   hOX  •  4.991 ms  •  ActiproSoftware.Text.Languages.CSharp.Implementation.CSharpContextFactory.hOX(qLp, ITextSnapshotReader)
        98,25 %   get_Token  •  4.903 ms  •  ActiproSoftware.Text.Implementation.TextSnapshotReader.get_Token
          97,94 %   zc9  •  4.888 ms  •  ActiproSoftware.Text.Implementation.TextSnapshotReader.zc9
            97,94 %   qct  •  4.888 ms  •  ActiproSoftware.Text.Implementation.TextSnapshotReader.qct(TextRange, Object)
              97,64 %   GetTokensInternal  •  4.873 ms  •  ActiproSoftware.Text.Tagging.Implementation.TokenTagger.GetTokensInternal(ILexer, TextSnapshotRange, Object, out Int32)
                97,64 %   GetTokens  •  4.873 ms  •  ActiproSoftware.Text.Tagging.Implementation.TokenTagger.GetTokens(ILexer, TextSnapshotRange, Object, Boolean, out Int32)
                  97,64 %   GetTokens  •  4.873 ms  •  ActiproSoftware.Text.Lexing.Implementation.LexerContextProvider.GetTokens(ILexer, TextSnapshotRange, Object, Boolean, out Int32)
                    97,39 %   Parse  •  4.860 ms  •  ActiproSoftware.Text.Lexing.Implementation.MergableLexerBase.Parse(TextSnapshotRange, ILexerTarget)
                    0,25 %   TokenSet..ctor  •  12 ms  •  ActiproSoftware.Text.Lexing.Implementation.TokenSet..ctor(TextRange, IEnumerable, Object)
              0,30 %   TextSnapshotRange..ctor  •  15 ms  •  ActiproSoftware.Text.TextSnapshotRange..ctor(ITextSnapshot, TextRange)

Now I was wondering why we would need to parse the document again to get the tokens. The entire document should already be parsed in IDotNetParseData with all tokens available. So I dug a little deeper and found out that the following if-check is not matched:

// If the text range is completely in a single line and that line's cached context data is valid...
var positionRange = snapshotRange.Snapshot.TextRangeToPositionRange(snapshotRange.TextRange);

Now the snapshot range is compared to the entire snapshot range which is never a single line...

I think you intended to use desiredSnapshotRange instead:

var positionRange = snapshotRange.Snapshot.TextRangeToPositionRange(desiredSnapshotRange.TextRange);

[Modified 8 months ago]


Best regards, Tobias Lingemann.

Comments (7)

Posted 8 months ago by Actipro Software Support - Cleveland, OH, USA

Hi Tobias,

For complex scenarios like this, it would be best if you could write to our support address with a new simple sample project that shows the issue. Then we can debug with that to see what's going on and whether there is a problem. If you send something over, please reference this thread and remove the bin/obj folders from the .zip you send so it doesn't get spam blocked.

I don't believe the logic is necessarily wrong there, since snapshotRange is constructed above that point (originally based on desiredSnapshotRange) and is what all subsequent code is using.  Having a sample project to reproduce will help us validate the logic and better assess how the code you identified might be improved and/or corrected.

Note that document snapshots do not store the tokens associated with them.  We made that decision since doing so would consume a LOT of extra memory considering multiple immutable snapshots can be live at any given time.  Instead, tokens are lexed on the fly as needed.


Actipro Software Support

Answer - Posted 8 months ago by Tobias Lingemann - Software Development Engineer, Vector Informatik GmbH

I see. I resolved the issue by adding a cache which stores the ITypeDefinition result for a given IQualifiedName. Since most types are system-defined like Object or Int32, this is quite effective.
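
A minimal sketch of such a result cache, in plain C# with no Actipro dependencies. ResolveCache, GetOrResolve and the resolve delegate are hypothetical names chosen for illustration; the key type is left generic because whatever is used as the key has to identify the resolved symbol unambiguously. The sketch is not thread-safe and does not accept null keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of any Actipro API): remembers the result of an
// expensive resolve call per key so that the call runs at most once per key.
public sealed class ResolveCache<TKey, TResult> where TResult : class
{
    private readonly Dictionary<TKey, TResult> results = new Dictionary<TKey, TResult>();
    private readonly Func<TKey, TResult> resolve;

    public ResolveCache(Func<TKey, TResult> resolve)
    {
        this.resolve = resolve ?? throw new ArgumentNullException(nameof(resolve));
    }

    // Number of distinct keys that have been resolved so far.
    public int Count => results.Count;

    public TResult GetOrResolve(TKey key)
    {
        if (!results.TryGetValue(key, out var result))
        {
            result = resolve(key);
            // Unresolvable keys (null results) are stored as well,
            // so they are not looked up again on the next call.
            results[key] = result;
        }
        return result;
    }

    // Drop all cached results, e.g. after the underlying sources have changed.
    public void Clear() => results.Clear();
}
```

In the scenario above, the delegate would wrap the actual resolver call, and Clear() would be called whenever a cached document is re-parsed.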


Best regards, Tobias Lingemann.

Posted 4 months ago by Tobias Lingemann - Software Development Engineer, Vector Informatik GmbH

The problem has re-emerged after we realized that the IQualifiedName does not always resolve to the same symbol...

The performance issue is huge in big projects. I have a customer project where the resolution alone takes 6 minutes for the complete project.

Since we are close to the next release, I am grasping at any straw that could have a significant performance impact. Is there anything we can do to speed up the lexer process or avoid it altogether? I tried to initialize the tokens after the semantic parser is finished, but it didn't work.

I will try to parallelize the process, but that's a bit tricky, and even with eight cores it takes too long.


Best regards, Tobias Lingemann.

Posted 4 months ago by Actipro Software Support - Cleveland, OH, USA

Hello,

Can you give us an overview of what you're trying to achieve in general? It sounds like you have a project assembly with many files in it (how many?), and you are perhaps trying to run your ResolveTypeDefinition method multiple times in each document. Are you trying to resolve identifiers in your unloaded documents to type definitions, or something else?

A summary of what the intent of all this is and how things are structured would help us reply more effectively.  And as always, it's best to submit a new simple sample project that shows the issue occurring so that we can debug with it and see what you see.  Thank you!


Actipro Software Support

Posted 4 months ago by Tobias Lingemann - Software Development Engineer, Vector Informatik GmbH

Basically, we generate an extract from all source files that contains information about all functions. The background is that we support multiple languages and editors that can interact with each other. C# is just one of those, and we use the data to display and validate which functions can be called from the other editors. Obviously we need to account for all files, not just those that are open in an editor.

In this particular case we use ResolveTypeDefinition() to get the referenced type and check whether it is an enum. If so, we need to get the members of the enum. So it is called every time a user-defined type is used as a parameter or return type, per function, per file, per project assembly. In this case 62 C# files are all used in 35 project assemblies (so 62 * 35). With the cache this sums up to almost 5,000 calls of ResolveTypeDefinition(). So yes, that's a lot, but what takes so long is not the actual resolve operation but the lexical parsing when the context is created.

To summarize, I currently see no way to reduce the number of calls. It looks like the document is lexed in each resolve call, which is totally unnecessary. I get that you no longer store the tokens to save memory, but in my case an option to keep the tokens in memory would be great, especially since my documents are mostly offline (not open in an editor) and therefore are not constantly changed. If the memory consumption becomes excessive, the developer could reset the option (e.g. in ICodeDocument).

[Modified 4 months ago]


Best regards, Tobias Lingemann.

Answer - Posted 4 months ago by Actipro Software Support - Cleveland, OH, USA

Hi Tobias,

Our current APIs were designed more for one-off calls, such as when you are displaying IntelliPrompt, and not for intense repeated backend resolution. For memory reasons, we don't currently retain the tokens for the snapshot, and therefore the context building logic will lex tokens again on each call. That being said, we do retain lexer contexts at the end of each document line, and the lexer should be able to pick up fairly quickly from anywhere in the document.

I'm wondering if we can avoid the default context building (and lexer) altogether here though since you are purely working on already-constructed parse data structures.  

I did a quick mockup of a context implementation class where you pass it in some info and it will give the bare minimum of what I think the resolver would need. I didn't get to test it here, but perhaps try this kind of concept on your end and see if it helps.

public class QualifiedNameContext : IDotNetContext {

	private TypeDeclaration containingAstTypeDeclaration;

	public QualifiedNameContext(
		IProjectAssembly projectAssembly, 
		IDotNetParseData parseData, 
		ISourceFileLocation location
	) {
		var snapshot = parseData.Snapshot;
		
		// Get the offset right before the source location range end (last character)
		this.ProjectAssembly = projectAssembly;
		this.InitializationSnapshotRange = new TextSnapshotRange(snapshot, location.TextRange);
		this.SnapshotOffset = new TextSnapshotOffset(snapshot, location.TextRange.EndOffset - 1);
		this.TargetSnapshotOffset = new TextSnapshotOffset(snapshot, location.TextRange.EndOffset - 1);

		// Find the AST node that encloses that offset, which should be an identifier
		var astNode = parseData.Ast.FindDescendantNode(this.SnapshotOffset);
		this.ContainingAstNode = astNode?.Parent;
	}

	public int? ArgumentIndex => null;

	public TextSnapshotOffset? ArgumentListSnapshotOffset => null;

	public TextSnapshotOffset? ArgumentSnapshotOffset => null;

	public IAstNode ContainingAstNode { get; private set; }

	public TypeDeclaration ContainingAstTypeDeclaration {
		get {
			if (containingAstTypeDeclaration == null) {
				var astNode = this.ContainingAstNode;
				while (astNode != null) {
					containingAstTypeDeclaration = astNode as TypeDeclaration;
					if (containingAstTypeDeclaration != null)
						break;

					astNode = astNode.Parent;
				}
			}

			return containingAstTypeDeclaration;
		}
	}

	public TextSnapshotRange? InitializationSnapshotRange { get; private set; }

	public DotNetContextKind Kind => DotNetContextKind.Self;

	public IDotNetContextLocation Location => null;

	public IProjectAssembly ProjectAssembly { get; private set; }

	public TextSnapshotOffset SnapshotOffset { get; private set; }

	public Expression TargetExpression => null;

	public TextSnapshotOffset? TargetSnapshotOffset { get; private set; }

	public IResolverResultSet Resolve() {
		throw new NotImplementedException();
	}

}
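
As an untested sketch, the mockup above might be plugged into the ResolveTypeDefinition method from the first post as shown below. The only change is that the request context comes from QualifiedNameContext instead of mContextFactory.CreateContext, so no snapshot reader (and therefore no lexing) is involved when the request is built. Every identifier is taken from this thread; that the resolver needs nothing beyond what this minimal context provides is an assumption to verify. The method belongs inside the existing class, and the Actipro using directives are omitted because the thread does not state the namespaces.

```csharp
using System.Linq; // for FirstOrDefault()

private ITypeDefinition ResolveTypeDefinition(
  ITypeReference typeReference,
  ISourceFileLocation location,
  IProjectAssembly projectAssembly,
  IDotNetParseData parseData)
{
  var request = new ResolverRequest(typeReference.QualifiedName)
  {
    // Build the context from the existing AST instead of the context factory,
    // which would lex tokens from the snapshot on every call.
    Context = new QualifiedNameContext(projectAssembly, parseData, location)
  };

  return projectAssembly.Resolver.Resolve(request).Results.FirstOrDefault()?.Type as ITypeDefinition;
}
```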


Actipro Software Support

Posted 4 months ago by Tobias Lingemann - Software Development Engineer, Vector Informatik GmbH

Thank you very much. My first tests look very promising.


Best regards, Tobias Lingemann.

The latest build of this product (v22.1.4) was released 3 months ago, which was after the last post in this thread.
