Step-by-Step Walkthrough
Not sure where to start when building a custom syntax language? This walkthrough lays out the steps to follow, along with links to many other helpful topics in this documentation.
This walkthrough gives you a sequence of concepts to understand when working with syntax languages. Each section in the walkthrough provides a brief overview along with links to other documentation topics that give detailed information about the related subject.
Tip
The Sample Browser application that ships with our installer includes a series of Getting Started QuickStarts for building a custom language, with full source code and a working sample for each step.
Understanding Language Concepts
A syntax language is an object implementing ISyntaxLanguage that can be assigned to an ICodeDocument to provide language-specific functionality for the document. This functionality can be anything from word-break finding to lexing/parsing, all available via registered services. Multiple syntax languages can be created so that different documents can each receive different functionality.
For instance, the C# language uses different word-break and tokenizing rules than the CSS language does. By having different syntax languages defined for C# and CSS, different functionality can be applied to a document based on the language it currently is using.
- The Syntax Languages topic gives a high-level overview of syntax languages and what they do.
- The Service Locator Architecture topic talks in-depth about the service locator architecture utilized by syntax languages and how it provides an open and extensible framework for plugging in language features. It also describes the available built-in service types. We will get more into implementing services later in this topic.
- The Feature Services, Provider Services, and Event Sink Services topics give details on the various built-in service types that are available, but more on that below.
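As a language-neutral illustration of the concept described in this section, the following Python sketch models a syntax language as a small bag of registered services that a document consults. It is not the product's API: `SyntaxLanguage`, `register_service`, `get_service`, `WordBreakFinder`, and `Document` are hypothetical names chosen for this example, and the word patterns are simplified assumptions. It only shows how assigning a different language to the same document can change behavior such as word-break finding.

```python
import re
from typing import Any, Dict, List, Optional, Type, TypeVar

T = TypeVar("T")


class SyntaxLanguage:
    """Hypothetical stand-in for a syntax language: a named bag of services."""

    def __init__(self, key: str) -> None:
        self.key = key
        self._services: Dict[type, Any] = {}

    def register_service(self, service_type: Type[T], service: T) -> None:
        # Registering again under the same type replaces the earlier service.
        self._services[service_type] = service

    def get_service(self, service_type: Type[T]) -> Optional[T]:
        # Returns None when the language does not provide the feature.
        return self._services.get(service_type)


class WordBreakFinder:
    """Hypothetical service type: finds words in text using a pattern."""

    def __init__(self, word_pattern: str) -> None:
        self._regex = re.compile(word_pattern)

    def find_words(self, text: str) -> List[str]:
        return self._regex.findall(text)


class Document:
    """Hypothetical document that delegates to its assigned language."""

    def __init__(self, text: str, language: SyntaxLanguage) -> None:
        self.text = text
        self.language = language

    def words(self) -> List[str]:
        finder = self.language.get_service(WordBreakFinder)
        if finder is None:
            return self.text.split()  # fallback when no service is registered
        return finder.find_words(self.text)


csharp = SyntaxLanguage("C#")
csharp.register_service(WordBreakFinder, WordBreakFinder(r"[A-Za-z_]\w*"))

css = SyntaxLanguage("CSS")
# CSS identifiers may contain hyphens, so this word pattern allows them.
css.register_service(WordBreakFinder, WordBreakFinder(r"[A-Za-z_][\w-]*"))

doc = Document("background-color", csharp)
csharp_words = doc.words()   # split at the hyphen
doc.language = css           # swap the language on the same document
css_words = doc.words()      # hyphen kept inside the word
```

Because the document only asks its language for a service by type, new features can be added by registering more service types without changing the document class.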
Gathering Language Specifications
Once you understand the basic concepts of what syntax languages do, it's time to think about implementing a custom syntax language for the language(s) you wish to support in your application. Syntax languages are best used in conjunction with SyntaxEditor controls since they provide a full-featured editing experience much like what is found in Visual Studio.
A number of free syntax language samples are included in the sample project. Be sure to check those out since we may have already created a syntax language for the language you wish to support. If one is available, feel free to copy and use it in your application. You can extend or modify it using the concepts we describe below.
If no existing syntax language is available for your target language, you can create one yourself. You will first need to find a language specification for your language, which is typically available online. Language specifications generally describe the lexical and semantic structure of a language. They are very important because they tell us the language's keywords, the syntax of comments, etc.
Once you have examined your language's specifications, it's time to use the Language Designer tool.
Using the Language Designer Tool
The Language Designer application was built with the goal of making it easy to get started building a language. You essentially input information about your language, and it generates code for you that can be included in your project to use your syntax language.
- Run through the entire Language Designer documentation section since it discusses the functionality of the application, what pieces of a language can be defined in a language project, how to output code, etc.
Our recommendations for using the application are to follow these steps:
- Run the application and create a new language project.
- Set the General Properties for the language.
- Create a Lexer for the language. A dynamic lexer is the fastest way to get started if you are new to the product.
- Define the classification types that will be used.
- Build the language project.
- Fix any errors that are found.
- Use the Live Test feature (if the lexer type you use is supported) and test that your lexer is operating correctly.
- Use the Code Generation feature to output code for the language.
Loading and Using a Language at Run-time
At this point, we assume we have some code for our syntax language. It could have been code generated by the Language Designer application (described above). Or we could have followed the steps in the Creating an ISyntaxLanguage topic to do it by hand.
We are ready to load the language. If we have created a dedicated code-based syntax language class, we create an instance of it. If instead we used the Language Designer to output a language definition (.langdef) file, the Loading a Language Definition topic talks about how to load it.
In either case we now have a variable reference to an ISyntaxLanguage instance. The Using a Language topic talks about how to assign the syntax language to a document to enable its features to be used by the document and related view controls, such as a SyntaxEditor.
We now can run our application and use the language.
Adding Support for Services / Event Sinks
As mentioned in the Service Locator Architecture topic, there are quite a number of other features that can be added to languages via services.
The Feature Services topic lists the various services that implement features for a syntax language. Features can include things like lexers, parsers, line commenters, etc.
Provider Services are services that provide some sort of data upon request. Any language service that implements one of the provider interfaces is automatically called as needed. These services drive features like tagging, adornments, and IntelliPrompt.
Event Sink Services are a category of service types that can be registered with a language and are used to listen to events that take place in a document or an editor control that uses the document. For instance, you can listen to document text change events, key/mouse input events, view selection change events, etc. In each of these events you have the option to handle them within the language. As an example, this provides the ideal place for a syntax language such as HTML to watch for the < character being typed in a SyntaxEditor so that it can show a completion list containing the available HTML tags.
By using the service locator design pattern, syntax languages are very open and extensible. You can further customize a pre-built syntax language by adding your own service implementations.
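The HTML example above can be sketched in a language-neutral way. The Python below is a conceptual model only and does not use the product's API: `TextChangeEvent`, `TextChangeEventSink`, `HtmlTagCompletionSink`, `Editor`, and the tag list are all hypothetical. It shows the general idea that any registered service implementing an event sink interface is notified of the event and can react to it, here by showing a completion list when `<` is typed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextChangeEvent:
    """Hypothetical event data: the text that was just typed and where."""
    typed_text: str
    offset: int


class TextChangeEventSink:
    """Hypothetical event sink interface for text change notifications."""

    def notify_text_changed(self, editor: "Editor", event: TextChangeEvent) -> None:
        raise NotImplementedError


class HtmlTagCompletionSink(TextChangeEventSink):
    """Shows a list of tag names whenever '<' is typed."""

    TAGS = ["a", "body", "div", "p"]  # made-up, abbreviated tag list

    def notify_text_changed(self, editor: "Editor", event: TextChangeEvent) -> None:
        if event.typed_text == "<":
            editor.show_completion_list(self.TAGS)


@dataclass
class Editor:
    """Hypothetical editor that forwards events to the language's services."""
    services: List[object] = field(default_factory=list)
    text: str = ""
    completion_items: List[str] = field(default_factory=list)

    def show_completion_list(self, items: List[str]) -> None:
        self.completion_items = list(items)

    def type_text(self, typed: str) -> None:
        offset = len(self.text)
        self.text += typed
        event = TextChangeEvent(typed, offset)
        # Every registered service that implements the sink interface is notified;
        # services of other kinds are skipped.
        for service in self.services:
            if isinstance(service, TextChangeEventSink):
                service.notify_text_changed(self, event)


editor = Editor(services=[HtmlTagCompletionSink(), object()])
editor.type_text("x")
items_after_x = list(editor.completion_items)    # nothing shown yet
editor.type_text("<")
items_after_lt = list(editor.completion_items)   # tag list requested by the sink
```

Keeping the reaction inside a service means the editor itself needs no HTML-specific logic; a different language would simply register a different sink.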
Functionality to Language Service Mapping Table
There is an enormous amount of functionality available to be implemented via language services, and with this flexibility can understandably come some confusion. This table maps the services that are typically used in a language to implement the specified functionality.
Functionality | Notes |
---|---|
Lexer-based syntax highlighting |
This functionality is the ability for a lexer to tokenize text and use that information to drive syntax highlighting within a SyntaxEditor. Requires an ILexer feature service and a token tagger provider service. The token tagger (created for each document by the provider service) tags ranges of text as tokens with a certain IClassificationType. The classification types that are tagged by the token tagger must be mapped to highlighting styles in a highlighting style registry. Generally the AmbientHighlightingStyleRegistry is used for the mappings. To sum up, the lexer tokenizes text, the token tagger flags ranges of text with tokens/classification types, and the highlighting style registry tells SyntaxEditor how to convert tagged ranges to syntax highlighted markup. When loading a syntax language from a language definition (.langdef) file, everything described above is configured for you. |
Customized syntax highlighting |
While lexers generally provide the core syntax highlighting (via the functionality described above), sometimes you may wish to override the syntax highlighting for certain text ranges, add underlines, change background colors, etc. To accomplish this a tagger provider service must be used that creates a tagger for IClassificationTag objects. The IClassificationTag interface specifies an IClassificationType. The tagger needs to be set up to be ordered before the TaggerKeys.Token tagger, accomplished via the IOrderable interface, so that its classification types merge into and can override the lexer's existing classifications. Similar to above, the classification types that are tagged by the tagger must be mapped to highlighting styles in a highlighting style registry. Generally the AmbientHighlightingStyleRegistry is used for the mappings. The result is that syntax highlighting can be overridden. Multiple layers of customized syntax highlighting can be achieved using this same mechanism. |
Parsing (syntax and/or semantic analysis) |
Syntax and semantic analysis generally include the parsing of tokens to ensure that code is syntactically correct and valid. Parsing kicks off following text changes. Some form of parse data (which can include an AST, error list, etc.) is returned to the document when the parsing completes. The parsing operations can be automatically offloaded into a worker thread so that complex parsing doesn't block the main UI thread. The first step in setting up parsing is to ensure an ambient parse request dispatcher is installed right at your application's startup. This is vital because it is what allows the parsing to occur in a separate worker thread. If no ambient parse request dispatcher is installed, parsing will occur in the main UI thread, which could slow down the editor user interface depending on how much work the parser does. Next, register an IParser feature service. Now the parser will be called in a worker thread whenever text changes occur to a document that uses the language. Actipro offers a very robust LL(*) Parser Framework that makes it simple to build IParser-based parsers using EBNF-like notation in C# and VB. It is free to use, has AST construction, error handling, callbacks, and a lot of other features that make it integrate nicely with our core text/parsing framework. Alternatively, since a parser just has to implement the IParser interface, any sort of custom parsing can be performed with our generic parsing mechanism. There are free add-ons that come with SyntaxEditor that make it easy to integrate with popular third-party parsers. |
Automatic outlining support |
Automatic outlining is where the document text is scanned, and an outlining node hierarchy is constructed based on its contents. The outlining node tree is rendered visually in the editor's outlining margin and end users can expand/collapse nodes. As further text changes occur, the outlining node tree is incrementally updated. The IOutliningManager is the object that maintains the outlining node hierarchy. When a language has an IOutliner service registered, the outlining manager knows that the language is capable of performing automatic outlining. The outlining manager uses the outliner service to retrieve an IOutliningSource whenever it needs to update. The outlining source is capable of examining an offset and returning whether an outlining node should start or end there, or neither. There are several base classes included with SyntaxEditor that make it easy to create an outlining source. An IOutliner service is required for automatic outlining support. The Outlining and Collapsing Features series of topics talks about how to create outlining sources and define nodes. |
Quick info tips for mouse hovers over collapsed regions |
The built-in CollapsedRegionQuickInfoProvider service can be registered on a language to allow quick info tips to be automatically displayed when the end user hovers over a collapsed outlining node adornment, such as the "..." blocks. |
Line commenting |
Line commenting is the ability for SyntaxEditor to comment and uncomment lines with language-specific text. For instance, C# syntax languages can be set up to use // line comments. This feature requires that an ILineCommenter service be registered with the language. |
Text statistics |
Text statistics can be quickly constructed for any language. The statistics can include everything from word counts to readability scores. Custom statistics can be built for each language. This feature requires that an ITextStatisticsFactory service be registered with the language. |
Word break finding |
While many syntax languages use the same patterns for determining word breaks, there are some syntax languages that require further customization. For instance, the CSS syntax language needs hyphen (-) characters to be treated as word characters. Custom word break finding functionality can be configured for a language by registering an IWordBreakFinder service with the language. |
Automated IntelliPrompt completion (auto-complete and popup list) |
IntelliPrompt Completion is an extremely helpful way for end users to maximize their productivity when writing code. Completion sessions support auto-complete via keys like Ctrl+Space and will show a completion list popup when no single auto-completion can be determined. The IntelliPrompt Completion topic walks through the very extensive list of features available with completion sessions. Syntax languages have several ways they can be set up to provide automated IntelliPrompt completion. One or more ICompletionProvider services may be registered on the language. These provider services can be ordered. When a completion provider service is on the language, any Ctrl+Space keys typed by the end user will call the first completion provider to see if it can open a completion session. If it can't, the next provider is checked, and so on. When opening a session, the completion providers determine if auto-complete is allowed, what features are enabled (filtering, auto-shrink, text matching algorithms, etc.). The completion provider must also populate the items in the completion session prior to opening it. Completion providers often use a mix of looking at any available document ParseData and token scanning to determine which sort of items should be in the session, based on the current editor view caret location. Completion sessions can also be requested when the end user begins to type a new word. This involves having the language register an IEditorDocumentTextChangeEventSink service and in the text changed event notification, check a helper property on the event arguments that indicates if a new word is being typed. If so, request a completion session. A full sample of this is included in the "Opening a Session in Response to a Typed Character" section of the Completion List topic. |
Automated IntelliPrompt quick info |
IntelliPrompt Quick Info is the ability to render helpful tooltips related to end user mouse hovers, or related to what is under the caret. The content of a quick info tip can be plain text, formatted text (using an HTML-like markup), or other UI content. The IntelliPrompt Quick Info topic walks through the list of features available with quick info sessions. One or more IQuickInfoProvider services may be registered on the language. These provider services can be ordered. When a quick info provider service is on the language, any mouse hovers by the end user will call the first quick info provider to see if it can open a quick info session. If it can't, the next provider is checked, and so on. The quick info provider populates content in the quick info session prior to it opening. Similar to completion providers, quick info providers often use a mix of looking at any available document ParseData and token scanning to determine the content that should be presented in the session. |
Navigable symbol (type/member) selector support |
A Navigable Symbol Selector control is included that can be bound to a SyntaxEditor and supports listing accessible symbols (such as types and members) within a document. The selections within the control update as the editor caret is moved. When a symbol is selected from a dropdown, the caret navigates directly to the related symbol declaration. The control requires that a feature service is implemented on the language that returns accessible symbols. The IntelliPrompt Navigable Symbol Provider topic walks through how to create such a service. |
Code snippet support |
IntelliPrompt Code Snippets allow small snippets of code to be inserted into a document. Selection sessions display all available code snippets and allow the end user to easily pick one to activate. Template sessions occur when a code snippet is activated and allow the end user to Tab between pre-defined fields within the code snippet to customize the text that is inserted. Code snippet features require that an IntelliPrompt code snippet provider is registered on the language, with one or more code snippets loaded into it. |
Creating a custom tagger for a document |
A custom tagger can be installed into a document by using a tagger provider language service. Once the tagger is installed into the document, tag aggregators for the tag type returned by the tagger will collect tags from it and combine them with results from other similar taggers. Custom taggers are often used to drive features like customized syntax highlighting, squiggle line display, adornments, etc. |
Automatically render squiggle lines for parse errors with quick info tips |
If you are using an IParser feature service on your language, it will return resulting parse data to the ICodeDocument.ParseData property. If the parse data object implements the IParseErrorProvider interface then parse errors can be retrieved from it. There is a built-in tagger called ParseErrorTagger that can be attached to a document via a document-oriented tagger provider service. See the Tagger Provider topic for details on making tagger provider services. When this parse error tagger is attached to a document, it will monitor the document's parse data for changes. If the parse data is updated, it will automatically get the list of parse errors from it (assuming the parse data implements IParseErrorProvider) and will return ISquiggleTag tags for each one. This automatically drives the squiggle line display when used in an editor. The SquiggleTagQuickInfoProvider is an IQuickInfoProvider that presents a Quick Info popup with the contents of ISquiggleTag.ContentProvider. When registered with a language along with a ParseErrorTagger, it will provide the description of a parsing error when you hover the mouse over a text range that is tagged by an ISquiggleTag. |
Add custom adornments to a view's text area |
An example of adornments that are not related directly to document text content are alternating row highlights. To implement this sort of feature, a custom adornment manager is created that inherits AdornmentManagerBase. An adornment manager provider service is required to create adornment manager instances for each view that should use them. |
Add custom decoration adornments to specific text ranges |
An example of adornments that are related directly to text spans are borders around certain text ranges that indicate find operation results. For scenarios like this, a custom adornment manager is created that inherits DecorationAdornmentManagerBase<T, U>. The base class handles the determination of when to add/update/remove adornments. When a new adornment needs to be added, the AddAdornment method is called. This method is overridden in your inheriting class and the code you implement calls IAdornmentLayer.AddAdornment to add an adornment to the layer. An adornment manager provider service is required to create adornment manager instances for each view that should use them. |
Add custom adornments in between text characters |
Any sort of content can be inserted in-line with text, right between text characters. This feature is known as intra-text adornments. The content could be images, controls, or any other UI element. See the Intra-Text Adornments topic for details on how to use this feature. |
Hide text regions (without outlining) |
Sometimes you may wish to hide a region of text within a SyntaxEditor view but still keep it in the document. This sort of feature can be attained by "tagging" the desired ranges to hide with ICollapsedRegionTag instances. This feature can be used independently from code outlining. In fact, the code outlining feature does use this behind the scenes for collapsed nodes. If you tag the same collapsed regions with an IIntraTextSpacerTag, you can insert an adornment in place of the collapsed region. See the Collapsing Regions without Outlining topic for details on how to use this feature. |
Event notifications |
There are a large number of events to which languages can be attached, everything from document text changes to end user input. Events are logically grouped into various event sink services. Any service registered on a language that implements one of the event sink interfaces is automatically notified whenever a related event occurs. |
Auto-indentation |
Auto-indent is the ability for SyntaxEditor to automatically indent a new line when the Enter key is pressed. The default behavior is to indent a new line to the same level as the preceding line. To override the default behavior, an IIndentProvider service must be registered with the language. |
Text formatting |
Text formatting is the ability for SyntaxEditor to format the text in a specified range where whitespace and other symbols such as braces are adjusted to make code more readable. See the Text Formatter topic for more information. |
Structure matching (move to matching bracket, etc.) |
Structure matching is the ability for SyntaxEditor to find a bracket or other text delimiter that is related to the delimiter that is next to the caret. See the Structure Matcher topic for more information. |
Delimiter highlighting (bracket highlighting) |
Delimiter highlighting renders highlights behind delimiter pairs that are currently next to the caret, also known as bracket highlighting. While the delimiter highlighting feature requires the use of a structure matcher (see above), it also requires that another language service is registered that can tag delimiter highlight ranges. See the Delimiter Highlighting topic for more information. |
Code block selection |
Code block selection adjusts the view's selection by expanding it to include containing code blocks and contracting all the way back down to the caret as appropriate. See the Code Block Selection topic for more information. |
Auto-correct and case correction |
An auto-corrector can perform additional edits after text changes, such as auto-case correcting language keywords. See the Auto-Corrector topic for more information. |
Delimiter auto-completion |
Delimiter auto-completion is where the user types a start delimiter and a related end delimiter is auto-inserted after the caret. See the Delimiter Auto-Completer topic for more information. |
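The lexer-based syntax highlighting entry in the table above sums up a three-step pipeline: the lexer tokenizes text, the token tagger flags ranges with classification types, and the highlighting style registry maps those types to styles. The Python sketch below models that flow in a language-neutral way. It is not the product's API: the token patterns, the `TaggedRange` type, the `tag_tokens` and `highlight` functions, and the colors in `STYLE_REGISTRY` are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterator, List, NamedTuple, Tuple


class TaggedRange(NamedTuple):
    start: int
    end: int  # exclusive
    classification: str


# Lexer step: hypothetical token patterns for a tiny C-like language.
# Order matters: earlier patterns win when several could match at one offset.
TOKEN_PATTERNS: List[Tuple[str, str]] = [
    ("comment", r"//[^\n]*"),
    ("keyword", r"\b(?:if|else|return)\b"),
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tag_tokens(text: str) -> Iterator[TaggedRange]:
    """Token tagger step: flag each token's range with a classification type."""
    for match in MASTER.finditer(text):
        yield TaggedRange(match.start(), match.end(), match.lastgroup)


# Style registry step: classification type -> style (colors are made up).
# Identifiers are deliberately unmapped, so they keep the default style.
STYLE_REGISTRY: Dict[str, str] = {
    "comment": "green",
    "keyword": "blue",
    "number": "teal",
}


def highlight(text: str) -> List[Tuple[str, str]]:
    """Return (text, style) pairs for tagged ranges that have a mapped style."""
    result = []
    for tagged in tag_tokens(text):
        style = STYLE_REGISTRY.get(tagged.classification)
        if style is not None:
            result.append((text[tagged.start:tagged.end], style))
    return result


spans = highlight("return x1 + 42 // done")
```

Separating the classification of a range from its visual style is what allows the same tagged ranges to be restyled, or overridden by another tagger, without touching the lexer.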
Moving Forward
This walkthrough should point you in the right direction towards building a custom language. If at any time you find some of the documentation confusing, please contact us and let us know so that we may improve it in the future.
As next steps, look around at the other documentation for the product and examine the sample projects. We provide numerous language samples that you can freely copy and modify.