Using the InternetTools FPC library in Delphi

In fact, the article is somewhat broader: it describes a way to use many other libraries transparently (and not only ones from the Free Pascal world). InternetTools was chosen because of one notable feature: surprisingly, there is no Delphi counterpart with the same broad capabilities and ease of use.

This library extracts information from (parses) web documents (XML and HTML). It lets you specify the required data with high-level query languages such as XPath and XQuery and, as an alternative, gives direct access to the elements of the tree built from the document.

Brief introduction to InternetTools


The rest of the material is illustrated with a fairly simple task: obtaining those items of the bulleted and numbered lists in this article that contain links. According to the documentation, a small piece of code is enough for this (it is based on the penultimate documentation example, with minor changes):

    uses
      xquery;

    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';

    var
      ListValue: IXQValue;

    begin
      for ListValue in xqvalue(ArticleURL).retrieve.map(ListXPath) do
        Writeln(ListValue.toString);
    end.

However, for now this compact, object-oriented code can be written only in Free Pascal, whereas we need to use everything the library provides in a Delphi application, preferably in a similar style and with the same conveniences. It is also important to note that InternetTools is thread-safe (it can be accessed from many threads at the same time), so our solution must preserve this property.

Ways of implementation


If we look at the task from as far back as possible, there are several ways to use something written in another programming language. They fall into 3 large groups:

  1. Placing the library in a separate process whose executable file is built, in this case, with FPC. This method can be further divided into two categories, depending on whether network communication is possible.
  2. Encapsulating the library in a DLL (hereinafter sometimes called a “dynamic library”), which by definition works within a single process. Although COM objects can be placed in a DLL, the article considers a simpler and less laborious method that nevertheless gives the same comfort when calling the library's functionality.
  3. Porting. As in the previous cases, the expediency of this approach (rewriting the code in another language) is determined by the balance between its pros and cons. In the case of InternetTools the disadvantages of porting clearly outweigh: because of the considerable amount of library code, a lot of serious work would be required (even given the similarity of the two languages), and, as the original library evolves, fixes and new features would periodically have to be carried over to Delphi.

DLL


Below, so that the reader can feel the difference, are 2 options that differ in how convenient they are to use.

"Classic" implementation


Let us first try to use InternetTools in the procedural style dictated by the very nature of a dynamic library, which can export only functions and procedures. We will make communication with the DLL resemble WinAPI: first a handle to some resource is requested, then the useful work is done, and finally the handle is destroyed (closed). This option should by no means be treated as a role model; it is chosen only for demonstration and for later comparison with the second one, as a kind of poor relative.

The composition and ownership of the files of the proposed solution look like this (arrows show dependencies):

The composition of the "classic" implementation


InternetTools.Types module


Since the two languages, Delphi and Free Pascal, are very similar, it makes sense to extract a common module containing the types used in the DLL export list. That way their definitions are not duplicated in the InternetToolsUsage application, which contains the prototypes of the functions from the dynamic library:

    unit InternetTools.Types;

    interface

    type
      TXQHandle = Integer;

    implementation

    end.

In this implementation only one modest type is defined, but later the module will “mature” and its usefulness will become unquestionable.

InternetTools Dynamic Library


The set of procedures and functions in the DLL is minimal but sufficient for the task set above:

    library InternetTools;

    uses
      InternetTools.Types;

    function OpenDocument(const URL: WideString): TXQHandle; stdcall;
    begin
      ...
    end;

    procedure CloseHandle(const Handle: TXQHandle); stdcall;
    begin
      ...
    end;

    function Map(const Handle: TXQHandle; const XQuery: WideString): TXQHandle; stdcall;
    begin
      ...
    end;

    function Count(const Handle: TXQHandle): Integer; stdcall;
    begin
      ...
    end;

    function ValueByIndex(const Handle: TXQHandle; const Index: Integer): WideString; stdcall;
    begin
      ...
    end;

    exports
      OpenDocument, CloseHandle, Map, Count, ValueByIndex;

    begin
    end.

Because the current implementation is only a demonstration, the full code is not given; what matters much more is how this simplest of APIs will be used further on. Just do not forget the thread-safety requirement here: meeting it takes some effort but is not complicated.
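Since the bodies of the exported functions are omitted, here is a minimal sketch of one way the DLL could keep track of handles in a thread-safe manner. It is not the author's implementation: the THandleTable class, its method names, and the choice to store values as IInterface are assumptions made for illustration only. The idea is simply a table of interface references guarded by a critical section, where a handle is an index into the table.

```pascal
program HandleTableSketch;

{$IFDEF FPC}{$MODE Delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  {$IFDEF FPC}{$IFDEF UNIX}cthreads,{$ENDIF}{$ENDIF}
  SysUtils, SyncObjs;

type
  TXQHandle = Integer;

  // Hypothetical registry (not part of InternetTools): maps handles to
  // interface references and serializes access with a critical section.
  THandleTable = class
  private
    FLock: TCriticalSection;
    FItems: array of IInterface; // slot index = handle - 1; nil = free slot
  public
    constructor Create;
    destructor Destroy; override;
    function Add(const Item: IInterface): TXQHandle;
    function Get(Handle: TXQHandle): IInterface;
    procedure Remove(Handle: TXQHandle);
  end;

constructor THandleTable.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor THandleTable.Destroy;
begin
  FLock.Free;
  inherited Destroy;
end;

function THandleTable.Add(const Item: IInterface): TXQHandle;
var
  I: Integer;
begin
  FLock.Acquire;
  try
    // Reuse the first free slot, otherwise grow the array by one.
    I := 0;
    while (I < Length(FItems)) and (FItems[I] <> nil) do
      Inc(I);
    if I = Length(FItems) then
      SetLength(FItems, I + 1);
    FItems[I] := Item;
    Result := I + 1; // 0 stays reserved as an "invalid handle" value
  finally
    FLock.Release;
  end;
end;

function THandleTable.Get(Handle: TXQHandle): IInterface;
begin
  FLock.Acquire;
  try
    if (Handle < 1) or (Handle > Length(FItems)) or (FItems[Handle - 1] = nil) then
      raise Exception.CreateFmt('Invalid handle: %d', [Handle]);
    Result := FItems[Handle - 1];
  finally
    FLock.Release;
  end;
end;

procedure THandleTable.Remove(Handle: TXQHandle);
begin
  FLock.Acquire;
  try
    if (Handle >= 1) and (Handle <= Length(FItems)) then
      FItems[Handle - 1] := nil; // drops the stored reference
  finally
    FLock.Release;
  end;
end;

var
  Table: THandleTable;
  First, Second, Third: IInterface;
  H1, H2: TXQHandle;
begin
  Table := THandleTable.Create;
  try
    // Stand-ins for values that the DLL would obtain from InternetTools.
    First := TInterfacedObject.Create;
    Second := TInterfacedObject.Create;
    Third := TInterfacedObject.Create;

    H1 := Table.Add(First);
    H2 := Table.Add(Second);
    Writeln('Handles: ', H1, ' ', H2);

    Table.Remove(H1);
    // The freed slot is reused by the next Add.
    Writeln('Handle after reuse: ', Table.Add(Third));
  finally
    Table.Free;
  end;
end.
```

In a DLL built along these lines, OpenDocument and Map would presumably call Add, Count and ValueByIndex would call Get, and CloseHandle would call Remove. Reusing freed slots keeps the table small but means that a stale handle can silently point at a newer value; a variant that never reuses handle numbers would avoid that at the cost of a growing table.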

InternetToolsUsage application


Thanks to these preparations, the list example can now be rewritten in Delphi:

    program InternetToolsUsage;

    ...

    uses
      InternetTools.Types;

    const
      DLLName = 'InternetTools.dll';

    function OpenDocument(const URL: WideString): TXQHandle; stdcall; external DLLName;
    procedure CloseHandle(const Handle: TXQHandle); stdcall; external DLLName;
    function Map(const Handle: TXQHandle; const XQuery: WideString): TXQHandle; stdcall; external DLLName;
    function Count(const Handle: TXQHandle): Integer; stdcall; external DLLName;
    function ValueByIndex(const Handle: TXQHandle; const Index: Integer): WideString; stdcall; external DLLName;

    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';

    var
      RootHandle, ListHandle: TXQHandle;
      I: Integer;

    begin
      RootHandle := OpenDocument(ArticleURL);
      try
        ListHandle := Map(RootHandle, ListXPath);
        try
          for I := 0 to Count(ListHandle) - 1 do
            Writeln( ValueByIndex(ListHandle, I) );
        finally
          CloseHandle(ListHandle);
        end;
      finally
        CloseHandle(RootHandle);
      end;

      ReadLn;
    end.

If you disregard the prototypes of the functions and procedures from the dynamic library, the code cannot be said to have become catastrophically worse than the Free Pascal version. But what if we complicate the task a little and try to filter out some of the elements and print the addresses of the links contained in the remaining ones:

    uses
      xquery;

    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
      HrefXPath = './a/@href';

    var
      ListValue, HrefValue: IXQValue;

    begin
      for ListValue in xqvalue(ArticleURL).retrieve.map(ListXPath) do
        if {   } then
          for HrefValue in ListValue.map(HrefXPath) do
            Writeln(HrefValue.toString);
    end.

This can be done with the current DLL API, but the result is already very verbose, which not only greatly reduces the readability of the code but also (no less importantly) moves it away from the Free Pascal version above:

    program InternetToolsUsage;

    ...

    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
      HrefXPath = './a/@href';

    var
      RootHandle, ListHandle, HrefHandle: TXQHandle;
      I, J: Integer;

    begin
      RootHandle := OpenDocument(ArticleURL);
      try
        ListHandle := Map(RootHandle, ListXPath);
        try
          for I := 0 to Count(ListHandle) - 1 do
            if {   } then
            begin
              HrefHandle := Map(ListHandle, HrefXPath);
              try
                for J := 0 to Count(HrefHandle) - 1 do
                  Writeln( ValueByIndex(HrefHandle, J) );
              finally
                CloseHandle(HrefHandle);
              end;
            end;
        finally
          CloseHandle(ListHandle);
        end;
      finally
        CloseHandle(RootHandle);
      end;

      ReadLn;
    end.

Obviously, in real, more complex cases the amount of code will only grow, and quickly, so let us move on to a solution that is free of such problems.

Interface implementation


The procedural style of working with the library, as just shown, is possible but has significant drawbacks. Because a DLL as such supports interfaces (as parameter and return types), work with InternetTools can be organized in the same convenient manner as in Free Pascal. It is also desirable to change the file layout slightly, so that the declaration and the implementation of the interfaces go into separate modules:

The composition of the interface implementation files

As before, we will consistently consider each of the files.

InternetTools.Types module


This module declares the interfaces that will be implemented in the DLL:

    unit InternetTools.Types;

    {$IFDEF FPC}
      {$MODE Delphi}
    {$ENDIF}

    interface

    type
      IXQValue = interface;

      IXQValueEnumerator = interface
        ['{781B23DC-E8E8-4490-97EE-2332B3736466}']
        function MoveNext: Boolean; safecall;
        function GetCurrent: IXQValue; safecall;
        property Current: IXQValue read GetCurrent;
      end;

      IXQValue = interface
        ['{DCE33144-A75F-4C53-8D25-6D9BD78B91E4}']
        function GetEnumerator: IXQValueEnumerator; safecall;

        function OpenURL(const URL: WideString): IXQValue; safecall;
        function Map(const XQuery: WideString): IXQValue; safecall;
        function ToString: WideString; safecall;
      end;

    implementation

    end.

The conditional compilation directives are needed because the module is used unchanged in both Delphi and FPC projects.

The IXQValueEnumerator interface is, in principle, optional, but loops of the form "for ... in ...", as in the example, cannot be used without it. The second interface is the main one and is a wrapper analogous to IXQValue from InternetTools (it is deliberately given the same name so that future Delphi code is easier to relate to the library's Free Pascal documentation). In terms of design patterns, the interfaces declared in the module are adapters, with one small peculiarity: their implementation is located in the dynamic library.

The need to use the safecall calling convention for all methods is well described in a separate article. The obligation to use WideString instead of “native” strings will not be justified here either, because exchanging dynamic data structures with a DLL is beyond the scope of this article.
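To make the role of the enumerator interface more tangible, here is a minimal, self-contained sketch of the same pattern without any DLL or InternetTools involved. The names ITextList, ITextEnumerator and TTextList are hypothetical stand-ins invented for this illustration, and the safecall modifier is left out so that the program does not depend on COM support. As with TXQValue, a single class implements both interfaces and returns itself from GetEnumerator.

```pascal
program EnumeratorSketch;

{$IFDEF FPC}{$MODE Delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical, simplified counterpart of IXQValueEnumerator.
  ITextEnumerator = interface
    function MoveNext: Boolean;
    function GetCurrent: WideString;
    property Current: WideString read GetCurrent;
  end;

  // Hypothetical, simplified counterpart of IXQValue:
  // GetEnumerator is what makes "for ... in ..." compile.
  ITextList = interface
    function GetEnumerator: ITextEnumerator;
  end;

  // One class implements both interfaces.
  TTextList = class(TInterfacedObject, ITextList, ITextEnumerator)
  private
    FItems: array of WideString;
    FIndex: Integer;
    function MoveNext: Boolean;
    function GetCurrent: WideString;
    function GetEnumerator: ITextEnumerator;
  public
    procedure Add(const S: WideString);
  end;

function TTextList.MoveNext: Boolean;
begin
  Inc(FIndex);
  Result := FIndex < Length(FItems);
end;

function TTextList.GetCurrent: WideString;
begin
  Result := FItems[FIndex];
end;

function TTextList.GetEnumerator: ITextEnumerator;
begin
  FIndex := -1;   // restart the iteration
  Result := Self; // the list acts as its own enumerator
end;

procedure TTextList.Add(const S: WideString);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := S;
end;

var
  Impl: TTextList;
  List: ITextList;
  S: WideString;
begin
  Impl := TTextList.Create;
  List := Impl; // hold an interface reference right away
  Impl.Add('first');
  Impl.Add('second');

  // The compiler expands this loop into calls to GetEnumerator,
  // MoveNext and Current on the interfaces declared above.
  for S in List do
    Writeln(S);
end.
```

One consequence of letting the list be its own enumerator is that the iteration state lives in the list object, so two nested loops over the same reference would interfere with each other. A separate enumerator class would avoid this at the cost of a little more code.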

InternetTools.Realization Module


This module is first in both importance and size. As its name reflects, it contains the implementation of the interfaces from the previous module: both are implemented by a single class, TXQValue, whose methods are so simple that almost all of them consist of one line of code (this is quite expected, because all the necessary functionality is already in the library; here it only needs to be called):

    unit InternetTools.Realization;

    {$MODE Delphi}

    interface

    uses
      xquery,
      InternetTools.Types;

    type
      IOriginalXQValue = xquery.IXQValue;

      TXQValue = class(TInterfacedObject, IXQValue, IXQValueEnumerator)
      private
        FOriginalXQValue: IOriginalXQValue;
        FEnumerator: TXQValueEnumerator;

        function MoveNext: Boolean; safecall;
        function GetCurrent: IXQValue; safecall;

        function GetEnumerator: IXQValueEnumerator; safecall;

        function OpenURL(const URL: WideString): IXQValue; safecall;
        function Map(const XQuery: WideString): IXQValue; safecall;
        function ToString: WideString; safecall; reintroduce;
      public
        constructor Create(const OriginalXQValue: IOriginalXQValue); overload;

        function SafeCallException(ExceptObject: TObject; ExceptAddr: CodePointer): HResult; override;
      end;

    implementation

    uses
      sysutils, comobj,
      w32internetaccess;

    function TXQValue.MoveNext: Boolean;
    begin
      Result := FEnumerator.MoveNext;
    end;

    function TXQValue.GetCurrent: IXQValue;
    begin
      Result := TXQValue.Create(FEnumerator.Current);
    end;

    function TXQValue.GetEnumerator: IXQValueEnumerator;
    begin
      FEnumerator := FOriginalXQValue.GetEnumerator;
      Result := Self;
    end;

    function TXQValue.OpenURL(const URL: WideString): IXQValue;
    begin
      FOriginalXQValue := xqvalue(URL).retrieve;
      Result := Self;
    end;

    function TXQValue.Map(const XQuery: WideString): IXQValue;
    begin
      Result := TXQValue.Create( FOriginalXQValue.map(XQuery) );
    end;

    function TXQValue.ToString: WideString;
    begin
      Result := FOriginalXQValue.toJoinedString(LineEnding);
    end;

    constructor TXQValue.Create(const OriginalXQValue: IOriginalXQValue);
    begin
      FOriginalXQValue := OriginalXQValue;
    end;

    function TXQValue.SafeCallException(ExceptObject: TObject; ExceptAddr: CodePointer): HResult;
    begin
      Result := HandleSafeCallException(ExceptObject, ExceptAddr, GUID_NULL, ExceptObject.ClassName, '');
    end;

    end.

The SafeCallException method deserves a closer look. Overriding it is, by and large, not vital (TXQValue will work perfectly well without it), but the code given here passes to the Delphi side the text of exceptions that occur in safecall methods (details, again, can be found in the article already cited).

Moreover, this solution is thread-safe, provided that an IXQValue obtained, for example, through OpenURL is not passed between threads. This is because the implementation of the interface only forwards calls to InternetTools, which is already thread-safe.

InternetTools Dynamic Library


Thanks to the work done in the modules above, the DLL only needs to export a single function (compare this with the procedural variant):

    library InternetTools;

    uses
      InternetTools.Types, InternetTools.Realization;

    function GetXQValue: IXQValue; stdcall;
    begin
      Result := TXQValue.Create;
    end;

    exports
      GetXQValue;

    begin
      SetMultiByteConversionCodePage(CP_UTF8);
    end.

The call to SetMultiByteConversionCodePage is needed for Unicode strings to be handled correctly.

InternetToolsUsage application


If we now write the Delphi solution of the original example on top of the proposed interfaces, it hardly differs from the Free Pascal one, which means that the task set at the very beginning of the article can be considered accomplished:

    program InternetToolsUsage;

    ...

    uses
      System.Win.ComObj,
      InternetTools.Types;

    const
      DLLName = 'InternetTools.dll';

    function GetXQValue: IXQValue; stdcall; external DLLName;

    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';

    var
      ListValue: IXQValue;

    begin
      for ListValue in GetXQValue.OpenURL(ArticleURL).Map(ListXPath) do
        Writeln(ListValue.ToString);

      ReadLn;
    end.

The System.Win.ComObj module is included for a reason: without it, the text of every safecall exception turns into a faceless “Exception in safecall method”, whereas with it the original text generated in the DLL is preserved.

The slightly more complicated example likewise differs only minimally in Delphi:

    ...

    const
      ArticleURL = 'https://habr.com/post/415617';
      ListXPath = '//div[@class="post__body post__body_full"]//li[a]';
      HrefXPath = './a/@href';

    var
      ListValue, HrefValue: IXQValue;

    begin
      for ListValue in GetXQValue.OpenURL(ArticleURL).Map(ListXPath) do
        if {   } then
          for HrefValue in ListValue.Map(HrefXPath) do
            Writeln(HrefValue.ToString);

      ReadLn;
    end.

Remaining library functionality


If you look at the full capabilities of the IXQValue interface from InternetTools, you will see that the corresponding interface in InternetTools.Types defines only 2 methods (Map and ToString) of the entire rich set. The rest, as needed in the reader's particular case, are added in exactly the same simple way: the required methods are declared in InternetTools.Types and then implemented in the InternetTools.Realization module (most often in a single line of code).

If you need somewhat different functionality, for example cookie management, the sequence of steps is very similar:

  1. A new interface is declared in InternetTools.Types:

         ...
         ICookies = interface
           ['{21D0CC9A-204D-44D2-AF00-98E9E04412CD}']
           procedure Add(const URL, Name, Value: WideString); safecall;
           procedure Clear; safecall;
         end;
         ...

  2. Then it is implemented in the InternetTools.Realization module:

         ...
         type
           TCookies = class(TInterfacedObject, ICookies)
           private
             procedure Add(const URL, Name, Value: WideString); safecall;
             procedure Clear; safecall;
           public
             function SafeCallException(ExceptObject: TObject; ExceptAddr: CodePointer): HResult; override;
           end;
         ...
         implementation

         uses
           ..., internetaccess;
         ...
         procedure TCookies.Add(const URL, Name, Value: WideString);
         begin
           defaultInternet.cookies.setCookie(
             decodeURL(URL).host,
             decodeURL(URL).path,
             Name,
             Value,
             []
           );
         end;

         procedure TCookies.Clear;
         begin
           defaultInternet.cookies.clear;
         end;
         ...

  3. After that, a new exported function that returns this interface is added to the DLL:

         ...
         function GetCookies: ICookies; stdcall;
         begin
           Result := TCookies.Create;
         end;

         exports
           ..., GetCookies;
         ...

Resource Release


Although the InternetTools library is built on interfaces, which imply automatic lifetime management, there is one non-obvious nuance that appears to lead to memory leaks. If you run the following console application (created in Delphi; nothing changes in the case of FPC), the memory consumed by the process grows each time you press the Enter key:

    ...

    const
      ArticleURL = 'https://habr.com/post/415617';
      TitleXPath = '//head/title';

    var
      I: Integer;

    begin
      for I := 1 to 100 do
      begin
        Writeln( GetXQValue.OpenURL(ArticleURL).Map(TitleXPath).ToString );
        Readln;
      end;
    end.

There are no errors in the use of interfaces here. The problem is that InternetTools does not release on its own the internal resources allocated while analyzing a document (in the OpenURL method); this has to be done explicitly once work with the document is finished. For this purpose, the library's xquery module provides the freeThreadVars procedure, which is logical to call from the Delphi application by extending the DLL export list:

    ...

    procedure FreeResources; stdcall;
    begin
      freeThreadVars;
    end;

    exports
      ..., FreeResources;
    ...

Once it is put to use, the loss of resources stops:

    for I := 1 to 100 do
    begin
      Writeln( GetXQValue.OpenURL(ArticleURL).Map(TitleXPath).ToString );
      FreeResources;
      Readln;
    end;

It is important to understand the following: calling FreeResources invalidates all previously obtained interfaces, and any attempt to use them afterwards is unacceptable.

Source: https://habr.com/ru/post/415617/

