Automatic search for information on the Internet. Overview of programs for searching documents and data

Finding the necessary and relevant information on the Internet is sometimes very difficult. The amount of information garbage on the Internet is growing like a snowball, and sometimes it is simply impossible to get to the data that you really need using traditional Yandex and Google. The book you are holding in your hands will increase the efficiency of your search for information on the Internet many times over. It describes techniques, search sites and programs for specialized information retrieval. Modern types of Internet search are considered: universal search, vertical search, metasearch systems, building personal search engines, searching for audiovisual content, searching on the hidden Internet. For all the systems considered, their characteristics and tips for maximum effective use are given.

Introduction

Internet search is an important element of working on the Internet. Hardly anyone knows for sure the exact number of web resources on the modern Internet. In any case, the count is in the billions. In order to be able to use the information needed at a given moment, no matter for work or entertainment purposes, you first need to find it in this constantly replenished ocean of resources. This is not an easy task at all, since information on the modern Internet is not structured, which creates problems in finding it. It is no coincidence that Internet search engines have become unique “windows” into this information space.

It is unlikely that among Internet users there will be people who have never used large universal search engines. The names Google, Yandex and a couple of other big machines are on everyone’s lips. They cope remarkably well with everyday Internet search tasks, and often users don’t even try to look for a replacement. At the same time, the number of Internet search engines in our time amounts to thousands. The reasons for such a variety of alternative machines have different roots. Some projects are trying to directly compete with global market leaders through careful work with national Internet resources. Others offer query capabilities not available from well-known search engines. A significant number of alternative engines specialize in searching for a certain topic area or a certain type of content, achieving impressive results in solving these problems. Be that as it may, the inclusion of such search engines in a user's own arsenal of Internet search tools can significantly improve its quality. However, there is one nuance here: you need to know about such machines and be able to use their capabilities.

We assume that readers of this book are already quite familiar with search techniques using universal search engines, familiar enough to have felt the limitations of those engines. Most likely, such readers have already tried to find and use additional tools. The press does not ignore the topic of Internet search: articles appear periodically and books are published. But their subjects, as a rule, are the same several leading universal search engines. What makes this book different is that it attempts to cover the full range of modern search solutions. Here you will find descriptions and recommendations for using the best modern services aimed at solving the most common search problems. This book is for people who work a lot on the Internet and use it to find the information they need, whether for business, study or a hobby.

For an Internet search to succeed, two conditions must be met: queries must be well formulated, and they must be submitted in the appropriate places. In other words, the user needs, on the one hand, the ability to translate search interests into the language of a search query and, on the other hand, a good knowledge of search engines and other available search tools, along with their advantages and disadvantages, in order to choose the most suitable tool in each specific case.

Currently, there is no single resource that satisfies all Internet search requirements. Therefore, if you take your search seriously, you inevitably have to use different tools, using each in the most appropriate case.

Chapter 1 Universal Internet search engines

Universal Internet search engines are the main and most famous means of Internet search. Such search engines provide maximum coverage of various resources. The largest and most popular search engines belong to the universal type. These are truly powerful solutions with a lot of features and tools that many users are often unaware of. Understanding the features and capabilities of universal search allows you to recognize the strengths and weaknesses of such systems and consciously choose the most effective search tools.

The market for universal search engines is quite large. In this chapter, we will consider only the most powerful machines that can adequately work with queries in Russian. The chapter opens with stories about the leaders of Russian search - the Google.ru and Yandex systems. Books and a lot of articles have been written about each of these search engines. We will focus on the main features that matter to the end user and also try to identify their strengths.

They are accompanied by a new search development from Microsoft - the Bing system, which has so far been noticeably neglected, as well as the useful and quite powerful search engine Exalead, the advantage of which is good support for searching in European Internet resources. This system is still a rare guest in the search arsenal of our users, so it is considered in more detail than the others.

In this chapter, when reviewing Google and Yandex systems, we will focus only on web search capabilities, and search in specialized databases of these projects is discussed in the following chapters devoted to image and video search. For other universal search engines, information about multimedia search is provided immediately upon introduction to them.

Since three of the four heroes of this chapter are of foreign origin, we immediately note that we are analyzing the capabilities of only their Russian versions. The fact is that some functions of foreign systems, especially experimental ones, are often available only in the original, usually English-language versions of the services.

Google

The Google search engine is deservedly considered the world leader in modern Internet search. Founded in 1998, Google remains one of the leading trendsetters in the field of Internet search and web services.

Google developers have always been distinguished by their increased attention to improving the algorithms of their search engine, as well as reasonable conservatism in the field of user interface. The capabilities of composing a query on Google can be called classic, and the methods of displaying search results have also become a kind of standard. Recently, Google developers have made serious changes in these areas - the largest search engine has begun to look too old-fashioned compared to its young competitors.

Google has one of the world's largest index databases, which provides a wide range of information sources. Google index information is consolidated into several vertical databases. In addition to the most well-known “Web” database, there are several multimedia databases (“Pictures”, “Video”) that work with sources of current information and messages on RSS feeds, the “News” database, as well as the “Blogs” database that indexes online diaries. In addition, Google offers a wide selection of additional resources, among which it is worth noting a mapping service, a website directory, and a question and answer service. These resources can also be thought of as search tools.

In the “Web” database, Google offers simple and advanced search modes for composing a query. In simple search mode, only the virtual keyboard is available among additional tools. Advanced search offers more options. Since the advanced search form is available in almost all Google search products, let’s look at it in more detail (Fig. 1.1).

Yandex

Officially presented to the general public in 1997, the Yandex search engine developed successfully and ten years later entered the ten largest search engines in the world for the first time. In the Russian segment of the Internet it has achieved a leading position, which it has no plans to relinquish, despite increasing competition. The distinctive features of Yandex since the beginning of its existence have been its own original algorithms for determining the relevance of search results, flexible tools for working with query text, and attention to the morphology of the Russian language when processing queries.

Yandex relies on its own index databases. In addition to searching web documents, the system offers a good selection of specialized resources and additional services. Yandex currently works with images, videos, news, blogs and dictionaries. Powerful search capabilities are also built into Yandex's own map service and product search system. In addition, Yandex maintains its own website directory. Yandex's strength is its well-developed local search program, which is especially important for our users. Yandex provides third-party developers with access to its databases. As a result, many Russian alternative Internet search projects use Yandex resources in one way or another. In addition to the regular search system, a shortened version of Yandex is also offered, available at ya.ru. The interface of this version consists only of a query input field and a search button.

Web Document Search offers simple and advanced search modes. A simple search does not provide any filters, which is compensated by the ability to automatically parse queries in natural language, confident processing of relatively long queries, as well as a system for automatic query completion. The maximum length of a request is forty words.

The advanced search form offers only one field for entering a query. Logical operators connecting the query words have to be entered manually; fortunately, Yandex has a fairly detailed query language. The remaining tools of the advanced search form are various filters (Fig. 1.4).

Bing

The history of Internet search from Microsoft cannot be called simple. Algorithms, databases used and, of course, names have repeatedly changed on services consistently offered to the public. Until the early 2000s, the search engine did not have its own databases and worked with external indexes from AltaVista, Inktomi and Looksmart. The original name MSN Search was used until 2006, and then changing search engine names became a Microsoft tradition for several years.

Along with the final transition to searching its own indexes, MSN Search was renamed first to Windows Live Search and then to Live Search. Finally, in the early summer of 2009, Live Search was replaced by a new search project, Bing.

“Bing will allow you to take a different look at searching for information on the Internet and help users make important decisions,” was the beginning of Microsoft’s press release on the launch of Bing. The developers’ aspirations were clear: search engines from Microsoft, despite all their efforts, in the West were consistently inferior in popularity to the leaders - Google and Yahoo!. If we talk about the Russian-language versions of previous Microsoft search projects, then in terms of the quantity and quality of links found, they were much inferior to large Russian search engines. In an attempt to catch up with competitors, Bing's developers relied on improving search quality and introducing new technologies, many of which were acquired along with the companies that created them.

It should be noted that the Russian-language version of Bing, like most other localized versions, lacks a number of additional functions, such as shopping search. Since they work, in effect, only in North America, there is no point in dwelling on them in detail.

Exalead

One of the features of Europe, including in the field of Internet search, is the large number of national languages. A search engine that claims to be the leading one in Europe simply must index national segments of the Internet well and efficiently process queries in numerous European languages, both the largest and the less common. It is in this area that a European development can gain a serious competitive advantage over powerful overseas competitors. The Exalead system is currently a serious contender for the role of such a European search engine. This project was developed within the framework of the Quaero research program funded by the European Union.

Exalead has its own index databases. The main search resources of the system are databases of web documents, images, videos and news. The Exalead start page offers customization options. On this page you can place links to your favorite sites - they will be displayed in the form of graphic miniature screenshots. However, to do this you will have to register an account for free, and also allow your browser to store Exalead cookies.

Exalead Web Search offers simple and advanced search modes. The advanced search form, like in Bing, opens directly on the search results page. Note that Exalead offers not just a familiar form with a set of additional fields, but a complex drop-down menu that plays the role of a wizard for refining a query (Fig. 1.7). When you select one or another item in the wizard menu, new elements are added to the query string, and, if necessary, operators and special characters.

To say that, in our age of information technology and endless growth in the volume of data available to both the individual and society, there are many problems with processing information and searching for it is by now a platitude. Who hasn't raised this topic? So, rather than burden you with subjective and, in part, objective judgments on the problem drawn from various sources, I will move directly to its solution. Today we'll talk about search: that is, about programs and serious information systems that search for the documents and data we need.

Upgrade "direct search"

Not so long ago, when there was not much information even on an enterprise's local network, any search was carried out by simply going through the handful of available files and checking their names and contents one after another. Such a search is called direct, and programs (utilities) using direct search technology are traditionally present in all operating systems and tool packages. But even the power of modern computers is not enough for direct search to work quickly and adequately on gigantic volumes of data. Searching through a couple of hundred documents on a disk is one thing; searching a huge library and several dozen mailboxes is quite another. Therefore, direct search programs are clearly fading into the background today, at least as universal tools.
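The direct search described above can be illustrated with a minimal Python sketch. The function name `direct_search` and the choice to treat every file as UTF-8 text are assumptions made for illustration; real utilities also handle binary formats and encodings.

```python
from pathlib import Path


def direct_search(root, term):
    """Return files under root whose name or text content contains term.

    Every file is opened and read on every query, which is exactly
    why this approach stops scaling on large collections.
    """
    term = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Check the file name first, then fall back to the contents.
        if term in path.name.lower():
            hits.append(path)
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        if term in text.lower():
            hits.append(path)
    return hits
```

The cost of each query grows with the total size of the files, since nothing is remembered between searches.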

Of course, this type of search has long been out of favor in the corporate sector: the volumes are simply too large. For many years now, and especially recently, technologies capable of quickly and accurately searching documents of various formats from various sources have therefore been in high demand. Not so long ago, Microsoft’s “father” Bill Gates, apparently envious of the phenomenal success of the Internet search engine Google, announced at a press conference the software industry’s intention to contribute in every possible way to the creation of search engines and technologies. But it is too early to expect a phenomenally effective program from Microsoft or a competitive server on the Internet (MSN still doesn’t match Google). Therefore, let's turn to existing developments.

Index, query, relevance

Modern search technologies are based on two fundamental processes: first, indexing the available information and, second, processing the query and outputting the results. As for the first, any program (be it a desktop search engine, a corporate information system or an Internet search engine) creates its own search area. That is, it processes documents and builds an index of them (an organized structure that contains information about the processed data). From then on, it is the index that is used to obtain quickly the list of documents matching a query. The second process, although by no means simple technologically, is quite understandable to the average user. The program processes the query (a keyword phrase) and displays a list of documents that contain that phrase. Since the information is held in a structured index, query processing is much faster (tens or hundreds of times!) than with direct search: documents are selected not by enumerating files but by analyzing the text information in the index.
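The two processes can be sketched with a tiny inverted index in Python. This is an illustration under simplifying assumptions, not how any of the reviewed programs is implemented: documents are plain strings held in memory, and the names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Indexing step: map each word to the set of documents containing it.

    docs is a dict of document id -> text.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Query step: return ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "a": "Annual sales report",
    "b": "Sales meeting notes",
    "c": "Holiday schedule",
}
index = build_index(docs)
print(search(index, "sales report"))
```

The index is built once; each query then touches only the entries for its own words instead of rereading every document.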

The program displays the found documents in the resulting list according to relevance, that is, how well the document matches the query text. Different technologies, of course, use different methods for searching and determining the relevance of a document (the number of “occurrences” of a word and its frequency of mention in the document, the ratio of these parameters to the total number of words in the document, the distance between the words of the query phrase in the searched files, and so on). Based on these parameters, the “weight” of the document is determined and, depending on it, a particular file appears in the list of results at a certain position. In the case of Internet search, the situation is even more complicated. Indeed, in this case, many other factors must be taken into account (Google’s PageRank is an example of this). But this is a topic for a separate article, so we won’t touch on the Internet here.

Review of search engines
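Before the individual programs are considered, the relevance weighting just described can be made concrete with a short Python sketch. It uses only one of the listed parameters, the ratio of query-word occurrences to the total number of words in the document; proximity between query words and other factors are deliberately left out. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def relevance(query, text):
    """Weight of a document: share of its words that are query words."""
    words = tokenize(text)
    terms = set(tokenize(query))
    if not words or not terms:
        return 0.0
    hits = sum(1 for word in words if word in terms)
    return hits / len(words)


def rank(query, docs):
    """Return ids of matching documents, most relevant first."""
    scored = [(relevance(query, text), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for score, doc_id in scored if score > 0]
```

A short document that mentions the query word twice therefore outranks a long document that mentions it once.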

This material examines the capabilities of several popular search programs that boast both decent speed and good functionality. But showing off in brochures is one thing; standing up to expert scrutiny is quite another. The experts in this case were, no more and no less, an office full of people who like to test software for usability. A set of programs was installed on the test computer (Athlon 2.2 GHz, 1 GB of RAM, 160 GB Seagate IDE hard drive at 7200 rpm, Windows XP): dtSearch Desktop, Ishcheika (Bloodhound) Prof Deluxe, Google Desktop Search, SearchInform, Copernic Desktop Search and ISYS Desktop. For the tests, a text database of documents in doc, txt and html formats was compiled, with a total size of no less than 20 gigabytes. A group of colleagues led by your humble servant tested and compared the programs and shared their subjective impressions of each. A summary of the findings follows.

dtSearch Desktop

A program that, according to the developers, claims to be the fastest, most convenient and best search engine. Like, in general, everyone else from this review. The dtSearch interface is quite simple, but some windows or tabs are somewhat overloaded with elements, which makes it seem difficult to use. But in reality there are no particular difficulties. The only really unpleasant point is the software’s lack of support for the Russian language (despite the fact that the program can search for documents in several languages, its interface is exclusively English).

But dtSearch is one of the few programs that can index web pages to a user-specified “depth” (albeit only with the separately purchased dtSearch Spider add-on). This is in addition to supporting disk files in various text formats and emails from an Outlook mailbox. At the same time, the program cannot work with databases, which are such a tasty morsel for search engines because of the large volumes of information they contain and their wide use in companies, and therefore in corporate networks. The program's indexing speed turned out to be respectable. Looking ahead, I will say that dtSearch indexed the test data about as fast as another competitor, ISYS, and shared second place with it among the fastest systems. dtSearch indexed the 20-gigabyte test set in 6 hours and 13 minutes, creating a 7.9 GB index for subsequent searches.

As for the search capabilities, here they are at the proper level. Firstly, dtSearch has a morphological search (searching for a word in all its morphological forms). Using this opportunity, you free yourself from, say, such thoughts as “in what case was a certain word used in the document I needed?” The use of morphological search is almost always justified, so it should be present in any professional search engine.

Search by sound is a non-standard feature even for professional search engines. Its essence is that the program looks for words that sound the same as the word you entered. And the best part is that this function also works for the Russian language! For example, when you type a word in a search query, the results include not only that word but also words that are spelled differently yet sound alike.
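One classic way to match words by sound is the Soundex code, sketched below in Python. Whether dtSearch uses Soundex or another phonetic algorithm is not stated here, so treat this purely as an illustration of the idea; the sketch also handles only Latin letters, not Russian.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a Latin-alphabet word."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Adjacent letters with the same code are encoded only once.
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def sounds_like(a, b):
    """True if two words have the same Soundex code."""
    return soundex(a) == soundex(b) != ""
```

With this sketch, `sounds_like("Smith", "Smyth")` is true, because both words reduce to the same code.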

Search with error correction is a very important function. It is used to find words containing spelling errors, which may be typos or, for example, errors in documents produced by character recognition systems. A simple example: you are looking for the word “keyboard”. Some document contains the word “keybord”; obviously this is really the word “keyboard”, and the person simply made a typo. An error-correcting search will detect the document with “keybord” and include it in the results. dtSearch also has a setting that determines how many erroneous characters are tolerated.
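A common way to implement error-tolerant matching is the Levenshtein edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one word into another. The Python sketch below is an illustration of that general technique, not dtSearch's actual algorithm; `max_errors` is a hypothetical stand-in for the tolerance setting mentioned above.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_matches(term, words, max_errors=1):
    """Return the words within max_errors edits of term."""
    term = term.lower()
    return [w for w in words if edit_distance(term, w.lower()) <= max_errors]
```

Raising `max_errors` finds more misspellings but also more unrelated words, which is why such a setting is usually left adjustable.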

Search using synonyms. This feature uses a list of synonyms for various words. So, for example, by entering the word “fast”, the program will also find the words “high-speed” and others that are synonyms for the word “fast”, if, of course, they are present in the list of synonyms. A ready-made list of synonyms is not supplied with the dtSearch program, however, it is possible to use lists on the Internet (accordingly, a connection is required, which is not always convenient), or you can create your own list of synonyms.
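Synonym search amounts to expanding the query before it is run. A minimal Python sketch, assuming the synonym list is a user-maintained dictionary (the name `expand_query` and the sample entries are hypothetical):

```python
def expand_query(words, synonyms):
    """Return the query words plus any listed synonyms, without duplicates."""
    expanded = []
    for word in words:
        for candidate in [word] + synonyms.get(word, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# A user-built synonym list; words missing from it are left unchanged.
synonyms = {"fast": ["high-speed", "quick"]}
print(expand_query(["fast", "search"], synonyms))
```

A word that is absent from the list is searched as is, which matches the caveat above that synonyms are found only if they are present in the list.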

In addition to the listed capabilities, dtSearch can search using phrases consisting of words connected by logical operations. Each word in a query can be assigned its own “weight,” that is, significance. A useful option is to use a dictionary consisting of unimportant words in order to not take them into account when searching, but this dictionary is also empty and you will have to fill it out yourself.
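The three options in this paragraph (logical operations, per-word weights and a list of unimportant words) can be sketched together in Python. The function names, parameters and sample data are hypothetical and do not reflect dtSearch's own query syntax.

```python
def matches(doc_words, all_of=(), any_of=(), none_of=()):
    """Logical operations: AND over all_of, OR over any_of, NOT over none_of."""
    present = set(doc_words)
    if any(word not in present for word in all_of):
        return False
    if any_of and not any(word in present for word in any_of):
        return False
    return not any(word in present for word in none_of)


def weighted_score(doc_words, weights, stop_words=frozenset()):
    """Sum the weights of query words in the document, skipping stop words."""
    score = 0
    for word in doc_words:
        if word in stop_words:
            continue  # unimportant word: never counted
        score += weights.get(word, 0)
    return score


words = "the quarterly sales report for the sales team".split()
print(matches(words, all_of=["sales", "report"], none_of=["draft"]))
print(weighted_score(words, {"sales": 3, "report": 1}, stop_words={"the", "for"}))
```

Giving “sales” a higher weight than “report” makes documents that dwell on sales rise in the results even when both words are required.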

Next, let's look at the program's capabilities when working on a network. In fact, dtSearch does not offer any network-specific capabilities. However, it is quite possible to use it on a network. For example, you can create an index and put it in a public (shared) folder. The program itself can be installed on each user’s computer, or it can also be placed in a shared folder, with shortcuts created for each user separately using command line parameters, whose purpose is described in the help file supplied with the program. It is also possible to install the program automatically over the network using an MSI file, which takes into account the settings for each connected user.

In general, this is a good program in the category of professional search engines. It may deserve a good rating, but gaining users' trust and respect may not be easy for dtSearch because of certain factors: the interface is not entirely smooth, Russian-speaking users are left without a localized interface, and there are no standout features for working on a network. As for searching documents directly, the program had no problems with Russian text, nor with the declared morphology or fuzzy search. The system found the necessary documents quite adequately both for a simple one-word query and when a couple of paragraphs or a whole document was used as the key phrase.

Official site:
Distribution size: 23 MB

Bloodhound Prof Deluxe

Based on the name, you can guess that there is support for the Russian language in this program. This is already nice. As for the interface, in general, it is somewhat unusual, but in appearance it is very attractive. Another thing is convenience. A very controversial criterion, but still, probably, a multi-window solution is not the most successful option (the request is entered in one window, the result is displayed in another, and the like).

Bloodhound likewise uses indexes to perform a quick search, but its indexing is much slower than that of the other programs. This is very strange, especially considering that its capabilities for processing search queries are very weak, so the index structure cannot be complex. Most likely, this is due to unoptimized algorithms. This program turned out to be a clear outsider in indexing and search speed: creating the index took about six times longer than with dtSearch and ISYS. Indexing 20 gigabytes of text took Bloodhound 38 hours and 46 minutes. And the resulting “search area” took up almost as much space on the hard drive as the original data: 19 gigabytes.

Bloodhound can be regarded as an alternative to the standard search in Windows; it is unlikely to be capable of more. That Bloodhound's primary task is simple file search is indicated not only by the small number of functions for analyzing the text of search queries and by the advanced search by file attributes, but even by the results window, which provides direct links to the files found and to the folders containing them. The results window is not very informative in the sense that you can read a found file in full only by opening it; there is no built-in file viewer. But an excerpt from the file where the search term was found is displayed; in general, this display scheme is very reminiscent of Internet search engines.

As for specific capabilities for processing search queries, there is no such thing as searching by a block of text; the most you can search for is a phrase, if only because there is no multi-line text input field. The entered phrase can, however, be analyzed, and here Bloodhound offers a standard set: logical operations, wildcard (mask) search and exact-phrase (quoted) search. Not a lot. The program contains some rudiments of morphological search, but they are so crude that they most likely interfere with correct operation (during the tests, many bugs involving incorrect use of morphology were noticed).

But the program allows you to specify file attributes when searching (document date, file name, folder name), and in these queries you can also use the same search set. You can also search for letters by specifying the parameters (From, Subject..., etc.).

So, we have covered the search itself. What else is interesting about the program, for which, according to the official website, it has received so many awards? It’s hard to say what is so special about it; most likely it is the attractive Bloodhound interface (attractive in appearance, that is, rather than in usability).

Operations with indexes are quite standard; a nice feature is the ability to update indexes on a schedule. In addition, indexes can be used over a network. This deserves a closer look.

Despite the primitiveness of its search queries, the program can be used to search for files, so its use on networks can be justified. This is a stretch, though, since on a large network, with its huge amount of information, the priority is fast search using complex queries, and the program clearly has problems with search speed. I must say that Bloodhound's network support is well thought out. A separate application, Bloodhound Server, is designed specifically for this. It works the same way as the regular Bloodhound (they share one search engine), only for documents located on a central server or on shared resources on the corporate network. Bloodhound Server creates new indexes on shared resources or uses previously created ones. Any user of the corporate network can connect to the search server and use it to access any document in the current index through an Internet browser. You will agree that this scheme is extremely convenient: files on your own network can be searched in the same way as information on the Internet through, for example, Google.

Weighing all the advantages and disadvantages of this program, the conclusion suggests itself that its capabilities are most likely not enough for corporate networks (despite the good organization of network support), but for a home computer or even a home network it might, in principle, be suitable. Although neither the speed of work nor the search capabilities inspire optimism...

Official website in Russian:
Distribution size: 6 MB

Google Desktop Search + GDS Enterprise

Of course, we couldn’t ignore such a famous developer. The name Google already says a lot. People who have been using the most powerful Internet search engine for years will certainly, without a single doubt, decide to install this particular search engine on their computer. Just think: Google on your home computer! However, without giving in to provocations with a widely promoted brand, let’s try soberly, and most importantly objectively, to consider the capabilities of the “desktop” search engine from Google.

The first thing that catches your eye is the lack of its own shell for the program. Google Desktop Search is still located in the browser window, respectively, the entire interface of the desktop version was inherited from the software from its older Internet brother. Whether this is good or bad is a moot point: some people like the minimalism in the design of this search engine, while others want to see a full-fledged application filled with all kinds of buttons and so on.

What catches your eye right after the design? The fact that Google Desktop Search begins to index everything on the computer without asking! And what’s most interesting is that indexing paths cannot be selected in Google Desktop Search itself. You have to download a separate program (TweakGDS), which somewhat expands the Google Desktop settings, including specifying the locations to index. By the time you figure all this out, though, it will already have indexed a standard hard drive, so this setting is more likely to be needed when working with large amounts of data, which matters on corporate networks (the Enterprise version). However, it is not certain that downloading TweakGDS will solve your problems, because it requires the Microsoft .NET Framework and Microsoft Scripting Runtime to work. Yes, the installation, as well as access to the settings, could have been made simpler, although the developers' reasoning is probably understandable: why write something new when there is a ready-made search engine? Port it to the local computer, let the user “enjoy” it, and the famous name will make a masterpiece out of it. But let's end this lyrical digression and move on to the search.

As for analyzing search queries and delivering results, everything here is absolutely identical to Google on the Internet: the same system for displaying results, the same standard set of logical operations for search queries. In general, Google Desktop Search, like the previous program, is designed exclusively for searching files - it, of course, does not have an internal viewer for these files. The number of file formats supported by Google Desktop Search is quite sufficient, and it is also nice that it searches visited Internet pages, taking data from the cache. Search and indexing speeds are quite acceptable. True, for home use. Google Desktop Search coped with an impressive 20 gigabytes of texts in 8 hours and 17 minutes. Spending several days processing information from the corporate network of a large enterprise is not something any system administrator would like to do. On the plus side: the size of the created index was on the same level (4.5 GB) as another search engine tested in this review - SearchInform.
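The indexing figures reported so far for the 20-gigabyte test set can be put side by side with a little arithmetic. The Python sketch below uses only the times and index sizes quoted in this review; the derived throughput and index-to-data ratio are simple divisions, and the function name `summarize` is hypothetical.

```python
CORPUS_GB = 20

# name: (hours, minutes, index size in GB), as reported in the review
RESULTS = {
    "dtSearch Desktop": (6, 13, 7.9),
    "Bloodhound Prof Deluxe": (38, 46, 19.0),
    "Google Desktop Search": (8, 17, 4.5),
}


def summarize(hours, minutes, index_gb, corpus_gb=CORPUS_GB):
    """Derive indexing throughput and relative index size."""
    total_hours = hours + minutes / 60
    return {
        "hours": total_hours,
        "gb_per_hour": corpus_gb / total_hours,
        "index_ratio": index_gb / corpus_gb,
    }


for name, (hours, minutes, index_gb) in RESULTS.items():
    s = summarize(hours, minutes, index_gb)
    print(f"{name}: {s['gb_per_hour']:.2f} GB/h, "
          f"index is {s['index_ratio']:.0%} of the data")
```

Dividing the Bloodhound time by the dtSearch time gives a factor of a little over six, which agrees with the "six times longer" figure quoted for Bloodhound.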

The big advantage (or disadvantage, you decide) of Google Desktop Search is that it supports plugins, which can change a lot for the better. The catch is that connecting and configuring plugins complicates the installation so much that you begin to wonder whether all this is necessary when you could install a normal, full-fledged program in which everything is already present. After all, each extra feature requires a new plugin; even full support for archives needs a separate gadget. It is tempting that all these additional modules are free. However, leaving the desktop version aside, competent configuration of GDS Enterprise may be beyond your abilities: it is not for nothing that specialists from Google offer to set up their own software on your network for only $10,000.

If you do go through the setup and installation procedure (or pay $10,000 to a quick response team from Google), you will understand that the complexity of the installation is more than compensated by the very flexible settings when used in corporate networks. An important aspect of using Google Desktop on a corporate network is the use of group policies, which makes it possible to set settings for each user.

To summarize, the most reasonable use for this program is a home or work computer. After all, for an ordinary computer, it’s enough just to install the program - it will do the rest itself (it won’t even ask you anything).

However, Google Desktop Search Enterprise is acceptable where there is an urgent need for flexible network policy configuration for the search engine, where the quality of query processing is of secondary importance and the time (or money) spent on setting up the program comes first.

Official site:
Distribution size including TweakGDS: 1.2 MB

Copernic Desktop Search

The program interface evokes extremely positive emotions - everything is done in accordance with generally accepted standards, nothing superfluous, in a word, a pleasant design. For a beginner, understanding the Copernic Desktop Search interface will be very easy. Although, it is somewhat confusing that the designers clearly created the program interface taking into account the fact that the program will work in the standard Windows XP theme. When using the classic theme, the program does not look so nice. But this is more a matter of taste.

At the first launch, the program prompts you to create indexes for search. Somewhat unusually, after the folders for indexing were selected, the program did not offer any button such as “Start indexing”, and indexing did not start automatically; only later did it become clear that Copernic tries to start indexing while the computer is idle. You will have to dig a little deeper into the program's options to configure everything properly. The options for automatic index creation are quite broad: a built-in scheduler and the ability to index while the computer is idle, in the background, with low priority. Indexing was not too fast: at 10 hours 51 minutes it was slower than the other search engines except Bloodhound, although Copernic was still about three and a half times faster than that product from iSleuthHound Technologies.

Now about the structure of the index. In general, there is nothing special about it. It is possible to select file types, both in general and detailed form. That is, initially you can choose what you want to index - Documents, Images, Videos, Music. On the other tab of the options window, you will be able to select specific file types by extension. Additionally, you can configure the index so that, for example, pictures smaller than 16x16 in size are not indexed or sound files less than 10 seconds in length are not indexed. In addition to indexing files from folders, Copernic can work with emails and contacts from the address book of Microsoft Outlook and Microsoft Outlook Express, and it is possible to index Favorites and History from Internet Explorer.
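The filtering rules described above can be pictured as a single predicate that decides whether a file goes into the index. This is only a sketch: the `FileInfo` record, the extension list and the reading of “smaller than 16x16” as “either side under 16 pixels” are assumptions made for illustration, not Copernic's actual settings model.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FileInfo:
    """Hypothetical description of a file found while scanning folders."""
    name: str
    image_size: Optional[Tuple[int, int]] = None  # (width, height) in pixels
    audio_seconds: Optional[float] = None         # duration for sound files

# Illustrative extension list; the real program offers its own set.
ALLOWED_EXTENSIONS = {".doc", ".txt", ".jpg", ".mp3"}
MIN_IMAGE_SIDE = 16      # threshold mentioned in the review
MIN_AUDIO_SECONDS = 10   # threshold mentioned in the review

def should_index(f: FileInfo) -> bool:
    """Return True if the file passes the extension, image and audio filters."""
    ext = os.path.splitext(f.name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False
    if f.image_size is not None:
        width, height = f.image_size
        if width < MIN_IMAGE_SIDE or height < MIN_IMAGE_SIDE:
            return False
    if f.audio_seconds is not None and f.audio_seconds < MIN_AUDIO_SECONDS:
        return False
    return True

print(should_index(FileInfo("icon.jpg", image_size=(8, 8))))
```

Filters like these keep icons and short system sounds out of the index, which reduces both indexing time and noise in the results.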

As for the search capabilities, they are very weak here. Tests even revealed that the program does not search the content of txt and html documents in Russian, so they can be found only by name. The only thing the program offers to improve search efficiency is a standard set of logical operations, and even this feature was discovered experimentally, since it is not documented. The program's help also leaves much to be desired: it is available only via the Internet, which is very inconvenient, and there is not much of it even there. Apparently, the developers decided that a simple interface makes proper help unnecessary. Still, despite the weak query analysis, the program provides an interesting search system: the user can select the type of files (images, videos, music, etc.), enter a search query and select attributes specific to the selected file type. For sound files these can be values from mp3 tags (artist, album, date, etc.); for images you can select the size (by resolution); in general, each type has its own settings. After searching for a specific file type, the program displays a very informative list in the results window, and if your query also matches files of other types, you can open them by clicking on the corresponding link.

Separately, it is worth mentioning the results display window. Below the list of found files, the contents of these files are displayed (a similar scheme is often used in email clients). True, text viewing can only be done in the native format, and there is no plain text display mode, which is not always convenient, since opening a document in this case takes more time. But, given that Copernic can search for images and music, it is possible to view these multimedia files.

The basic principles of operation of this program are described, now let's see what Copernic Desktop Search can offer us for working with the network... In principle, you can watch for a very long time, but you will hardly be able to see anything. In other words, this program was not intended to be network-based. Copernic Desktop Search is a home search engine exclusively.

Obviously, the only (and most logical) application of this program is a home computer. Here it copes fully with simple one- or two-word search queries and finds the necessary information. The division of search by file type, support for multimedia files, background indexing in low-priority mode and a pleasant interface all help the program earn the trust of inexperienced users.

Official site
Distribution size: 2.6 MB

ISYS Desktop

A very powerful program. In terms of its level of equipment with all sorts of functions, it is somewhere close to the next SearchInform search system on the list. Moreover, the size of the installation file is more than 40Mb! It’s hard to say what could be squeezed into such dimensions, because the same SearchInform, with similar functionality, takes up 15Mb.

The installation process here is also not very pleasant, or rather not even the installation process. Even before downloading the program, you will be asked to register, otherwise there is no way. Next, the interface. It is made very nicely, nothing unnecessary catches the eye, however, these are the impressions of a person who is already somewhat accustomed to it. It will not be easy for a beginner to figure out where and what is located, where to click and where to finally search. It is highly recommended to read the help before starting work - you will save a lot of nerves and time. Added to everything else is the complete lack of support for the Russian language in the program. Not good. In addition, the windows here are not overloaded with controls, but we had to pay for this with multi-modules and the use of additional windows. For example, search queries are entered by launching one program, and index management is performed using another program. Search queries are also entered here in separate pop-up windows. It’s hard to say which is better - an overloaded interface or ubiquitous multi-windows; rather, it’s a matter of taste.

When it comes to creating indexes, the program provides features to simplify the process of setting options for a new index. These features include several ready-made templates for creating indexes for the folder “My Documents”, “Mail”, “Mail and Documents”, “Specific Folder”, “Folder with a selection of file types”, etc. Such templates simplify the creation of indexes on the first stage. The utility for working with indexes does not have a very good interface, which is intimidating with some complexity (this is a very subjective assessment, to be honest), however, if you look at it, it provides many useful options and, in general, its use does not cause much difficulty. ISYS Desktop can index data from various data sources, and also provides many flexible settings for such indexing. Additional indexing features include: support for SQL, FTP, TRIM Context, WORLDOX 2002, scripts. When creating an index, if you selected the "Folder with selection of file types" item, you have the opportunity to select file types for indexing manually (by extension). It must be said that there are simply a huge number of supported file types, but you will not be able to add your own type (extension) to the existing list. You can also note the presence of an indexing scheduler. Creating an index and processing 20 gigabytes of information took ISYS Desktop 6 hours and 13 minutes, ultimately showing a good time and the size of the created file - 7.9 GB.

The search capabilities of this program are quite good. What is used in ISYS is much more powerful than conventional support for logical operations. Among the advanced search capabilities, the program offers the use of synonyms and a sorting filter (by path, name and date of file creation). The set of logical operators is somewhat wider than the standard set. In addition to logical operations, the program allows you to work with many other operators, which, in principle, can replace some types of search; for example, search with parsing can be completely replaced by using special operators. I was very surprised that the program does not have a search using morphology. This is a serious omission, since search efficiency is greatly improved when using morphological analysis. In addition, there is no list of significant words, but there is an extensive list of insignificant words. Search functions such as “approximate search” and “heuristic analysis” are also announced.
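To make the roles of the synonym list and the list of insignificant words concrete, here is a small sketch of query expansion. The word lists and function names are invented for the example; the review does not describe how ISYS stores or applies its dictionaries.

```python
# Hypothetical dictionaries; a real product ships much larger lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "in"}
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}

def expand_query(query):
    """Drop insignificant words and widen each remaining word with synonyms.

    Returns a list of term groups; a document matches when it contains
    at least one term from every group.
    """
    groups = []
    for word in query.lower().split():
        if word in STOP_WORDS:
            continue
        groups.append({word} | SYNONYMS.get(word, set()))
    return groups

def matches(text, groups):
    """Check a document's words against the expanded query."""
    words = set(text.lower().split())
    return all(words & group for group in groups)

groups = expand_query("buy the car")
print(matches("purchase of an automobile", groups))
```

Note that this expansion does not replace morphological analysis: without it, “cars” or “bought” would still be missed unless they were listed explicitly.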

ISYS provides a choice of several types of search queries, namely visual ones. This is done using different types of windows for entering search queries, however, in fact, not a single window allows the use of technologies other than those listed above.

The search results are very informative and are displayed as a list of documents sorted by relevance. A preview of the selected document is displayed below. Unlike Copernic Desktop Search, preview here is available only in the form of plain text; it was not possible to display documents in their native format, be it Word, Html or PDF, although this, in principle, is not too critical. The program allows you to divide found documents into groups according to certain criteria (by default they are divided by relevance). You can also view already found documents by selecting individual folders (this is convenient when the result produces a very large number of documents).

Using the program on a corporate network is also very justified, since it provides good opportunities for organizing network search. The search system is based on the creation of a public index that contains indexed data from publicly available online resources.

In fact, the program from ISYS is worthy of attention, at least getting acquainted with it. This program is a mature project with a huge number of functions (not always and not everyone, of course, needs them, but still). The chances that the program will see some improvements in terms of processing search queries are unknown, but at the moment it can be recommended for almost universal use. And given that it is still too heavy for home systems, the main places for its installation are corporate networks.

Official site:
Distribution size: 40 MB

SearchInform

It’s probably not worth starting right away with a description of the SearchInform interface. We should first describe the installation process, or rather one of its details: you cannot install the program without an Internet connection. The fact is that before the first launch, the program requires user registration (free) and sends all entered data to the server. Apparently, the developers had to take such measures in the fight against piracy, but this did not have a positive effect on the ease of installation.

The program interface follows all generally accepted rules, but at first glance it is somewhat cumbersome. On first use the program seems too complicated, and it is not always easy to remember in which menu or on which tab the desired option is located; with longer use, however, the interface no longer seems so terribly complicated. The main thing is to read the help first.

Having understood the interface a little, you can start creating an index. The process itself is very simple, and the indexing speed, even by eye, is significantly higher than that of all the other search engines in the review. The test numbers show that SearchInform is nearly twice as fast as dtSearch and iSYS in terms of indexing speed! The program indexed the provided 20 gigabytes of data in a record time of 3 hours 17 minutes. The created index also turned out to be the smallest, at 4.4 GB: 100 megabytes less than that of Google Desktop Search.
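The speed comparison can be checked with simple arithmetic. The sketch below uses only the times and index sizes reported in this review's comparison table for the 20 GB test collection; the helper names are made up for the example.

```python
DATA_GB = 20  # size of the test collection stated in the review

# (hours, minutes, index size in GB), copied from the review's comparison table
RESULTS = {
    "Bloodhound Prof Deluxe 4.5": (38, 46, 19.0),
    "Isys Desktop 7.0": (6, 13, 7.9),
    "DtSearch 7.0": (6, 3, 8.6),
    "Google Desktop Search Enterprise": (8, 17, 4.5),
    "Copernic Desktop Search": (10, 51, 7.0),
    "SearchInform 1.5.02": (3, 17, 4.4),
}

def minutes(name):
    """Total indexing time in minutes."""
    hours, mins, _ = RESULTS[name]
    return hours * 60 + mins

def throughput_gb_per_hour(name):
    """Average indexing throughput over the whole run."""
    return DATA_GB / (minutes(name) / 60)

def speedup(slower, faster):
    """How many times faster one engine indexed than another."""
    return minutes(slower) / minutes(faster)

for name in sorted(RESULTS, key=minutes):
    print(f"{name}: {throughput_gb_per_hour(name):.2f} GB/h, "
          f"index is {RESULTS[name][2] / DATA_GB:.0%} of the source size")
```

By these figures SearchInform averages about 6.1 GB per hour and is roughly 1.8 to 1.9 times faster than dtSearch and ISYS.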

In addition to regular files and folders, the program supports indexing emails and connecting to and indexing databases (!) and other external sources (DMS, CRM). During indexing you can specify a dictionary for morphological search, and all file attributes can be indexed. After creating the index, when trying the first test search, you may become somewhat confused: “there are two types of search here, but which one do I need?” As mentioned earlier, the main thing is to read the help; then everything will become clear. The program can indeed carry out two types of searches: phrase search and search for documents similar in content to the query text.

A description of all the main functions for analyzing a search query was given above, so now we will only list the search capabilities provided by this program. Let's start with phrase search: of course, morphological search, citation search, logical operations, search with word parsing (search at the beginning of the word, at the end, at the middle part, or a complete match), mixed citation search (when all words from the query must be present in the document, but not necessarily in the entered order), search with error correction, use of synonyms, “almost citation search” (search for the entered phrase as a citation, but other words may be present between the entered words), etc. Some of the options listed have their own specific settings. In addition, it is possible to use a dictionary of unimportant words, and the program already has a ready-made list of these words; you can also use a dictionary of priority words for searching (of course, you will have to fill it out yourself).

Here, in principle, we briefly reviewed all the main features of phrase search.
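Three of the phrase search modes listed above differ only in how strictly word order and adjacency are enforced. The following sketch illustrates that difference on plain whitespace-separated words; the function names and the `max_gap` parameter are hypothetical, and this is not SearchInform's implementation.

```python
def tokens(text):
    """Split text into lowercase words."""
    return text.lower().split()

def phrase_match(doc, query):
    """Citation search: query words appear consecutively and in order."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

def mixed_match(doc, query):
    """Mixed citation search: all query words present, in any order."""
    return set(tokens(query)) <= set(tokens(doc))

def near_phrase_match(doc, query, max_gap=2):
    """'Almost citation' search: words in order, with at most
    max_gap other words between neighbouring query words."""
    d, q = tokens(doc), tokens(query)
    if not q:
        return True

    def extend(pos, qi):
        # pos is where the previous query word matched; try to place q[qi].
        if qi == len(q):
            return True
        for j in range(pos + 1, min(pos + 2 + max_gap, len(d))):
            if d[j] == q[qi] and extend(j, qi + 1):
                return True
        return False

    return any(d[i] == q[0] and extend(i, 1) for i in range(len(d)))

doc = "the quick brown fox jumps over the lazy dog"
print(phrase_match(doc, "brown fox"), mixed_match(doc, "fox brown"),
      near_phrase_match(doc, "quick fox"))
```

Each mode is a relaxation of the previous one, which is why a query that fails as an exact citation may still succeed as a mixed or “almost citation” search.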

Let's move on to the distinguishing feature of this program: searching for similar documents. The developers claim that this is by no means a simple text search but precisely a “search for similar ones”; this is how it is described everywhere, and whatever you call it, the substance is what matters. A quick look on the Internet reveals that so-called “similar search” is a new development in the field of text analysis. This system allows you to find texts that are similar in semantic content. The most pleasant thing was that the test search queries showed that the theory coincides quite well with practice! The program actually searches for documents with similar content and displays them in a list, sorting them by percentage of similarity.
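The review does not say how SearchInform measures similarity, so the sketch below substitutes a common generic technique: cosine similarity between word-count vectors, reported as a percentage and sorted in descending order. The sample documents and function names are assumptions for illustration only.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def find_similar(query_text, docs):
    """Return (name, similarity percent) pairs, most similar first."""
    q = vectorize(query_text)
    scored = [(name, round(100 * cosine_similarity(q, vectorize(text))))
              for name, text in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical documents for the example.
docs = {
    "a": "desktop search engine review",
    "b": "cooking pasta recipes",
    "c": "search engine",
}
print(find_similar("desktop search engine review", docs))
```

A bag-of-words measure like this captures shared vocabulary rather than true semantic closeness, so a production system would be expected to add morphology, synonyms or term weighting on top of it.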

Next, let's look at what SearchInform (in particular, its corporate version SearchInform Corporate) offers for working on a corporate network. There are two types of applications: server side and user side. The server part independently processes the specified indexes, and users can use them for search, depending on the access rights assigned to them. Users can be configured automatically using Windows accounts (in professional terms, SearchInform uses NTFS Windows authentication) or manually (users will have to be added separately). Each user can be allowed or denied access to certain indexes, and users can also be combined into groups. In general, SearchInform’s settings for working on the network are ahead of Google in terms of flexibility, and of Bloodhound Server in terms of convenience and simplicity.
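The access model described above (users, groups, and per-index allow or deny rules) can be sketched as a small class. The class, its method names and the rule that an explicit deny overrides an allow are all assumptions made for this example; the review does not specify how SearchInform resolves conflicting rules.

```python
class IndexAccess:
    """Hypothetical model of per-index access rights for users and groups."""

    def __init__(self):
        self.groups = {}   # group name -> set of user names
        self.allowed = {}  # index name -> set of user or group names
        self.denied = {}   # index name -> set of user or group names

    def add_to_group(self, user, group):
        self.groups.setdefault(group, set()).add(user)

    def allow(self, index, principal):
        self.allowed.setdefault(index, set()).add(principal)

    def deny(self, index, principal):
        self.denied.setdefault(index, set()).add(principal)

    def can_search(self, user, index):
        """Assumed rule: an explicit deny wins; otherwise an allow is required."""
        principals = {user} | {g for g, members in self.groups.items()
                               if user in members}
        if principals & self.denied.get(index, set()):
            return False
        return bool(principals & self.allowed.get(index, set()))

# Hypothetical users and index names.
acl = IndexAccess()
acl.add_to_group("anna", "sales")
acl.add_to_group("boris", "sales")
acl.allow("contracts", "sales")
acl.deny("contracts", "boris")
print(acl.can_search("anna", "contracts"), acl.can_search("boris", "contracts"))
```

Granting rights to groups and reserving individual rules for exceptions keeps the configuration short even when the network has many users.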

Official site:
Distribution size: 14.7 MB

Comparison of indexing speeds

Search system	Indexing time	Index size
Bloodhound Prof Deluxe 4.5	38 hours 46 minutes	19 GB
Isys Desktop 7.0	6 hours 13 minutes	7.9 GB
DtSearch 7.0	6 hours 3 minutes	8.6 GB
Google Desktop Search Enterprise	8 hours 17 minutes	4.5 GB
Copernic Desktop Search *	10 hours 51 minutes	7 GB
SearchInform 1.5.02	3 hours 17 minutes	4.4 GB

* Most of the .html and .txt documents containing Russian text, although indexed, could not be found except by their names.

Summary

All programs are worthy of attention.

Based on the tests and a careful examination of each program in the review, certain conclusions can be drawn. Google Desktop Search and Copernic Desktop Search are quite suitable for the inexperienced user as home information search systems. They cope well with simple queries, do not overload the user with settings and, moreover, are completely free. Google's attempt to enter the corporate search engine market is not yet very convincing: to work properly, the program needs to be equipped with additional modules, and it is far from easy to set up. So, true to the self-explanatory “Desktop Search” in their names, Copernic and Google occupy the niche of “desktop” search engines.

True, the more powerful solutions (dtSearch, iSYS and SearchInform) are no slouches either and offer users their own “desktop” versions, though at a reasonable price rather than free like the software from Google and Copernic. Of course, you have to pay for power, speed and functionality. But the main focus of the developers of dtSearch, iSYS and SearchInform is, of course, the corporate sector. Networking, functionality, and indexing and search speed are what distinguish these products from their “competitors.” Based on the test results, a favorite emerged: SearchInform. The program can search for similar documents, has the fastest indexing and search speeds, and offers a good set of functions.

The machines must work.
People must think.

The “Professional Internet Search” course is a convenient way to learn how to competently and effectively search and find the necessary information on the Internet.

What is professional search?

The paradox of the Internet is that there is more and more information, yet finding the information you need is getting harder. Professional search is an efficient search for necessary and reliable information.
In the modern world, information becomes capital, and the Internet becomes a convenient means of obtaining it, which is why the ability to find valuable information marks a person as a high-class professional. A professional search should always be effective. Moreover, during the search, professionals not only look for the place where the information is stored, but also evaluate the authority of the resource and the relevance, accuracy and completeness of the published information. Internet heuristics helps with this: a set of useful search rules and criteria for selecting and evaluating network information.

What will you find out and what will you learn?

Have you been looking and couldn't find it? Then the course will be extremely useful to you. You will get comprehensive instructions for finding something that is already on the Internet but at first glance seems simply impossible to find... It is possible! You will learn how to search so that you find! Each lesson combines knowledge and experience, and everything you learn is tested in action.

During the course classes you will find out how the modern Internet is developing and how electronic information is distributed, how catalogs are created and how search engines work, why metasearch engines are needed and where the “hidden” web came from, how forums differ from blogs and what fundraising is.

During the workshops you will learn to use the query language correctly, select keywords well, find information on the “hidden” web, find the images and files you need, evaluate public opinion in the blogosphere, look for personal information and, most importantly, correctly evaluate the reliability, relevance and completeness of the information found.

The Internet search course will allow you to significantly develop your cognitive, information and communication abilities.

What topics are covered in the Professional Search course?

The goal of the course is to teach in one month the possibilities and subtleties of modern search for professional information on the Internet.

Each lesson (module) includes a lecture, a seminar in forum format, a test on the material covered, and several exercises and search tasks.

The updated course will feature weekly one-hour webinars - interactive virtual online seminars dedicated to discussing the key tasks of professional Internet search.

Each training module comes with useful additional materials on the course topics and handouts convenient for printing.

The thematic plan of the course consists of 10 interrelated modules:

1. Internet: history, technology and Internet research.

2. Information search. Search directories.

3. Information retrieval systems. IPS close-up (Google, Yandex and others).

4. Metasearch engines and programs.

5. Internet Help Desk: factual search in encyclopedias, reference books, dictionaries.

6. Bibliographic search: libraries, catalogs, programs.

7. Documentary search: electronic documents, electronic libraries, electronic journals.

8. "Hidden" Web: searching for multimedia, databases, knowledge bases and files.

9. Searching for news (blogs and forums), contacts, institutions, fundraising.

10. Information Retrieval Strategies: Generalization of Internet heuristics skills.

Why is the course distance learning?

The distance course has several advantages.

Firstly, each lesson is allocated not one or two academic hours per week but a whole week. You can master the lecture material and complete the exercises and search tasks without haste.

Secondly, the distance learning course is interactive. This means that you can always ask the teacher about, or clarify, whatever you think is important. Your question will not go unanswered, and complex search tasks can be discussed as a group so that everyone's skills can be compared.

Thirdly, you can study at a time convenient for you and you won’t have to waste time traveling to classes. Moreover, you can study anywhere in the world where there is access to the Internet.

What does the course cost?

The “Internet heuristics” course lasts one month and consists of 10 modules; each module consists of “quantum” lessons (they allow you to maintain the pace necessary for mastering new material). Each module costs only 300 rubles, so for all classes you will pay only 3000 rubles. Please note that you do not have to buy additional textbooks; the course is fully provided with all the necessary educational materials. If you successfully complete the course, you will receive a Moscow State University certificate for completing the “Professional Internet Search” course.

If you want to learn Internet resourcefulness, then you need to choose a convenient time to take the course and sign up (just click on the sign up link opposite the convenient time slot at the top of the page)!

After registration, you will still have time to think and make a final decision. By the way, you can meet

Professional Internet search requires specialized software, as well as specialized search engines and search services.

PROGRAMS

http://dr-watson.wix.com/home – the program is designed to study arrays of text information in order to identify entities and connections between them. The result of the work is a report on the object under study.

http://www.fmsasg.com/ - Sentinel Visualizer, one of the best programs in the world for visualizing connections and relationships. The company has fully localized its products into Russian and opened a Russian-language hotline.

http://www.newprosoft.com/ – “Web Content Extractor” is the most powerful, easy-to-use software for extracting data from web sites. It also has an effective Visual Web spider.

SiteSputnik – a software package that has no analogues in the world, allowing you to search and process its results on the Visible and Invisible Internet, using all the search engines necessary for the user.

WebSite-Watcher – allows you to monitor web pages, including password-protected ones, monitoring forums, RSS feeds, news groups, local files. Has a powerful filter system. Monitoring is carried out automatically and is delivered in a user-friendly form. A program with advanced functions costs 50 euros. Constantly updated.

http://www.scribd.com/ is the most popular platform in the world and increasingly used in Russia for posting various kinds of documents, books, etc. for free access with a very convenient search engine for titles, topics, etc.

http://www.atlasti.com/ is the most powerful and effective tool for qualitative information analysis available to individual users, small and even medium-sized businesses. The program is multifunctional and therefore useful. It combines the ability to create a unified information environment for working with various text, tabular, audio and video files as a single whole, as well as tools for qualitative analysis and visualization.

Ashampoo ClipFinder HD – an ever-increasing share of the information flow comes from video. Accordingly, competitive intelligence officers need tools that allow them to work with this format. One such product is the free utility we present. It allows you to search for videos based on specified criteria on video file storage sites such as YouTube. The program is easy to use, displays all search results on one page with detailed information, titles, duration, time when the video was uploaded to the storage, etc. There is a Russian interface.

http://www.advego.ru/plagiatus/ – the program was made by SEO optimizers, but is quite suitable as an Internet intelligence tool. Plagiatus shows the degree of uniqueness of a text, the sources of the text, and the percentage of text match. The program also checks the uniqueness of a specified URL. The program is free.

http://neiron.ru/toolbar/ – includes an add-on for combining Google and Yandex search, and also allows for competitive analysis based on assessing the effectiveness of sites and contextual advertising. Implemented as a plugin for Firefox and Google Chrome.

http://web-data-extractor.net/ is a universal solution for obtaining any data available on the Internet. Setting up data cutting from any page is done in a few mouse clicks. You just need to select the data area that you want to save and Datacol will automatically select a formula for cutting out this block.

CaptureSaver is a professional Internet research tool. Simply an indispensable working program that allows you to capture, store and export any Internet information, including not only web pages, blogs, but also RSS news, email, images and much more. It has the widest functionality, an intuitive interface and a ridiculous price.

http://www.orbiscope.net/en/software.html – web monitoring system at more than affordable prices.

http://www.kbcrawl.co.uk/ – software for working, including on the “Invisible Internet”.

http://www.copernic.com/en/products/agent/index.html – the program allows you to search using more than 90 search engines, using more than 10 parameters. Allows you to combine results, eliminate duplicates, block broken links, and show the most relevant results. Comes in free, personal and professional versions. Used by more than 20 million users.

Maltego is a fundamentally new software that allows you to establish the relationship of subjects, events and objects in real life and on the Internet.

SERVICES

new – web browser with dozens of pre-installed tools for OSINT.

– an effective search engine-aggregator for finding people on major Russian social networks.

https://hunter.io/ is an effective service for detecting and checking email.

https://www.whatruns.com/ is an easy-to-use yet effective scanner for discovering what is working and not working on a website and what its security holes are. Also implemented as a plugin for Chrome.

https://www.crayon.co/ is an American budget platform for market and competitive intelligence on the Internet.

http://www.cs.cornell.edu/~bwong/octant/ – host identifier.

https://iplogger.ru/ – a simple and convenient service for determining someone else’s IP.

http://linkurio.us/ is a powerful new product for economic security workers and corruption investigators. Processes and visualizes huge amounts of unstructured information from financial sources.

http://www.intelsuite.com/en – English-language online platform for competitive intelligence and monitoring.

http://yewno.com/about/ is the first operating system for translating information into knowledge and visualizing unstructured information. Currently supports English, French, German, Spanish and Portuguese.

https://start.avalancheonline.ru/landing/?next=%2F – forecasting and analytical services by Andrey Masalovich.

https://www.outwit.com/products/hub/ – a complete set of stand-alone programs for professional work in web 1.

https://github.com/search?q=user%3Acmlh+maltego – extensions for Maltego.

http://www.whoishostingthis.com/ – search engine for hosting, IP addresses, etc.

http://appfollow.ru/ – analysis of applications based on reviews, ASO optimization, positions in tops and search results for the App Store, Google Play and Windows Phone Store.

http://spiraldb.com/ is a service implemented as a plugin for Chrome, which allows you to get a lot of valuable information about any electronic resource.

https://millie.northernlight.com/dashboard.php?id=93 - a free service that collects and structures key information on industries and companies. It is possible to use information panels based on text analysis.

http://byratino.info/ – collection of factual data from publicly available sources on the Internet.

http://www.datafox.co/ – CI platform collects and analyzes information on companies of interest to clients. There is a demo.

https://unwiredlabs.com/home - a specialized application with an API for searching by geolocation of any device connected to the Internet.

http://visualping.io/ – a service for monitoring sites and, first of all, the photographs and images available on them. Even if the photo only appears for a second, it will be in the subscriber's email. Has a plugin for Google Chrome.

http://spyonweb.com/ is a research tool that allows for in-depth analysis of any Internet resource.

http://bigvisor.ru/ – the service allows you to track advertising campaigns for certain segments of goods and services, or specific organizations.

http://www.itsec.pro/2013/09/microsoft-word.html – instructions from Artem Ageev on using Windows programs for competitive intelligence needs.

http://granoproject.org/ is an open source tool for researchers who track networks of connections between individuals and organizations in politics, economics, crime, etc. Allows you to connect, analyze and visualize information obtained from various sources, as well as show significant connections.

http://imgops.com/ – a service for extracting metadata from graphic files and working with them.

http://sergeybelove.ru/tools/one-button-scan/ – a small online scanner for checking security holes in websites and other resources.

http://isce-library.net/epi.aspx – service for searching primary sources using a fragment of text in English

https://www.rivaliq.com/ is an effective tool for conducting competitive intelligence in Western, primarily European and American markets for goods and services.

http://watchthatpage.com/ is a service that allows you to automatically collect new information from monitored Internet resources. The service is free.

http://falcon.io/ is a kind of Rapportive for the Web. It is not a replacement for Rapportive, but provides additional tools. In contrast, Rapportive provides a general profile of a person, as if glued together from data from social networks and mentions on the web.

https://addons.mozilla.org/ru/firefox/addon/update-scanner/ – add-on for Firefox. Monitors web page updates. Useful for websites that do not have news feeds (Atom or RSS).

http://agregator.pro/ – aggregator of news and media portals. Used by marketers, analysts, etc. to analyze news flows on certain topics.

http://price.apishops.com/ – automated web service for monitoring prices for selected product groups, specific online stores and other parameters.

http://www.la0.ru/ is a convenient and relevant service for analyzing links and backlinks to an Internet resource.

www.recordedfuture.com is a powerful tool for data analysis and visualization, implemented as an online service built on cloud computing.

http://advse.ru/ is a service with the slogan “Find out everything about your competitors.” Allows you to obtain competitors' websites in accordance with search queries and analyze competitors' advertising campaigns in Google and Yandex.

http://spyonweb.com/ – the service allows you to identify sites with the same characteristics, including those using the same Google Analytics statistics service identifiers, IP addresses, etc.

http://www.connotate.com/solutions – a line of products for competitive intelligence, managing information flows and converting information into information assets. It includes both complex platforms and simple, cheap services that allow for effective monitoring along with information compression and obtaining only the necessary results.

http://www.clearci.com/ - competitive intelligence platform for businesses of various sizes, from start-ups and small companies to Fortune 500 companies. Delivered as SaaS.

http://startingpage.com/ is a Google add-on that allows you to search on Google without recording your IP address. Fully supports all Google search capabilities, including in Russian.

http://newspapermap.com/ is a unique service that is very useful for a competitive intelligence officer. It connects geolocation with an online media search engine. That is, you select the region, city, or language you are interested in, see the place on the map together with a list of online versions of newspapers and magazines, click on the appropriate button and read. Supports the Russian language and has a very user-friendly interface.

http://infostream.com.ua/ is “Infostream”, a very convenient news monitoring system distinguished by a first-class selection and an affordable price, from one of the classics of Internet search, D.V. Lande.

http://www.instapaper.com/ is a very simple and effective tool for saving the necessary web pages. Can be used on computers, iPhones, iPads, etc.

http://screen-scraper.com/ – allows you to automatically extract all information from web pages, download the vast majority of file formats, and automatically enter data into various forms. It saves downloaded files and pages in databases and performs many other extremely useful functions. Works on all major platforms, has fully functional free and very powerful professional versions.

http://www.mozenda.com/ - a web service for multifunctional web monitoring and delivery of the information the user needs from selected sites. It has several pricing plans and is affordable even for small businesses.

http://www.recipdonor.com/ - the service allows you to automatically monitor everything that happens on competitors' websites.

http://www.spyfu.com/ – a similar service for cases where your competitors are foreign.

www.webground.su is a service for monitoring the Runet created by Internet search professionals, which includes all the major providers of information, news, etc., and is capable of individual monitoring settings to suit the user’s needs.

SEARCH ENGINES

https://www.idmarch.org/ is the best search engine for the world archive of pdf documents in terms of quality. Currently, more than 18 million pdf documents have been indexed, ranging from books to secret reports.

http://www.marketvisual.com/ is a unique search engine that allows you to search for owners and top management by full name, company name, position, or a combination thereof. The search results contain not only the objects you are looking for, but also their connections. Designed primarily for English-speaking countries.

http://worldc.am/ is a search engine for freely accessible photographs linked to geolocation.

https://app.echosec.net/ is a public search engine that describes itself as the most advanced analytical tool for law enforcement and security and intelligence professionals. Allows you to search for photos posted on various sites, social platforms and social networks in relation to specific geolocation coordinates. There are currently seven data sources connected. By the end of the year their number will be more than 450. Thanks to Dementy for the tip.

http://www.quandl.com/ is a search engine for seven million financial, economic and social databases.

http://bitzakaz.ru/ – search engine for tenders and government orders with additional paid functions

Website-Finder - makes it possible to find sites that Google does not index well. The only limitation is that it only searches 30 websites for each keyword. The program is easy to use.

http://www.dtsearch.com/ is a powerful search engine that allows you to process terabytes of text. Works on desktop, web and intranet. Supports both static and dynamic data. Allows you to search in all MS Office programs. The search is carried out using phrases, words, tags, indexes and much more. The only federated search engine available. It has both paid and free versions.

http://www.strategator.com/ – searches, filters and aggregates information about the company from tens of thousands of web sources. Searches in the USA, Great Britain, major EEC countries. It is highly relevant, user-friendly, and has free and paid options ($14 per month).

http://www.shodanhq.com/ is an unusual search engine. Immediately after its appearance, it received the nickname “Google for hackers.” It does not search for pages, but determines IP addresses, types of routers, computers, servers and workstations located at a particular address, traces chains of DNS servers and offers many other functions useful for competitive intelligence.

http://search.usa.gov/ is a search engine for websites and open databases of all US government agencies. The databases contain a lot of practical, useful information, including for use in our country.

http://visual.ly/ – today visualization is increasingly used to present data. This is the first infographic search engine on the Web. Along with the search engine, the portal has powerful data visualization tools that do not require programming skills.

http://go.mail.ru/realtime – search for discussions of topics, events, objects, subjects in real or customizable time. The previously highly criticized search in Mail.ru works very effectively and provides interesting, relevant results.

Zanran has only just launched but already works very well. It describes itself as the first and only data search engine that extracts data from PDF files, Excel tables and HTML pages.

http://www.ciradar.com/Competitive-Analysis.aspx is one of the world's best information retrieval systems for competitive intelligence on the deep web. Retrieves almost all types of files in all formats on the topic of interest. Implemented as a web service. The prices are more than reasonable.

http://public.ru/ – Effective search and professional analysis of information, media archive since 1990. The online media library offers a wide range of information services: from access to electronic archives of Russian-language media publications and ready-made thematic press reviews to individual monitoring and exclusive analytical research based on press materials.

Cluuz is a young search engine with ample opportunities for competitive intelligence, especially on the English-language Internet. Allows you not only to find, but also to visualize and establish connections between people, companies, domains, e-mails, addresses, etc.

www.wolframalpha.com – the search engine of tomorrow. In response to a search request, it provides statistical and factual information available on the request object, including visualized information.

www.ist-budget.ru – universal search in databases of government procurement, tenders, auctions, etc.

Introduction

Currently, the Internet unites hundreds of millions of servers that host billions of different sites and individual files containing various types of information. This is a giant repository of information. There are various methods for searching information on the Internet.

Search by known address. The necessary addresses are taken from directories. If you know the address, just enter it into the address bar of the browser.

Example 1. www.gov.ru is a server of Russian government authorities.

Constructing an address by the user. Knowing the system for forming Internet addresses, you can construct addresses when searching for Web sites.

To the keyword (the name of a company, enterprise, organization or a simple English noun), you add a thematic or geographic domain and rely on your intuition.

Example 2. Commercial Web page addresses:

www.samsung.com (SAMSUNG company),

www.mtv.com (MTV music news).

Example 3. Addresses of educational institutions:

www.ntu.edu (US National University).
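
The address-construction idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `candidate_addresses` and the default list of domains are hypothetical, and a constructed address is a guess that still has to be checked in the browser.

```python
def candidate_addresses(keyword, domains=("com", "org", "edu", "ru")):
    """Build guessed addresses of the form www.<keyword>.<domain>.

    The domain list is an illustrative assumption, not an exhaustive set.
    """
    # Normalize the keyword: lower case, no surrounding or inner spaces.
    name = keyword.strip().lower().replace(" ", "")
    return [f"www.{name}.{domain}" for domain in domains]


# Each result is only a candidate; the site may not exist at that address.
for address in candidate_addresses("Samsung"):
    print(address)
```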

Internet search engines

Special information retrieval systems have been developed to search for information on the Internet. A search engine has an ordinary address and is displayed as a Web page containing special tools for organizing searches (search box, subject directory, links). To open a search engine, simply enter its address in the address bar of the browser.

According to the statistics service LiveInternet.ru, the distribution of search engines in Russia is approximately as follows:

2) Google – 35.0%

3) Search Mail.ru – 8.3%

4) Rambler – 0.9%

According to the method of organizing information, information retrieval systems are divided into two types: classification (rubricators) and dictionary.

Classification systems (rubricators) are search engines that use a hierarchical (tree) organization of information. When searching for information, the user looks through thematic headings, gradually narrowing the search field (for example, if you need to find the meaning of a word, you first find a dictionary in the classifier, and then find the desired word in it).

Dictionary search engines are powerful automatic software and hardware systems. They view (scan) information on the Internet and enter data on the location of each piece of information into special index directories. In response to a request, a search is performed in the index according to the query string. As a result, the user is offered those addresses (URLs) where the searched word or group of words was found at the time of scanning. By selecting any of the proposed link addresses, you can go to the found document. Most modern search engines are mixed.
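
The indexing step described above can be illustrated with a minimal sketch of an inverted index. The page addresses and texts below are invented for the example, and the functions `build_index` and `search` are hypothetical names; real search engines are far more elaborate.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contained it at scan time."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query string."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the pages for the first word, then intersect with the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented sample data standing in for scanned pages.
pages = {
    "http://example.com/a": "process automation in industry",
    "http://example.com/b": "music news and reviews",
    "http://example.com/c": "industry news",
}
index = build_index(pages)
print(search(index, "industry news"))
```

The index is built once, at scan time, so a query only looks up words in the index instead of re-reading every page.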

The most famous and popular search engines:

There are systems that specialize in searching for information resources in various areas.

https://my.mail.ru

https://ru-ru.facebook.com

https://twitter.com

https://www.tumblr.com

https://www.instagram.com, etc.

Subject search engines:

Search software:

Catalogs (thematic collections of links with annotations):

http://www.atrus.ru

Rules for constructing queries

Each search engine's Help section provides information on how to search and how to construct a query string. Below is information about a typical, “average” query language.

Simple query

Enter one word that defines the search topic. For example, in the search engine Rambler.ru it is enough to enter: automation.

The search returns documents that contain the words specified in the query. All forms of Russian words are recognized; as a rule, letter case is ignored.

You can use the "*" or "?" character in the query. The "?" sign in a keyword stands for exactly one character, in place of which any letter can be substituted, and the "*" sign stands for any sequence of characters.

For example, the query automat* will allow you to find documents that include the words automatic, automation, etc.
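
The wildcard rules above can be tried out with Python's standard `fnmatch` module, which gives "?" and "*" the same meaning. This is a sketch of the matching rule only, not of how any particular search engine implements it; the word list and the helper `match_words` are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def match_words(pattern, words):
    """Return the words matching a pattern where '?' is one character
    and '*' is any sequence of characters (letter case is ignored)."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


words = ["automatic", "automation", "autumn", "cat", "cut", "coat"]

print(match_words("automat*", words))  # words starting with "automat"
print(match_words("c?t", words))       # three-letter words c_t
```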

Complex query

There is often a need to combine keywords to obtain more specific information. In this case, additional linking words, functions, operators and symbols are used, as well as combinations of operators grouped with parentheses.

For example, the query music & (beatles | битлз) means that the user is looking for documents containing the words music and beatles, or music and битлз (the Russian spelling of the band's name).
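
How such a combined query is evaluated can be shown with a small sketch. It assumes a query language in which "&" means AND, "|" means OR and parentheses group operators; the "|" operator, the sample word "lennon" and the function names are assumptions made for illustration, and each search engine's Help section defines its actual syntax.

```python
import re


def tokenize(query):
    """Split a query into words, '&', '|' and parentheses."""
    return re.findall(r"[&|()]|[^\s&|()]+", query.lower())


def matches(query, text):
    """Return True if the text satisfies the boolean query."""
    words = set(text.lower().split())
    tokens = tokenize(query)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()  # always consume the right-hand side
            value = value or right
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_atom()
            value = value and right
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in words  # a plain keyword: is it in the document?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


query = "music & (beatles | lennon)"
print(matches(query, "Beatles music archive"))
print(matches(query, "music news"))
```

Here "&" binds more tightly than "|", which is a common convention but also an assumption of this sketch.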

List of search engines and directories


Address	Description
www.excite.com	Search engine with site reviews and guides
www.alta-vista.com	Search server, advanced search capabilities available
www.hotbot.com	Search server
www.ifoseek.com	Search server (easy to use)
www.ipl.org	Internet Public Library, a public library operating within the framework of the World Village project
www.wisewire.com	WiseWire - search organization using artificial intelligence
www.webcrawler.com	WebCrawler - search server, easy to use
www.yahoo.com	Web catalog and interface for accessing full-text search on the AltaVista server
www.aport.ru	Aport - Russian-language search server
www.yandex.ru	Yandex - Russian-language search server
www.rambler.ru	Rambler - Russian-language search server
Internet Help Resources
www.yellow.com	Yellow Pages Internet
monk.newmail.ru	Search engines of various profiles
www.top200.ru	Top 200 Websites
www.allru.net
www.ru	Catalog of Russian Internet resources
www.allru.net/z09.htm	Educational Resources
www.students.ru	Russian student server
www.cdo.ru/index_new.asp	Distance Learning Center
www.open.ac.uk	UK Open University
www.ntu.edu	US National University
www.translate.ru	Electronic text translator
www.pomorsu.ru/guide.library.html	List of links to network libraries
www.elibrary.ru	Scientific electronic library
www.citforum.ru	Digital library
www.infamed.com/psy	Psychological tests
www.pokoleniye.ru	Website of the Internet Education Federation
www.metod.narod.ru	Educational Resources
www.spb.osi.ru/ic/distant	Distance learning on the Internet
www.examen.ru	Exams and tests
www.kbsu.ru/~book/	Computer Science Textbook
Mega.km.ru	Encyclopedias and dictionaries

Professional search for information on the Internet

Searching for information is one of the most common and at the same time the most difficult tasks that any user has to face on the Internet. However, if for an ordinary member of the online community knowledge of methods of effective information retrieval is a desirable, but far from obligatory quality, then for information professionals the ability to quickly navigate Internet resources and find the required sources is one of the basic qualification skills.

The difficulties that arise when searching for information on the Internet are caused by two main factors. Firstly, the number of sources on the Internet is extremely large. At the end of 2001, the roughest estimates put the figure at about 7.5 billion documents located on servers around the world. Secondly, the array of information on the Internet is not only colossal in volume, but also extremely dynamic. In the half a minute that you spent reading the first lines of this section, about a hundred new or changed documents appeared in the virtual universe, dozens were moved to new addresses, and a few ceased to exist forever. The Internet never “sleeps”, just as our planet never “sleeps”, along which a wave of human business activity continuously rolls in exact accordance with the change of time zones.

Unlike a stable and controlled collection of documents in a library, the Internet is a gigantic and constantly changing information array, and searching for data in it is a very complex process. The situation often resembles the well-known problem of finding a needle in a haystack, and sometimes information of great value remains unclaimed solely because of the difficulty of finding it.

Most users of global computer networks have information research skills to one degree or another. Both amateurs and professionals often use the same tools. However, the results of the searches and the time spent on them vary greatly.

The purpose of this section is to introduce the tools and methods of information retrieval in detail and to help you develop stable skills for professional search on the Internet for all types of data, from texts in any format to video and animation.
