Big data and structured data markup: the pros and cons of JSON-LD


Programming languages. Lecture 4.

Simple data types: variables and constants.

The real data that a program processes are numbers (integers and reals), characters, and logical values. These data types are called basic. All data processed by a computer is stored in memory cells, each of which has its own address. So that the programmer does not have to keep track of which address the data will be written to, programming languages use the concept of a variable, which abstracts away the address of the memory cell and lets the programmer refer to it by name (an identifier).

A variable is a named object (a memory cell) that can change its value. The variable's name refers to its value, while the storage method and the address remain hidden from the programmer. Besides its name and value, a variable has a type, which determines what kind of information is stored in memory.

The variable type specifies:

the way information is encoded in the memory cells;

the amount of memory required to store it.

Variables that exist throughout the entire run of a program are called static. Variables created and destroyed at different stages of program execution are called dynamic.

All other data in a program, whose values do not change throughout its operation, are called constants. Like variables, constants have a type.
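
To make these notions concrete, here is a minimal Python sketch (names are illustrative; Python's type annotations stand in for declared types, and an ALL-CAPS name is only a convention for a constant):

    # Variables: named values of specific types; the language, not the
    # programmer, decides at which memory address each one lives.
    count: int = 10        # integer variable
    price: float = 2.5     # real (floating-point) variable
    flag: bool = True      # logical variable
    letter: str = "A"      # character variable

    # Python has no true constants; by convention an ALL-CAPS name
    # marks a value that must not change while the program runs.
    MAX_SIZE: int = 100

    count = count + 1      # a variable's value may change
    # MAX_SIZE = 200       # a constant's value, by convention, may not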

To improve the productivity and quality of programming, it is necessary to have data types that correspond as closely as possible to their real-world analogues. A data type that allows several values to be stored together under one name is called structured. Every programming language has its own structured types. One structure that combines elements of the same data type is the array.

An array is an ordered collection of values of the same type that share a common name; its elements are addressed (distinguished) by serial numbers (indices).

Array elements are stored contiguously in computer memory; for standalone variables of a simple type, no such arrangement is assumed. Arrays differ in the number of indices needed to identify an element.

A one-dimensional array assumes that each element has exactly one index. Examples of one-dimensional arrays are arithmetic and geometric progressions that define finite series of numbers. The number of elements in an array is called its dimension. When a one-dimensional array is declared, its dimension is written in parentheses next to its name; for example, an array consisting of elements a1, a2, ..., a10 is written as A(10). The elements of a one-dimensional array are entered one by one, in the order required by the specific problem. The input process can be depicted as a flowchart (the figure is omitted here); a code sketch is given below.
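
In place of the flowchart, here is a minimal Python sketch of element-by-element input (names are illustrative):

    # Read the 10 elements of array A one at a time, in index order.
    A = []
    for i in range(10):
        A.append(float(input(f"a{i + 1} = ")))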



For example, consider the algorithm for computing the arithmetic mean of the positive elements of the numeric array A(10). The algorithm computes the sum (denoted S) of the positive elements of the array (ai > 0) and the count (denoted N) of those terms.

Recording the algorithm in pseudocode:

1. Repeat 10 times (for i = 1, 10, 1):

1.1. Input ai.

2. Initial value of the sum: S = 0.

3. Initial value of the counter: N = 0.

4. Repeat 10 times (for i = 1, 10, 1):

4.1. If ai > 0, then S = S + ai and N = N + 1.

5. If N > 0, then compute the arithmetic mean SA = S / N and output SA. Otherwise, output "There are no positive elements in the array."
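
A direct Python transcription of this pseudocode (a sketch; it assumes the 10 elements have already been read into the list A, as in the input sketch above):

    # Steps 2-3: initialise the sum and the counter.
    S = 0.0
    N = 0

    # Step 4: add up the positive elements and count them.
    for a in A:
        if a > 0:
            S = S + a
            N = N + 1

    # Step 5: output the mean, or a message if nothing was positive.
    if N > 0:
        SA = S / N
        print("SA =", SA)
    else:
        print("There are no positive elements in the array.")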

The same algorithm can also be recorded as a flowchart (figure omitted).

A two-dimensional array assumes that each element has two indices. In mathematics, a two-dimensional array (a table of numbers) is called a matrix. Each element aij has two indices: the first, i, gives the number of the row in which the element is located, and the second, j, the number of the column. A two-dimensional array is characterized by two dimensions, N and M, which give the number of rows and columns respectively.

The elements of a two-dimensional array are entered row by row; each row, in turn, is entered element by element, which produces nested loops. The flowchart of the input algorithm is omitted here; a code sketch follows.

The outer loop runs over the row number (i); the inner loop runs over the number of the element within the row, i.e. the column (j).
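
The same nesting, as a minimal Python sketch (N rows and M columns; the names and sizes are illustrative):

    N, M = 3, 4                # number of rows and columns
    B = []                     # the two-dimensional array (matrix)
    for i in range(N):         # outer loop: row number i
        row = []
        for j in range(M):     # inner loop: column number j within row i
            row.append(float(input(f"b[{i + 1}][{j + 1}] = ")))
        B.append(row)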

Examples

If you don't know how to add markup code to your site yourself, use the Data Highlighter tool.

You can also learn how to work with structured data and add markup to your site manually.

How to mark up a web page or email

The markup can be placed on an HTML page or in an HTML email file.

How to mark up a web page

Follow these steps:

How to mark up an email in HTML format

Follow these steps:

How to save changes and continue editing the page or email

To save the markup in its current state, bookmark the page in your browser. The Structured Data Markup Helper will remember the tagging, including all its values, for a month.

How to remove tags

To remove all or part of the markup, follow the steps below.

How to remove a single tag

  • Open the sample page or email and click the tag you want to remove.
  • In the menu that appears, select Remove Tag.
  • Alternatively, find the item in the My Data Items column, hover over it, and click the X on the right.

    How to remove all tags

    Advanced date markup

    The Structured Data Markup Helper recognizes various date formats; the main thing is that the month, day, and year are indicated. You can add any missing data, such as the year, to the page set.

    If the dates on the page appear as a single fragment (for example, June 4, 2012), it is recommended to mark them with a single tag. The fewer tags on the site, the faster it is processed and the more accurate the results will be.

    How to add one date tag

  • Start marking up according to the instructions for the page or email.
  • On the add-tags page, use your mouse to highlight a date, for example June 2, 2012.
  • From the menu that opens, select Date > Date/Time or Range.
  • Complete the markup according to the instructions for the page or email.

    Adding tags for date fragments

    Sometimes date information appears in separate pieces, or labels are used to identify its components. For example, a page listing several events may show the month and year only at the top, with the day next to each individual event. In this case, you need to add tags for each date fragment.

    Please note that the Markup Helper does not recognize dates that are split into fragments and at the same time represent a range (for example, June 4-5 and 2012).

    How to add tags for date fragments

  • Start marking up according to the instructions for the page or email.
  • On the add-tags page, select a date fragment with your mouse, such as "June".
  • In the menu that opens, select Date > Advanced > the required fragment. Example: Date > Advanced > Month.

    The Markup Helper will add the date to the My Data Items column.

  • Continue adding tags for the pieces of data until you have tagged them all.
  • Complete the markup according to the instructions for the page or email.

    Examples of date tags

    Below are examples of dates you can mark.

    • Single date. For example, you can mark the following variants:
      • 2012, June 4
      • June 4, 2012
      • 04/13/2012. Tags can also use other delimiters and a four-digit year, for example 4/13/2012. For dates that can be read in different ways, Google interprets the first number as the month: 6/4/12 is read as June 4, 2012, and 4/13/12 as April 13, 2012.
      You can mark several dates on one page. For example, marking June 4, 2012 and June 6, 2012 means the event takes place twice: first on June 4 and again on June 6.
    • Day range. For example, June 4-7, 2012.
      Note that the separator between the start and end dates must be a hyphen (-).
    • Dates with times. For example, you can mark the following dates:
      • June 4, 2012 3 pm – date and time (am or pm). If you don't specify am or pm, Google interprets the time using standard business hours: for example, 11 is taken as 11 am and 2 as 2 pm.
      • June 4, 2012, 15:00 – 24-hour time format.
      • June 4, 2012 3 pm EST or June 4, 2012 3 pm -5:00 – time with a time zone or an offset from UTC/GMT.
      • June 4, 2012, 2-3 pm or June 4-5, 2012, 2-3 pm – time ranges with or without a date range.
    • Date fragments. You can use the advanced tagging settings to mark the following text fragments as a single date:
      • Day: June 4, Wednesday. Year: 2013.
      • June 4 | Time: 7:30pm-9:30pm and 2012
      Google doesn't recognize date ranges that are spread across multiple tags. For example, the following date tags are invalid:
      • June 4-5 and 2012
    How to manually specify date format

    The Markup Helper recognizes dates on a page according to the formatting rules of the page's language. For example, if the page uses American English (en-US), the date 12-06-12 is interpreted as December 6, 2012; if the page uses British English (en-GB), the same date is interpreted as June 12, 2012. The Markup Helper detects the page language automatically and applies the appropriate rules.

    To set a different date format for the Markup Helper, follow these steps:

  • In the window that opens, select the date format from the corresponding list.
  • Click Save.
    How to add missing data

    If the page or email is missing certain information, such as the year an event is scheduled for, you can supply the value yourself, and the Structured Data Markup Helper will add the HTML markup for it.

    You can add missing data, and change or delete it, at any time.

    How to add, change or delete data

  • Click Add missing tags at the bottom of the My Data Items column.
  • Do any of the following:
    • Select a tag from the list and enter a value. For example, you can select the Category tag and enter the value "Russian folk songs".
    • Delete existing data by clicking X in the text field.
    • Change the value in the field.
  • Click Save.
    Changes will appear in the My Data Items column.
    How to change the page language

    The Structured Data Markup Helper automatically detects the language of the sample page or email in order to recognize the data better. If it gets the language wrong, you can set the correct one yourself.

    To do this, follow these steps:

    Click the settings icon and select .

  • In the window that opens, specify your language.
  • Click Save.
    What is schema.org

    schema.org is the result of a collaboration between Google, Microsoft, and Yahoo! to improve the web by creating a common standard for describing data on it. If you add schema.org markup to your HTML pages, many companies and systems, including Google Search, will be able to recognize the information on your site. Likewise, if you add schema.org markup to an HTML email, its data can be recognized not only by Gmail but also by other email services.

    This is a translation of an article by Nate Harris for the Ahrefs blog. You can learn more about the author on his Facebook page.

    The information is aimed at webmasters and advanced SEO specialists, though beginners too can appreciate the importance of structured data in modern SEO.

    You will learn about:

    • the features of using Schema.org,
    • the subtleties of JSON-LD,
    • useful tricks in Google Search Console (formerly Google Webmaster Tools),
    • the myths around structured markup.

    Search engines have made it clear that rich results will be extremely important in the search of the future.

    We know that Google adds a new block to the Google Search Gallery every couple of months.

    Google already understands the content of a website well. But when it comes to the nuances of articles and the specifics of each page, the search robot still needs help. This is why structured data helps you rank well.

    Structured Data is a general term that refers to any organized data of a specific format.

    This is not an SEO term. Relational databases, the fundamental core of all computing, rely on structured data. SQL, the Structured Query Language, manages organized data.

    When a site creator wants to present a page as a user profile, an event page, or a job listing, certain markup needs to be placed in the code.

    The more pages on your site that the search robot can read as XML or JSON objects, the better your content will rank in search results.

    The de facto standard language for describing structured data on the web is schema.org. For example, to represent an airline flight, schema.org provides rules for describing the type of flight, the gate number, even the menu.
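
    As a hedged illustration (not from the original article): schema.org's Flight type does define properties such as flightNumber, departureGate, and departureTime. A small Python sketch that assembles such a description and prints it as a JSON-LD block (the values are invented):

        import json

        # A minimal schema.org Flight description; values are made up.
        flight = {
            "@context": "https://schema.org",
            "@type": "Flight",
            "flightNumber": "UA123",
            "departureGate": "B7",
            "departureTime": "2017-07-04T15:00:00-05:00",
        }

        # json.dumps produces the JSON-LD text to embed in a page.
        print(json.dumps(flight, indent=2))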

    The project was founded jointly by Google, Microsoft, Yahoo, and Yandex. It remains open source and technically anyone can edit it. However, as with any W3C project, the process of making changes is not simple: if you want to add a new structured data type, be prepared for technical and bureaucratic delays. The end result is a new markup type included in the schema.org library.

    4 data structuring options

  • JSON-LD is one of the newer structured data formats, and the one Google regularly recommends. Instead of tagging each HTML element, JSON-LD is one large block of code that tells the Google robot: "aircraft type, departure time, menu, etc."
  • JSON-LD is also good because it does not need to be attached to any visible content elements.

  • RDFa + GoodRelations is another syntactic HTML extension. RDFa differs from JSON-LD in essence: instead of putting the structured data in one block, it is scattered throughout the document and structures your data in place.
  • This syntax can be thought of as just another attribute, like class. The format can be useful for marking up dynamic elements (such as reviews); in those cases it is faster and more convenient than JSON-LD.

  • Microdata is an HTML5 language extension. Rarely used.
  • Microformats (aka μF) are most commonly found in the hAtom/hentry form.
    Data Highlighter in Google Search Console

    For sites that do not have a large number of elements to label, Google offers a useful tool in GSC that lets site owners apply structured data quickly. However, here are a few reasons not to use Data Highlighter:

    • Your Data Highlighter markup will break if anything about the formatting of your pages changes.
    • The labeling is visible only to Google's crawler.
    How structured data helps SEO

    Rich snippets are the most desirable outcome for webmasters, since they increase CTR: for example, product ratings displayed directly in the search snippet of an online store's page.

    • A knowledge graph is a block about a brand or a person, for example: (image omitted)

    • AMP, Google News, etc. To get into Google News or be served as AMP pages, a site must carry many different types of markup, for example for events.

    • Content indexing and ranking. Search engines claim they understand the context and meaning of page content better when you use markup, even where there are no visible results.
    • Other search engines. Each search engine processes structured data differently. Yandex requires fields for successful processing that Google does not; Baidu's first-page results rely heavily on structured data.
    Myths about ranking factors

    Microdata is not a ranking factor.

    In the past, we've seen Google appear to take microdata into account in one way: understanding branded queries. For example, if you own Tim's Pizzeria in Brooklyn and someone searches for "tims pizzeria brooklyn", your site will appear first in the results even without a link profile.

    If Google has not yet worked out that your site is the analogue of "Tim's Pizzeria", markup can help with that, as can the knowledge graph described above.

    Markup is not magic and does not by itself add quality to a site in the eyes of search engines. Keep that in mind, without forgetting its advantages.

    Examples of using structured data

    JSON-LD is the simplest way to implement structured data on a site. The markup below tells search engines that your site "is a set of related web pages and other items typically hosted on a single domain and accessible via specific URLs."

    Paste this code into your site the same way you would, say, the GA code, replacing the ahrefs.com URL with your own.

    ( "@type": "WebSite", "url": "https://ahrefs.com/" )

    Open Google's Structured Data Testing Tool and click "Run Test".

    You should see something like this:

    Here's an example for the Ahrefs blog, where the following JSON-LD block could be included.

    ( "@context": "https://schema.org", "@type": "BlogPosting", "url": "https://ahrefs.com/blog/bla-bla-bla", "headline" : "What is Structured Data? And Why Should You Implement It?", "alternativeHeadline": "Structured Data 101", "description": "Structured data is bla bla bla bla", "datePublished": "July 4, 2017" , "datemodified": "July 5, 2017", "mainEntityOfPage": ( "@type": "WebPage", "url": "https://ahrefs.com/blog/bla-bla-bla" ), " image": ( "@type": "imageObject", "url": "http://example.com/images/image.png", "height": "600", "width": "800" ), "publisher": ( "@type": "Organization", "name": "ahrefs", "logo": ( "@type": "imageObject", "url": "http://example.com/images /logo.png" ) ), "author": ( "@type": "Person", "name": "Nate Harris"), "editor": ( "@type": "Person", "name": "Tim Soulo"), "award": "The Best ahrefs Guest Post Ever Award, 2017", "genre": "Technical SEO", "accessMode": ["textual", "visual"], "accessModeSufficient": [ "textual", "visual"], "discussionUrl": "https://ahrefs.com/blog/bla-bla-bla/#disqus_thread", "inLanguage": "English", "articleBody": "Search engines have made it clear: a vitally important part of the future of search is rich results. While controversial..." )

    Many readers will need markup for an online store. Below is sample code for eCommerce sites.

    ( "@context": "http://schema.org", "@type": "Product", "url": "https://timspizzeria.com/goat-cheese-pizza", "aggregateRating": ( "@type": "AggregateRating", "ratingValue": "3.5", "reviewCount": "2", "bestRating": "5", "worstRating": "1" ), "description": "Tim"s pizzeria"s most delicious cheesiest cheese pizza. Made with 100% goat cheese turned blue.", "name": "Tim"s Goat Cheese Pizza", "image":["https://timspizzeria.com/goat-cheese -pizza-hero.jpg","https://timspizzeria.com/goat-cheese-pizza-olives.jpg","https://timspizzeria.com/goat-cheese-pizza-pineapple.jpg"], " offers": ( "@type": "Offer", "availability": "http://schema.org/InStock", "image":"https://timspizzeria.com/goat-cheese-pizza-hero. jpg", "price": "26.00", "priceCurrency": "USD", "sku":"1959014", "seller":( "@type":"Organization", "name":"Tim"s Pizzeria "), "availability": "http://schema.org/InStock"), "review": [ ( "@type": "Review", "author": "Nate", "datePublished": "2017- 07-041", "reviewBody": "Dope lit funkytown! Delicious pizza.", "name": "n8 h", "reviewRating": ( "@type": "Rating", "bestRating": "5", "ratingValue": "5", "worstRating": "1 " ) ), ( "@type": "Review", "author": "Dmitry", "datePublished": "2016-05-22", "reviewBody": "This is the grossest thing I"ve witnessed, let alone tasted.", "name": "OMG this pizza is abhorrent", "reviewRating": ( "@type": "Rating", "bestRating": "5", "ratingValue": "1", "worstRating" : "1" ) ) ] ) )

    It's worth noting that Google understands JSON-LD even when its elements are rendered asynchronously, so the markup can easily be implemented via Google Tag Manager, AJAX, and so on.

    Structured Data Tools

    For WordPress site owners we can recommend the Schema plugin for quick and easy markup setup. Most WordPress markup plugins have many problems and shortcomings: many of them simply pass standard WP theme data as markup elements, such as author, datePublished, featured image, and so on.

    However, plugins will not let you cover all the Schema features Google supports. Fine, high-quality markup tuning is the path to success in Google's results. Take a look at Sephora's unusual product card setup. Interesting markup is also used on .

    And here is an example of the experimental event page markup that the author implemented for one of his clients.

    This markup makes the client's site one of the few to use it (suggestedMinAge, for example, is used by only 100 to 1000 domains).

    Another problem with SEO markup plugins is that they often create duplicates. This can be a problem for product cards, for example: Google may treat two markup elements for the same product as two different products.

    The author is currently wrestling with this problem on one site: Shopify has built-in schema.org Product markup, which duplicates the markup the author implemented for rich snippets containing aggregate ratings and review sections.

    Someone may suggest https://www.schemaapp.com/ ... The author has not used it and won't speak to its pros and cons. However, here is something worth noting:

    Schema App is a set of tools that lets internet marketers create and manage Schema markup without deep knowledge of the Schema.org vocabulary or programming.

    It all seems too complicated

    For quick results, the basic markup capabilities will definitely help SEO. Basic structured data can be implemented with plugins; if you choose that route, be prepared for the difficulties described above.

    Anyone working on large projects should pay more attention to advanced markup. A good understanding of structured data is your "golden ticket" to experimenting with search results: it ensures the search engine "understands" your site.

    And the good news is that working with markup does not need to be done regularly. Work through it properly once and you won't have to come back to it.

    Since implementing markup involves programming, it is something of a "horror story" and is very often ignored by SEO specialists. The author suspects some technical SEOs won't like hearing this, and believes Schema is not used to its full potential by most optimizers.

    Conclusions

    Technical SEO is endlessly varied and broad in scope, and understanding structured data is fundamental to it. In fact, the Semantic Web could be the death of SEO specialists: the more data we feed Google, the more SERP extensions appear that divert traffic away from organic results.

    When we implement structured data correctly on a website, we teach search engines to do better without us in the future. Data tagging, useful as it is, is also a successful self-training tool for Google.

    Still, the benefits of structured data are so great that it cannot be ignored. Beyond the potential traffic gains, well-executed data tagging improves a site's chances of being picked up by Google's ever-evolving organic search extensions.



    Structured data in Ada takes the form of arrays and records; in addition, structured data in Ada can be accessed through pointers. Arrays with unconstrained bounds make it possible to parameterize arrays and to write subprograms that take variable-sized arrays as parameters.

    Semantics of variables in the PILOT / 2 language.

    Processing complexly structured data in external memory is a distinctive property of all LPZs. But ordinary variables are needed as well, which is why registers and stacks are introduced in PILOT/2.

    Sets of procedures that represent structured data have an interesting and sometimes useful property: they can be used to construct other possible representations. For example, list view 2 follows logically from list view 1, and the first could be equipped with statements, using appropriate control directives, that output the second view. In this context, list view 2 would behave like a normal set of output-producing procedures. This ability of logical statements to act simultaneously as normal procedures and as representations of data structures shows that any supposed distinction between procedures and data is essentially pragmatic: it concerns only how these resources are used, not attributes inherent in them.

    The components of an array represent structured data of the same type: an array combines data with identical properties. In contrast to arrays, the components of a direct (Cartesian) product may have various types. The direct (Cartesian) product, like the array, is one of the basic structured data types; it is also called a record or structure.
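
    The contrast can be sketched in a few lines of Python (purely illustrative):

        from dataclasses import dataclass

        # An array combines components of the SAME type under one name...
        temperatures = [21.5, 22.0, 19.8]      # all floating-point values

        # ...whereas a record (direct/Cartesian product) combines
        # components of DIFFERENT types under named fields.
        @dataclass
        class Measurement:
            station: str      # name of the weather station
            hour: int         # hour of the reading
            value: float      # the reading itself

        m = Measurement(station="North", hour=14, value=21.5)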

    Knowledge is well-structured data: data about data, or metadata.

    In term representation, structured data is built up using function symbols that allow its constituent parts to be grouped. For example, the list (10 20 30) can be represented by the term 10.20.30.NIL, in which each dot functor groups the element on its left with the tail of the list on its right. Both constants and structured terms can be thought of as essentially passive objects intended to be manipulated by procedures.

    The ALTOP technology was created on the basis of work on ACS software. The development includes original tools for compiling initial descriptions, discussed in Sections 2.4 and 2.5, and a design methodology (see Chap.

    Thus, the structured data class covers data whose storage requires the creation of fixed sets of formats. Databases storing such data are formatted with a deterministic schema, oriented toward the prior fixation and classification of objects in the external environment and the precise statement of the properties and relations described in the database using a pre-created set of fixed formats.

    A database is a collection of structured data.

    Various techniques and methods are used to place structured data in linear memory structures. As a rule, such data is represented as lists, and the efficiency of search and other characteristics of data processing systems depend directly on how those lists are organized.

    Names in programs denote simple variables, structured data, elements of structured data, constants, subroutines, operations, statement labels, formal parameters, and other program elements. Names may be simple or compound.

    The language is based on non-operator means of describing hierarchically structured data. It uniquely determines the traversal trajectory and access path to the database. In addition, such languages have tools similar to those of procedural programming languages.

    A formalized questionnaire is a form designed for processing and recording structured data.

    In this terminology, a database can be defined as a collection of specially structured data and of the relationships between its elements, segments, and logical records. Building databases in this sense is possible only for information objects that have properties common to an entire class. If the information base must give objects individual properties, it is advisable to build unstructured databases that allow information to be recorded in natural language.

    Every enterprise has many databases that are replenished from structured data sources. Structured data is data entered into databases in a specific form, for example Excel tables with strictly defined fields. In the English-language literature, a set of enterprise databases is called an Enterprise Data Warehouse (EDW). I have not yet come across an analogue of this term in the Russian-language literature, so let us call it an "enterprise data warehouse"; for brevity we will use the English abbreviation EDW.

    Structured data sources are applications that capture data from various transactions. For example, these could be CDRs in an operator's network, network trouble tickets, financial transactions on bank accounts, data from ERP (Enterprise Resource Planning) systems, application program data, and so on.

    Business Intelligence (BI) is the data processing component: the various applications, tools, and utilities that let you analyze the data collected in the EDW and make decisions based on it. These include operational report generation systems, ad hoc queries, OLAP (On-Line Analytical Processing) applications, so-called "disruptive analytics", predictive analysis, and data visualization systems. Simply put, a manager must see the business process in an easy-to-read form, preferably graphical and animated, in order to make optimal decisions quickly. The first law of business: the right decision is a decision made on time. If the correct decision for yesterday is made today, it is not necessarily still correct.

    But what if the data sources are unstructured and heterogeneous, obtained from different places? How will analytical systems work with them? Try using your mouse to select several cells of data in an Excel spreadsheet and paste them into a simple text editor (for example, Notepad), and you will see what "unstructured data" is. Examples of unstructured data: email, information from social networks, XML data, video, audio and image files, GPS data, satellite images, sensor data, web logs, data on a mobile subscriber's movement during handover, RFID tags, PDF documents...

    To store such information in data processing centers (data centers), the Hadoop distributed file system, HDFS (Hadoop Distributed File System), is used. HDFS can store all types of data: structured, unstructured, and semi-structured.

    Big Data applications for business analytics are a component not only of data processing but also of working with the data itself, both structured and not. They include the applications, tools, and utilities that help analyze large volumes of data and make decisions based on data from Hadoop and other non-relational storage systems. They do not include traditional BI analytics applications, nor extension tools for Hadoop itself.

    In addition, an important component of Hadoop is the MapReduce system, which manages resources and data processing in Hadoop. It consists of two main phases: Map, which processes blocks of input data in parallel on the various nodes of the storage system where those blocks reside, producing intermediate key-value results, and Reduce, which aggregates the intermediate results that share a key, reducing the volume of data and preparing it for subsequent use. MapReduce is notable for processing data where it is stored (i.e., in HDFS) instead of moving it somewhere for processing and then writing the results somewhere else, which is what conventional EDWs usually do. MapReduce also has a built-in recovery mechanism: if a storage node fails, it always knows where to find a replica of the lost data.

    Although MapReduce processes data an order of magnitude faster than traditional methods that "extract" the data first, because the data volumes involved are incomparably larger (that is why it is Big Data), MapReduce usually processes data streams in parallel, in batch mode. With Hadoop 2.0, resource management was split out into separate functionality (called YARN), so MapReduce is no longer a bottleneck in Big Data.
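
    The two phases can be sketched in a few lines of Python. This is a toy word count, the canonical MapReduce example; it illustrates the programming model only and is not Hadoop's API:

        from collections import defaultdict

        lines = ["big data", "big deal", "data data data"]

        # Map phase: each input record independently yields (key, value)
        # pairs, so this step can run in parallel on the nodes that
        # already hold the data.
        mapped = [(word, 1) for line in lines for word in line.split()]

        # Shuffle: group the intermediate values by key.
        groups = defaultdict(list)
        for word, count in mapped:
            groups[word].append(count)

        # Reduce phase: aggregate the values that share a key.
        totals = {word: sum(counts) for word, counts in groups.items()}
        print(totals)   # {'big': 2, 'data': 4, 'deal': 1}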

    Moving to Big Data systems does not mean that traditional EDWs should be scrapped. Instead, the two can be used together to take advantage of both and to extract new business value from their synergy.

    What is all this for?

    There is a widespread opinion among consumers of IT and telecom equipment that all these spectacular foreign words and letter combinations (Cloud Computing, Big Data, and various IMSes with softswitches) were invented by cunning equipment suppliers to maintain their margins; that is, to sell, sell, and sell new developments, otherwise the sales plan will not be met, Bill, Jobs, or Chambers will say "ah-ah-ah", and "the quarterly bonus is gone".

    So let's talk about why all this is needed, and about the trends.

    Probably, many have not yet forgotten the terrible H1N1 influenza virus. There were fears it could be even worse than the Spanish flu of 1918, whose victims numbered in the tens of millions. Although doctors were supposed to report rising case counts regularly (and they did report them), the analysis of this information lagged by 1-2 weeks, and people themselves typically sought help 3-5 days after falling ill. That is, measures were largely taken after the fact.

    The dependence of the value of information on time usually takes the form of a U-shaped curve.

    Information is most valuable either immediately after it is received (for making operational decisions) or after some time (for trend analysis).

    Google, which stores many years of search history, decided to analyze the 50 million most popular queries from the regions of previous influenza epidemics and compare them with the medical statistics from those epidemics. A system was developed to find correlations between the frequencies of particular queries and the case statistics; 40-50 typical queries were identified, and the correlation coefficient reached 97%.
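
    The kind of comparison described can be sketched in a few lines of Python (the numbers here are invented purely for illustration; the real system screened an enormous number of queries):

        from statistics import correlation  # Python 3.10+

        # Weekly frequency of one candidate query vs. reported flu cases
        # (toy data, purely illustrative).
        query_freq = [120, 150, 310, 480, 440, 300]
        flu_cases = [100, 140, 290, 500, 430, 310]

        # Pearson correlation coefficient; queries whose coefficient is
        # close to 1.0 are kept as epidemic indicators.
        print(correlation(query_freq, flu_cases))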

    In 2009 the serious consequences of the H1N1 epidemic were avoided precisely because the data was obtained immediately, not 1-2 weeks later, when the clinics in the affected areas would already have been overcrowded. This was perhaps the very first use of big data technology, although it was not yet called that.

    It is well known that the price of an air ticket is very unpredictable and depends on many factors. I recently found myself in a situation where I could buy the same economy-class ticket from the same airline to the same city in two variants: for a flight leaving that evening, three hours later, the ticket cost 12,000 rubles; for early the next morning, 1,500 rubles. Again: the same airline, and even the aircraft on both flights were the same type. Usually, the closer the departure, the higher the price. There are many other factors that influence ticket prices; a booking agent once explained this host of fares to me, but I still understood nothing. There are also cases where the price falls instead as departure approaches: when many seats remain unsold, during promotions, and so on.

    One day, Oren Etzioni of the University of Washington, director of its artificial intelligence program, was about to fly to his brother's wedding. Since weddings are planned well in advance, he bought his ticket immediately, long before departure. The ticket was indeed inexpensive, much cheaper than the tickets he usually bought for urgent business trips. During the flight, he boasted to his neighbor about how cheaply he had managed to buy it. It turned out that the neighbor's ticket was even cheaper, and he had bought it later. Out of frustration, Etzioni conducted an impromptu survey right there in the cabin about ticket prices and purchase dates. Most passengers had paid less than he had, and almost all had bought their tickets later. It was very strange, and Etzioni, as a professional, decided to tackle the problem.

    Having obtained a sample of 12 thousand transactions from the website of one travel agency, he built a model for predicting airfares. The system analyzed only prices and dates, without considering any other factors: only "what" and "how much", without analyzing "why". The output was the predicted probability that the price of a flight would fall or rise, based on the price history of other flights. The scientist went on to found a small consulting firm called Farecast (a play on "fare" and "forecast") to predict airfares from a large database of flight bookings. It did not, of course, give 100% accuracy (as its user agreement noted), but with reasonable probability it could answer the question of whether to buy a ticket right now or wait. As further protection against lawsuits, the system also displayed a confidence score along the lines of: "There is an 83.65% chance that the ticket price will be lower in three days."

    Farecast was later bought by Microsoft (for about 115 million dollars), which integrated its model into the Bing search engine. (And, as so often happens with Microsoft, nothing more has been heard of this functionality: few people use Bing, and those who do know nothing about the feature.)

    These two examples show how Big Data analytics can produce both social and economic benefits.

    What exactly is Big Data?

    There is no strict definition of "big data". As technologies emerged for working with volumes of data too large for the memory of a single computer, which therefore had to be stored elsewhere (MapReduce, Apache Hadoop), it became possible to operate on much larger volumes of data than before. Moreover, the data could be unstructured.

    This makes it possible to abandon the restriction to so-called "representative samples", from which broader conclusions are then drawn. The analysis of causality is replaced by the analysis of simple correlations: not "why" is analyzed, but "what" and "how much". This fundamentally changes established approaches to making decisions and analyzing situations.

    Tens of billions of transactions take place on stock markets every day, and about two-thirds of trades are decided by computer algorithms based on mathematical models that use huge amounts of data.

    Back in 2000, digitized information accounted for only 25% of all the information in the world. Today, the amount of information stored in the world is on the order of zettabytes, of which non-digital information accounts for less than 2%.

    According to historians, from 1453 to 1503 (about 50 years) some 8 million books were printed: more than all the handwritten books produced by scribes since the Nativity of Christ. In other words, it took 50 years for the stock of information to roughly double. Today this happens every three days.

    To understand the value of "big data" and how it works, consider a simple example. Before the invention of photography, drawing a person's portrait took anywhere from several hours to several days or even weeks. The artist made some number of strokes, which (to achieve a "portrait likeness") can be measured in the hundreds and thousands, and what mattered was HOW to draw: how to apply the paint, how to shade, and so on. With the invention of photography, the number of "grains" in analog photography, or of "pixels" in digital photography, changed by several orders of magnitude, and HOW to arrange them no longer matters to us: the camera does it.

    The result, however, is essentially the same: an image of a person. But there are differences. In a hand-drawn portrait, the accuracy of the likeness is very relative and depends on the "vision" of the artist; distortions of proportions and added shades and details that the "original", i.e. the human face, did not have are inevitable. A photograph accurately and scrupulously conveys the "WHAT", leaving the "HOW" in the background.

    With some license, we can say that photography is Big Data in relation to the hand-drawn portrait.

    Now let us record every human movement at strictly defined, fairly small time intervals. The result is a film. Film is "big data" in relation to photography. We increased the amount of data and processed it accordingly, and obtained a new quality: a moving image. By changing the quantity and adding a processing algorithm, we obtain a new quality.

    Now video images themselves serve as food for Big Data computer systems.

    As the scale of processed data grows, new possibilities appear that are unavailable at smaller volumes. Google predicts flu epidemics no worse than official medical statistics, and much faster. To do so it must thoroughly analyze hundreds of billions of input data points, as a result of which it provides answers much faster than official sources.

    Finally, a brief look at two more aspects of big data.

    Accuracy.

    Big Data systems can analyze huge amounts of data, and in some cases all of the data rather than samples. Using all the data, we get a more accurate result and can see nuances unavailable with limited sampling. However, we have to be content with a general picture rather than understanding the phenomenon down to the smallest detail. Still, with large amounts of data, inaccuracies at the micro level allow discoveries to be made at the macro level.

    Causality.

    We are used to looking for causes in everything; this, in fact, is what scientific analysis is based on. In the world of big data, causality is less important. More important are the correlations between data, which can provide the necessary knowledge. Correlations cannot answer the question "why", but they are good at predicting "what" will happen when certain correlations are observed. And most often that is exactly what is required.

    ***





    
