Data visualization in the work of an analyst: types of charts and which one to choose


4.1. Audio extension of visual analytics

One of the most effective modern methods for analyzing scientific data of various kinds is computer visualization of that data, known as the method of scientific visualization.

Solving scientific data analysis problems with the method of scientific visualization consists of two tasks:

  1. Obtaining a representation of the analyzed data in the form of some graphic image (the task of visualizing the source data). To obtain such an image, the data must be associated with a spatial scene (a set of spatial objects), or more precisely with its geometric and optical (texture) descriptions (models); a graphic image of this spatial scene is then produced (rendering), which can be displayed on a monitor, printer or other output device for subsequent analysis.
  2. Visual analysis of the resulting graphic image of the analyzed data, with the results of the analysis interpreted in relation to the original data. This task is performed by the user directly. By visual analysis of a graphic image we mean visual analysis of the spatial scene mentioned above: a person can make judgments about the relative positions of the spatial objects, their shapes and their texture coverage.

Fig. 4.1. The sound-enhanced scientific visualization method.

Let us consider the possibility of involving another sense organ, the organ of hearing, in the process of visual analysis. In order to analyze the source data using sound, it is accordingly necessary to:

  1. Obtain a representation of the analyzed data in the form of some kind of audio display. To obtain such a display, the geometric and optical description of the spatial scene must be supplemented with an audio description, this audio-enabled spatial scene must be matched to the original data, and an audio representation of the scene (sound rendering) is then produced, which can be output to speakers or other output devices for subsequent analysis.
  2. Analyze the sound display and interpret the results of the analysis in relation to the original data.
    It is important to note that having both an audio display and a graphical image matched to the original data for subsequent analysis often makes the analysis process more efficient.

Figure 4.2 shows a block diagram of the extended visualization algorithm, i.e. the joint production of a graphic image and a sound display of the source data.

Fig. 4.2. Block diagram of the algorithm for extended scientific visualization.

Such an algorithm is implemented on a computer by writing a visualization application program in the input language of the tool (set of tools) used, in the form of a software product or a set of software products.

Let us consider the main characteristics of the extended scientific visualization method as applied to a class of scientific data analysis problems in which the analyzed data are descriptions of scalar fields.

4.2. Audio enhanced visual analytics of scalar fields.

Initial data

The initial data is a description of a scalar field on a rectangular region of three-dimensional physical space, stored in a file. The description is a mathematical model of the scalar field in the form of values of a function of three variables F(x,y,z) at the nodes of a given three-dimensional rectangular grid.

Filtered data

The filtered data is an interpolation function f(x,y,z) built from the values of the original function at the grid nodes. To obtain the function f(x,y,z) on the considered domain of definition (a parallelepiped), trilinear interpolation is used.
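As an illustration, here is a minimal sketch of trilinear interpolation in Python, assuming the grid values are stored in a NumPy array with unit spacing; the array name and the example grid are invented for the example and are not part of the original method description.

```python
import numpy as np

def trilinear_interpolate(F, x, y, z):
    """Trilinearly interpolate a scalar field F given on a unit-spaced
    regular grid at a point (x, y, z) strictly inside the grid bounds."""
    # Indices of the cell's lower corner and fractional offsets inside it.
    i, j, k = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    dx, dy, dz = x - i, y - j, z - k

    # Interpolate along x on the four edges of the cell ...
    c00 = F[i, j, k] * (1 - dx) + F[i + 1, j, k] * dx
    c01 = F[i, j, k + 1] * (1 - dx) + F[i + 1, j, k + 1] * dx
    c10 = F[i, j + 1, k] * (1 - dx) + F[i + 1, j + 1, k] * dx
    c11 = F[i, j + 1, k + 1] * (1 - dx) + F[i + 1, j + 1, k + 1] * dx
    # ... then along y ...
    c0 = c00 * (1 - dy) + c10 * dy
    c1 = c01 * (1 - dy) + c11 * dy
    # ... and finally along z.
    return c0 * (1 - dz) + c1 * dz

# Example: a 10x10x10 grid of values of F(x, y, z) = x + y + z.
grid = np.fromfunction(lambda i, j, k: i + j + k, (10, 10, 10))
print(trilinear_interpolate(grid, 2.5, 3.25, 7.75))  # ~13.5, exact for a linear field
```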

Description of the spatial scene

This description includes 3 components:

  1. A description of the isosurface of the function f(x,y,z) in the form f(x,y,z) = c, where the values of x, y, z jointly belong to the domain of definition of f(x,y,z) described above, and c is a user-specified real number corresponding to a particular value of the analyzed scalar field.
    This isosurface is the first component of the spatial scene under consideration.
  2. An optical description, i.e. a description of the texture coating of this isosurface, specified by the user.
  3. A sound description, i.e. a description of the point sound source introduced into the scene. Its location (coordinates x0, y0, z0) is specified by the user, and the frequency w of the emitted sound is related to the value of the function f(x,y,z) as follows: w = k*f(x0,y0,z0), where the coefficient k is specified by the user (a minimal sketch of this mapping follows the list).
    This point sound source is the second component of the spatial scene under consideration.
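The sketch below illustrates only the mapping w = k*f(x0,y0,z0) and the synthesis of the corresponding pure tone; the field f and the user parameters are invented for the example, and no full sound-rendering pipeline is implied.

```python
import numpy as np

def sound_frequency(f, x0, y0, z0, k=1.0):
    """Frequency of the point sound source placed at (x0, y0, z0):
    w = k * f(x0, y0, z0), with the coefficient k chosen by the user."""
    return k * f(x0, y0, z0)

def sine_tone(freq_hz, duration_s=1.0, sample_rate=44100):
    """Synthesize the corresponding pure tone as an array of samples."""
    t = np.linspace(0.0, duration_s, int(sample_rate * duration_s), endpoint=False)
    return np.sin(2.0 * np.pi * freq_hz * t)

# Illustrative field and user settings (assumptions, not from the article).
f = lambda x, y, z: x**2 + y**2 + z**2                    # analyzed scalar field
w = sound_frequency(f, x0=1.0, y0=2.0, z0=3.0, k=30.0)    # 30 * 14 = 420 Hz
samples = sine_tone(w)
print(f"source frequency: {w} Hz, {len(samples)} samples")
```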

Graphic image and sound display of a spatial scene

The results of rendering are respectively:

  • a projected graphic image of the isosurface on the graphics terminal used,
  • the sound wave generated by the audio terminal being used.

It should be clarified that the frequency of the sound wave is perceived by the user as the pitch (tone) of the sound.

Advanced data visualization tools have become an integral part of corporate BI platforms. Some vendors, such as Tableau, have made visualization the cornerstone of their analytical systems. The trend permeates the analytics systems market completely: today, even the simplest analytical tools for end users often rely on visual representation of data.

Big Data Visualization

A visual presentation of the analysis results is of fundamental importance for their interpretation. It is no secret that human perception is limited, and scientists continue to conduct research into improving modern methods of presenting data in the form of images, diagrams or animations. It would seem that it is impossible to come up with anything new here, but in fact this is not the case. As an illustration, here are several advanced visualization methods that have become widespread relatively recently.

  • Tag Cloud

Each element in the tag cloud is assigned a weight that determines its font size. In text analysis, the weight usually depends on how frequently a given word or phrase is used (or quoted). This allows the reader to quickly get an idea of the key points of an arbitrarily large text or collection of texts.
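A toy sketch of how such weights could be computed and mapped to font sizes; the linear scaling and the point-size range are arbitrary choices made for illustration, not part of any particular tag cloud tool.

```python
from collections import Counter
import re

def tag_cloud_sizes(text, min_pt=10, max_pt=48):
    """Weight each word by its frequency and map the weight to a font size."""
    words = re.findall(r"[a-zа-яё]+", text.lower())
    counts = Counter(words)
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all weights are equal
    return {w: min_pt + (max_pt - min_pt) * (c - lo) / span
            for w, c in counts.items()}

print(tag_cloud_sizes("data analysis turns data into insight; insight drives decisions"))
```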

  • Clustergram

A visualization method used in cluster analysis. It shows how individual elements of a data set are assigned to clusters as the number of clusters changes. Selecting the optimal number of clusters is an important part of cluster analysis.
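A small sketch of the computation underlying a clustergram, assuming scikit-learn is available: for each candidate number of clusters the assignment of every element is recorded, and a clustergram would then plot how elements migrate between clusters as k grows. The toy data is invented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data set: three blobs on a line, 50 points each.
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in (0.0, 3.0, 6.0)])

# For each candidate number of clusters, record the label of every point;
# a clustergram plots how points move between clusters as k changes.
assignments = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    assignments[k] = labels
    print(k, np.bincount(labels))  # cluster sizes for this k
```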

  • History flow

Helps you follow the evolution of a document that is worked on simultaneously by a large number of authors. This is a typical situation for wiki services, including the tadviser website. Time is plotted along the horizontal axis, and the contribution of each co-author, i.e. the volume of text entered, is plotted along the vertical axis. Each unique author is assigned a specific color on the chart. An example chart is the result of analyzing the "Islam" article on Wikipedia, where it is clearly visible how the authors' activity has increased over time.

  • Spatial flow

This diagram allows you to track the spatial distribution of information. The example chart was created using the New York Talk Exchange service. It visualizes the intensity of IP traffic exchange between New York and other cities around the world: the brighter the line, the more data is transmitted per unit of time. This makes it easy to identify the regions closest to New York in terms of information exchange.

Processing operational databases and multidimensional storage

  • Transactional heavy-load databases
  • Relational and multidimensional storage
  • Direct calls to system APIs
  • Working with connectors
  • Processing flat files (CSV, Excel, XML, etc.)

Building analytical models in memory

Computers were created to automate calculations, but it was soon discovered that their uses are not limited to counting: they can serve as universal devices for working with data. Attempts to make a computer think are well known; less well known are attempts to turn it into an intellectual assistant that does not claim intelligence of its own but facilitates human mental activity. Vannevar Bush was the first to put this idea into the form of the hypothetical Memex machine, and the ideologically similar plans of Doug Engelbart were embodied in the oN-Line System (NLS). In the early 1960s, the mathematician Richard Hamming wrote: "The purpose of computing is insight, not numbers." However, these good intentions remained unrealized, since the practical possibility of creating systems that seamlessly combine the analytical abilities of human and machine has appeared only today.

Together, the creative mind of a person and the ability of a machine to crunch huge amounts of data produce a synergistic effect, which can be assessed by looking at the consequences of Garry Kasparov's 1997 loss to the IBM supercomputer Deep Blue, a PowerParallel SP2 with 32 nodes that included special boards with 8 dedicated chess processors. After that match, the grandmaster decided he needed to even the odds by allowing players to expand their analytical capabilities with chess programs running on a PC. Kasparov never met Deep Blue again, but before Deep Blue was dismantled, two amateurs "armed" with ordinary PCs played one more match against it. Not being grandmasters and not having serious chess qualifications, the two turned out to be more successful than the world champion in confronting the machine. This is an excellent illustration of the fact that systems in which functions are harmoniously divided between human and machine can be more efficient than a completely automated system. We can observe the first signs of this in business analytics: it is here that the dividing line between the old and the new lies in the way a person interacts with a machine, and not in the division into types of analyzed data (structured or unstructured).

The new opportunity for productive human-machine interaction has led to an active rise in the field of business analytics and the creation of new analytical platforms. Any period of growth is usually accompanied by noticeable changes within the corresponding market segment. In this case, the focus shifts from established, well-known technologies supported by major vendors to a subset of technologies almost unknown to the wider public, Data Discovery (DD), also called Visual Analytics (VA) or Visual Data Discovery (VDD), developed mainly by small companies and characterized by advanced human-machine interaction capabilities. There are other indicators of change: in parallel with DD and VA, two more groups of analytical technologies are actively developing, content analytics and predictive analytics (Predictive Analytics, PA).

The scope of visual analytics is extremely large (Fig. 1), but here we will focus only on segments related to business analytics.

Throughout the short history of business analytics, the most important place among its methods has been occupied by reporting systems (Query and Reporting), whose essence is formulating queries against certain data sources and then compiling reports on the results. Query and Reporting solutions fall into two large streams: one generates Production Reporting, periodically issued reports containing information about production, and the other produces Business Query and Reporting, operational reports designed for the needs of enterprise management. In the early nineties they were joined by online analytical processing systems (OnLine Analytical Processing, OLAP). The price for fast analysis is a lengthy preparatory procedure: this category of technologies can analyze only data that has previously been aggregated into structured multidimensional OLAP cubes. Each such cube contains all the information that may be needed to answer the intended queries, and the need to prepare the cube precludes any synchronization with the incoming data stream. Nevertheless, on the basis of OLAP and related technologies, ever more operational analysis tools began to appear, for example "management dashboards" equipped with a variety of indicators reminiscent of a captain's bridge. The concept of the Real-Time Enterprise (RTE) also took hold.
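The idea of pre-aggregating transactions into a cube can be illustrated with a small pandas sketch; the column names and numbers are invented, and this is only an analogy for how an OLAP cube is prepared, not how any particular OLAP product stores its data.

```python
import pandas as pd

# Toy transactional data (illustrative column names, not from the article).
sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024, 2024],
    "region":  ["EU", "US", "EU", "US", "US"],
    "product": ["A", "A", "B", "A", "B"],
    "amount":  [100, 150, 80, 200, 120],
})

# Pre-aggregate into a year x region x product "cube"; queries then only
# slice or roll up this cube instead of touching the raw transactions.
cube = sales.pivot_table(index=["year", "region"], columns="product",
                         values="amount", aggfunc="sum", fill_value=0)
print(cube)
print(cube.groupby(level="year").sum())  # roll-up along the region dimension
```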

In parallel with all these tools, theoretical directions developed in business analytics, such as data mining (Data Mining, DM) and predictive analytics. Until recently it seemed that DM and PA held the main resource for the development of BI, but in practice the interactive methods of DD, VA and VDD turned out to be more in demand. They are now rated as the most promising, even though the forces creating them are orders of magnitude smaller than the traditional players in the business intelligence market.

Prerequisites for changes in analytics

There are so many explanations and reasons for what happens in analytics that the words of Dan Brown from the novel “Angels and Demons” come to mind: “Some pray to Jesus, others go to Mecca, and still others study the behavior of elementary particles. Ultimately, everyone is in search of the truth." Therefore, by analogy, we will divide the views on the evolution of business analytics into three groups.

The most popular and widely circulated is the "managerial" point of view, associated with reflections on fashionable topics such as Big Data, "the next industrial revolution," Industry 4.0, etc. Most often it refers to the "three Vs" of Big Data: Volume, Variety and Velocity.

The second group of views on change comes from adherents of knowledge management theory (Knowledge Management, KM). They connect what is happening with the "information revolution," dividing it into five stages of ascent from data to information, from information to knowledge, and from knowledge to understanding and intelligence: operational, where fragmentary data is analyzed by individual analysts; consolidated, where large volumes of data are analyzed by working groups of analysts; integration, where an enterprise infrastructure supporting KM is created; optimization, where corporate information as a whole is analyzed to improve efficiency; and innovative, where combining corporate intelligence and analytics creates new value.

According to the third point of view, the result of the evolutionary process is the transition from general ideas about the enterprise to real knowledge.

It is worth noting that the first two points of view are closer to a cult, while the third is an alternative that, as in the quote, is close to a natural-science position: a systems view, which is useful when considering a variety of phenomena, including those occurring in analytics. It is important to note, however, that all control systems have much in common regardless of their nature. Norbert Wiener, who was the first to express the idea of the unity of management principles in living organisms, machines and organizations, wrote about this. Wiener's work did not receive a worthy theoretical continuation, and in practice the management of technical objects went its own way while business went its own. For many years there was no commonality between the two classes of management systems, technical and corporate; moreover, a barrier arose dividing them into two non-overlapping areas. This was not someone's ill will: the division was completely justified and natural for its time, driven by the strict limit on computing resources that existed in the past and by the desire to use the available means optimally while satisfying the requirements of the task as fully as possible. Since there are two types of tasks, technical management and corporate governance, two types of solutions arose accordingly.

Subsequently, it turned out that the idea of automating management in business gave way to banal data processing and was buried under a load of routine applications such as ERP, CRM, etc., in which everything is built on accounting and control, that is, simply on common sense. Assessing results and status, making decisions and everything else that forms feedback turned out to be outside the scope of automation. But in real life no control system can exist without feedback, and at the current level of automation feedback is implemented in the minds of managers who receive external data and make decisions.

In the second decade of the 21st century, restrictions on available resources were practically lifted, opening the opportunity for convergence of technical systems and business management systems, with analytical systems, which became the operating node of the feedback loop, playing the decisive role in this unifying process. Using analytical systems, a person can manage both business and technological processes. This is why the coming era is referred to as the "feedback economy."

Even before the current explosion of interest in analytics, the concept of the real-time enterprise (RTE) emerged and became a reality with the advent of new application integration technologies—service architectures, enterprise service buses, and complex event processing systems. While real-time analytics was initially expected to be a critical component of RTE, the main impetus for the adoption of analytics has been the increase in data volumes.

A case in point of the universal use of analytics is the automation of oil production, where the management of drilling equipment must be combined with an assessment of the market and economic situation; here, technological management and business management form a single whole. To exchange well data over the network, the Wellsite Information Transfer Standard Markup Language (WITSML) was created as a link between the two types of control systems, and its use in the energy sector, transport and other industries is now being considered. The main condition for the applicability of such systems is the use of open standards for data exchange between geological and technological research systems, drilling process control systems, geophysical research systems, and modeling and data analysis packages.

Data Discovery

Until recently, the entire discipline of business analysis was built on structured data, but with the advent of the Big Data problem the need to work with unstructured data, which makes up over 80% of all stored data and is growing an order of magnitude faster than structured data, has become more acute. But the point is not only that the analyzed data has become more diverse: after the shift in emphasis to unstructured data, analytics gained a completely different prospect for extracting useful but hitherto hidden information from streams coming from a wide variety of sources. In other words, the paradigm is fundamentally changing. Previously, the function of analytics was reduced to obtaining a limited number of answers to pre-set questions and creating reports; now analytics is becoming an interactive process of human communication with large volumes of data, a means of turning raw data into useful information. It is noteworthy that powerful and expensive equipment is not necessary for this purpose.

Most often, modern analytics is associated with two related areas, Data Discovery and Visual Analytics, but using two different terms for almost the same thing further confuses the already unclear picture, oversaturated with names, that is formed by technologies related to analytics in one way or another. To indicate their place, it is worth attempting to classify business analysis technologies, dividing them, for example, into three large, partially overlapping groups:

  • technologies for extracting information and knowledge (Information and Knowledge Discovery): OLAP, Data Mining, Text Mining, Web Mining and various search engines;
  • decision support and intelligent systems (Decision Support and Intelligent Systems): conventional and group decision support systems, applied artificial intelligence methods, expert systems, predictive analytics;
  • visualization tools: visual analytics, various kinds of "dashboards" and control panels.

Formally, Data Discovery (the discovery of hidden meaning) clearly falls into the first group and Visual Analytics into the third, but they have a lot in common: the emphasis is on human creativity supported by computer technologies. It can be assumed that Data Discovery is the more general concept, a set of many methods for working with data, while Visual Analytics is rather a tool (so far the only one). Apparently for this reason the two directions have recently begun to converge, and the common name Visual Data Discovery has appeared. Systems of this class make it possible to turn a computer into a tool that enhances the user's intellectual potential. VDD is also called exploratory or investigative analytics. The first name was proposed in 2007 by data visualization specialist Stephen Few, who understands such analytics as the search for meaning hidden in data. The second was coined in 2011 by the well-known analyst Curt Monash to refer to analysis performed to find something previously unknown in existing data: hidden images, patterns and other content that can be used to make decisions.

DD tools are more democratic: they are designed to improve the efficiency of working with data for the ordinary user who consumes information, not for a data scientist with access to complex analytical systems and mathematical methods. Just as mainframes were once available only to select professionals and the advent of the PC made computers accessible to everyone, the advent of DD can be called the consumerization of business intelligence.

As at the dawn of the PC, the leaders of the new trend were not the leading players in the business analytics market but relatively unknown companies: QlikTech, Salient Management Company, SiSense, Tableau, Tibco. The recognized grandees then tried to keep up: MicroStrategy improved Visual Insight; SAP, SAS and IBM launched the new products Visual Intelligence, Visual Analytics and Cognos Insight respectively; and Microsoft supplemented PowerPivot with Power View.

Arthur Conan Doyle endowed Sherlock Holmes with the ability to draw conclusions from a small number of observations, but the same facts can, if desired, be interpreted differently, leading to different conclusions: the weakness of the deductive method used by the great detective is its limitations. If there is enough data and adequate tools are available, objective and well-informed decisions can be made, and their accuracy depends on the volume of data and the quality of the tool. The secret of DD's success is simple: these technologies allow decisions to be made on the basis of a larger volume of data.

The goal of DD systems is to create tools that are people-oriented, support them in their work, and give people the opportunity to use their intuition when identifying useful and meaningful information in the available data. The following cycle develops: first a query is formulated; the answer is most often presented in graphic form, which is why visualization technologies are of particular importance in DD; then the results are evaluated and, if necessary, the cycle is repeated. The difference from traditional business analysis lies in the clarity of presentation and in speed: a person wants the operating mode to be close to real time.

Visual Analytics

Let's start with a terminological clarification, since there are three similar terms: Visual Analysis, Visual Data Mining and Visual Analytics - but each has its own meaning.

Visual Data Mining appeared first: around 2000 this term began to be used in Europe for the integration of Data Mining methods with visualization technologies. One of the first scientific schools in this field was formed at the University of Koblenz (Germany).

The advent of Visual Analytics (Figure 2) dates back to 2004, when it was proposed by Jim Thomas, one of the authors of the book "Illuminating the Path: The Research and Development Agenda for Visual Analytics," released as a policy document of the Pacific Northwest National Laboratory, one of the sixteen laboratories of the US Department of Energy. The book defines VA as the ability to think analytically, supported by a graphical interface. The laboratory was reoriented toward VA by order of the US Department of Homeland Security as part of a program to counter terrorist threats and has since become the largest research center almost entirely specializing in various aspects of VA, with branches in a number of countries; several years ago the laboratory at the University of Koblenz became part of it.


Visual Analysis has at least two interpretations. According to the first, this is the name given to a wide range of interdisciplinary research related in one way or another to the use of interactive visual aids for data analysis. The second meaning applies to the analytical component of Visual Analytics proper, which consists of automatic and interactive parts.

Analytical tasks belong to the fastest-growing of all areas of IT application. Automatic analysis tools can solve some of these problems; they are applicable in cases where the data is strictly formalized and there are reasonable criteria for comparison. But in real life and in real business such conditions are rarely met, so data visualization methods come to the rescue. The data may be quite unexpected: for example, the field of text data visualization, i.e. the ability to make a reliable judgment about the content of documents without reading them in detail, is now actively developing. Visual analytics lies at the intersection of these two areas.

Although many methods of automatic data analysis have been developed over the past 15-20 years, it is unlikely that all analysis can be transferred to a machine; most likely, analytics will be built according to the scheme shown in Fig. 3. First, automatic analysis tools adapted to a particular application area are used; they produce reduced volumes of data compared to the original, Data Mining methods are applied to them, and a model suitable for further study is created. Visualization and interactive technologies then allow a person to analyze and improve this model. Most of the actions are performed iteratively, in successive approximation to the desired result. In practical implementations not all components may be present (Fig. 3), but an iterative closed feedback loop will exist in any case.
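A rough sketch of this loop in Python, assuming scikit-learn and matplotlib are available; random sampling and k-means are used here purely as stand-ins for the automatic-analysis and Data Mining steps, and the data is invented.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
raw = np.concatenate([rng.normal(c, 0.8, size=(25_000, 2)) for c in centers])

# 1. Automatic analysis: reduce the data volume (here by simple sampling).
reduced = raw[rng.choice(len(raw), size=2_000, replace=False)]

# 2. Data Mining: build a model suitable for further study.
model = KMeans(n_clusters=4, n_init=10, random_state=1).fit(reduced)

# 3. Visualization: the analyst inspects the model and, if it is not
#    convincing, changes the parameters and repeats steps 1-3.
plt.scatter(reduced[:, 0], reduced[:, 1], c=model.labels_, s=5)
plt.scatter(*model.cluster_centers_.T, c="red", marker="x")
plt.title("Reduced data and the current cluster model")
plt.show()
```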

The main advantage of Data Discovery and visual analytics solutions is that they enable a person to extract from the data what is hidden in it, or to “talk” the data, as stated in the book Probability by Frederick Mosteller: “It is often heard that the data speak for themselves, but their voice is too quiet and modest.”

Data is the new world currency, and the Internet is the exchange office. It is easy to get lost in the flow of information. How should data be presented to attract attention? A user can understand a table of ten rows and three columns, but what if there are thousands or hundreds of thousands of rows?

Emphasize key points and patterns that cannot be found in dry numbers, using infographics, interactive widgets on the website and dashboards. Visualization tools will help with this.

See and use the 21 visualization tools in this article.

Services for creating charts and graphs

Infogr.am

A service for creating interactive, embeddable visualizations. Over 30 chart types and six design themes.

Enter data in the editor, upload it in table or text format (for example, CSV), or import it from the cloud. Then choose what you want to get (a chart, graph or map) and the final format.

You can embed interactive visualization on your website using an iframe, as well as download your diagrams in pdf or gif and insert them into a presentation.

The starting tariff is free, with limitations on the size of the library, the number of templates, and the ability to download diagrams as images. Still, this tariff is more than enough to design one small website or online presentation. Paid tariffs start at $19 per month.

Creately

A collaboration tool that allows you to create interactive diagrams, Gantt charts and simple infographics. The arsenal includes more than 50 types of diagrams with thousands of samples and ready-made shape libraries.

The interface is reminiscent of Microsoft Visio, and no wonder: Creately performs similar tasks.

Results can be embedded in websites and mobile applications, and saved in PDF and SVG formats.

The main disadvantage: all versions are paid. Online service - from $5 per month. The desktop version will cost $75.

Fusiontables

Google tool for creating detailed charts, graphs and maps.

Helps to collect and visualize data and display it on the website.

You import data from a file on your computer (virtually any tabular format), Google Docs spreadsheets, or create a new spreadsheet. You can search for information from external sources.

Cons: complex interface, no design features.

Pros: Collaboration since all data is on Google Drive.

The tool is used by The Guardian, which is already a good recommendation.

VisualizeFree

The visualization can be exported in Excel, PowerPoint and PDF format.

The service is free. An interesting feature: although you can use Flash when developing, the result can be saved in HTML5 format.

Tableau Public

Free interactive data visualization system.

You import data (almost all common formats and many services are supported), and get visualization for posting on a website or social networks.

The service provides 10 GB for storing projects on its servers.

JavaScript libraries

Dygraphs

An open source library for creating diagrams. Helps explore, process and interpret large data sets. Flexible settings.

Disadvantage: this is more of a developer's tool than a marketer's. Its functions are limited to creating graphs and charts, and it requires JavaScript skills and reading the manual. However, there is nothing complicated here even for a beginner.

D3.JS

The library works with data-driven documents using HTML, SVG and CSS. It imports data from CSV files.

It is also suitable for simple business visualizations, but its main purpose is the visualization of scientific information.

Leaflet

A free JavaScript library for creating mobile interactive maps. Use it if you do not want to embed Google Maps or Yandex Maps in your website or mobile application.

Main advantages: the library is extremely lightweight, plus it works without an Internet connection.

Highcharts

A huge number of options and chart types. The output format is SVG or VML for older versions of Internet Explorer.

Supports animation and real-time data streams.

Free for non-commercial projects.

Chart.js

Open source library. Six chart types, which will more than cover the needs of a small project but are not enough for a statistics professional.

Supports HTML5 Canvas and creates responsive animated charts and graphs in a flat design style.

jqPlot

Plugin for creating simple graphs, line and pie charts.

Easily integrated into the site. Easy setup, attractive chart design. Supports animation, no interactivity.

A good free solution for simple tasks.

ZingChart

Uses HTML5 Canvas, while supporting Flash. You have hundreds of chart types at your disposal to visualize your data.

The free version is enough to solve most problems; visualizations are watermarked "Powered by ZingChart". According to the developers, the most popular plan, "SaaS", costs $1999 per year.

JavaScript InfoVis Toolkit

Modular structure. Adaptation for mobile devices. The user downloads only what is needed in a particular case.

There is a set of unique design styles and animation effects.

Services for creating infographics

Piktochart

Offers many free and premium templates for infographics, presentations and reports. They all look impressive and are easy to use.

You can use everything you need for an attractive presentation: charts, graphs, photos, videos, icons, etc. A simple built-in editor: choose one of six hundred ready-made templates, upload your own files, drag and drop objects onto the workspace.

Import data from Excel, Google Docs, and online survey service SurveyMonkey.

Easy.ly

Easy to use tool with hundreds of templates. If you want, start from scratch.

On the top panel, select the object of interest (icons, clipart, diagrams, etc.) and drag it to the desired location. You can embed video.

The starting tariff is free; for additional perks the service asks $3 per month.

Business intelligence and dashboard platforms

Sisense

Allows you to combine data from multiple sources and look for new insights through visualization. Sisense is not the best choice for creating presentations only, but if you need to quickly build your own dashboard with many widgets, this is it.

The developers assure that even an untrained person will cope with the task in an hour and a half.

Short free trial period. Rates are available upon request.

Nuvi

The service is paid, tariffs are determined by specific tasks. The owners claim that they are open to non-profit projects and startups, so you can bargain.

Bime

A cloud platform for visualizing BI analytics. It takes source data from contextual advertising systems, CRM, web analytics systems, social networks and other sources. Unfortunately, Russian services are not supported out of the box; IT integration specialists are needed.

Transforms data into powerful visualizations that look good on any display, regardless of platform or size.

Free trial period for 2 weeks. The basic rate is $490 per month.

iCharts

Cloud BI visual analytics platform for business. Shared access to a control panel with widgets in real time. The results are easy to implement, share and view.

Integration with Google Analytics, Salesforce, Zendesk, Azure, Amazon Aurora, Google Cloud SQL, GitHub and dozens of other services.

The platform is paid, the starting tariff is $460 per month.

Plotly

A platform that creates both individual charts and graphs, as well as presentations and full-fledged dashboards.

Works with JavaScript, Python, R, Matlab, Excel. Import data from multiple sources (MySQL, Microsoft SQL Server, Redshift, Excel, etc.).
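A minimal example of Plotly's Python API, assuming the plotly package is installed; the data and titles are invented for illustration.

```python
import pandas as pd
import plotly.express as px

# Illustrative data; in practice it would come from MySQL, Excel, etc.
df = pd.DataFrame({"day": [1, 2, 3, 4], "revenue": [120, 180, 90, 210]})

fig = px.bar(df, x="day", y="revenue", title="Revenue by day")
fig.show()                        # opens an interactive chart
# fig.write_html("revenue.html")  # or embed it in a dashboard/website
```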

There is a free version. Paid plan - $33 per month.

DataHero

This cloud platform allows full-fledged work without resorting to IT specialists. Import from cloud services.

You can create separate widgets and dashboards.

Push-button and drag-and-drop principles: no need to waste time mastering the interface.

There is a free plan. The service provides each new user with a free trial period of 2 weeks with the capabilities of the premium tariff. Paid plans start at $59.

Conclusion

Of course, most routine data processing and visualization tasks can be solved the old-fashioned way, in Excel, Adobe Illustrator and PowerPoint. However, the power of modern technologies allows us to go further, even without specialized knowledge and skills.

All services work on any platform and in any browser (except for older versions). You don’t need to be an IT specialist or designer, or install Java, Silverlight or Flash.

We hope you will not only select visualization tools for your tasks, but also learn new things about your data.

My name is Anna Ponomareva, I am a Game Analyst at Plarium Kharkiv. This year our department launched the mobile RPG Stormfall: Saga of Survival and is now supporting it. My task on the project is to quickly analyze game processes and provide reports so that the balance can be adjusted correctly.

Visualization is one of the most effective tools for presenting data. It allows you to visually track the main trends, dependencies, distributions and deviations of the studied characteristics, find correlations of indicators, evaluate changes, and help make the right business decisions.

It is important not just to show information, but to do it as clearly and transparently as possible for the customer, highlighting key indicators. I will try to tell you exactly how to do this in the article.

What not to do

Often, in an attempt to squeeze everything out of the available information, we can end up with something awkward and frightening.

Or worse.

Remember that too much is not good. At the same time, looking at the dashboard, the user must find answers to all questions regarding a specific analysis. It won't do any good if answering one of the questions requires collating information from ten tabs.

Data Analysis Stages

Before we begin visualization, let's consider all the stages of data analysis.

Statement of purpose

Each study should answer the questions posed; there is no need to do research for the sake of research.

Data collection

At this stage, the analyst either works with already collected data, or participates in the process of setting up a data collection task (in fact, he decides what information he needs and in what form).

In the first case, special attention should be paid to correct interpretation of the data recorded in the database, and one often has to come to terms with the existing data format, table design, etc. In the second case, the analyst faces the problem of constructing a competent data collection scenario; it is especially easy to overdo it when planning A/B tests, event logging, etc. Communication with programmers is important here: they can help in understanding the processes and estimating the scale of the planned recording.

Data preparation

“Garbage in, garbage out” is a rule that should always be remembered. Structuring, eliminating errors, changing content formats, parsing anomalous results, clearing outliers, eliminating duplicates, integrating data from different sources are some of the most important points in data analysis.

Sometimes it is necessary to extend the set of metrics, for example by adding computed attributes (gain, rank, count, etc.). Sometimes it is necessary to reduce the number of attributes (variables) or to switch to auxiliary variables that take one of two values: true (1) / false (0). A minimal sketch of such transformations follows.
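Below is a small pandas sketch of these transformations; the table, column names and values are invented for illustration and stand in for real game metrics.

```python
import pandas as pd

# Illustrative player-revenue table (column names are assumptions).
df = pd.DataFrame({
    "player":  ["a", "b", "c", "a", "b", "c"],
    "day":     [1, 1, 1, 2, 2, 2],
    "revenue": [10.0, 0.0, 5.0, 12.0, 3.0, 5.0],
})

# Derived columns: day-over-day gain per player and rank of players within a day.
df["gain"] = df.sort_values("day").groupby("player")["revenue"].diff()
df["rank"] = df.groupby("day")["revenue"].rank(ascending=False)

# Auxiliary true(1)/false(0) variable instead of a continuous attribute.
df["payer"] = (df["revenue"] > 0).astype(int)

# Eliminating exact duplicates is part of the cleaning step.
df = df.drop_duplicates()
print(df)
```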

At this stage, raw data is transformed into useful input information for modeling and analysis.

Data Exploration

To correctly interpret multidimensional data, it is necessary to look at it both in terms of a specific characteristic and in terms of a group of characteristics. Key indicators should also be presented in dynamics against plans and actual results. It is at this stage that the format of the future visualization is selected.

Visualization and drawing conclusions

Every study must end with results and conclusions. Even if they are negative, they are worth talking through and discussing. At the same time, the correct formulation of the problem, the methodology for collecting data, the correct interpretation of the results, identified errors and much more should serve as the basis for subsequent research.

The time an analyst spends on each phase depends on many variables: from work experience and level of data knowledge to the list of tools used and PC specifications.

It is also necessary to understand that the process of data analysis is iterative in nature and can be represented as a cycle.

To formulate final conclusions it is sometimes necessary to go through the cycle several times, each time refining the data, rebuilding the processing and presentation models, and gaining more and more knowledge about the analyzed entity. To reduce the number of iterations and the time spent on analysis, the analyst must not only be a Big Data specialist but also know his subject area well (be it game mechanics or the Internet of Things).

At the same time, only experience helps to understand what data and analysis methods are needed in each individual case and how to interpret the results.

Selecting charts for visualization

To choose the right diagram, you must first formulate a goal: what exactly it should show. If there are multiple goals, it may be worth creating a separate chart for each of them.

Gene Zelazny, in his book Say It With Charts, writes that (almost) any idea can be expressed through a comparison. All you need to do is determine the type of data comparison (a simple lookup in this spirit is sketched after the list below):

  • component-wise: percentage of the whole;
  • positional: relationship between objects;
  • temporal: changes over time;
  • frequency: number of objects in intervals;
  • correlational: dependence between variables.
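In this spirit, chart selection can be written down as a trivial lookup. The sketch below is only an illustration: the specific chart suggestions are one reading of Zelazny's rule, not a prescription taken from the book.

```python
# Rough "comparison type -> chart" lookup; the choices are illustrative.
CHART_BY_COMPARISON = {
    "component-wise": "pie or stacked bar (share of the whole)",
    "positional":     "bar chart (comparing objects)",
    "temporal":       "line or column chart (change over time)",
    "frequency":      "histogram (counts per interval)",
    "correlational":  "scatter plot (dependence of two variables)",
}

def suggest_chart(comparison_type: str) -> str:
    return CHART_BY_COMPARISON.get(comparison_type, "start by restating the goal")

print(suggest_chart("temporal"))
```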

If we classify by objects, we can distinguish the following types of visualization:

  • deterministic dependencies: graphs, diagrams, time series;
  • statistical distributions: histograms, scatterplot matrices, box-and-whisker plots;
  • hierarchies: connection diagrams, dendrograms;
  • networks: graphs, arc diagrams;
  • geovisualization: maps, cartograms.

To make my work easier, I use Andrew Abela's chart-selection diagram, into which I made some edits to adapt it to my needs.

Andrew Abela's original diagram can be viewed online.

Choosing a diagram using a specific example

I needed to create a dashboard to control the income and outcome of resources that players mine, find or receive as rewards. This was quite a complex task that required putting together several tables, taking into account many nuances and not losing sight of a single source or consumer of resources.

First, I described how to collect data, what indicators need to be calculated and how exactly to do it, what the visualization should look like. Then specialists from the Game BI Department (responsible for the database for all our projects) checked the correctness of the collection algorithm, optimized the query and created a dashboard according to my design.

To select the chart, I used the technique described above.

  1. We compare two characteristics (the amount of resources received and spent), so we use a bar chart.
  2. We have several sources of resources and several ways to spend them, so we add structure to the bar chart (each source and consumer gets its own color).
  3. We track how the income and consumption of resources change depending on the game day, so the corresponding parameter is placed on the horizontal axis.
  4. For easy reading, all income actions (receipts) are displayed above the horizontal axis, and outcome (expenses) below it. This makes it possible to visually assess the magnitude of the difference.
  5. To make it clear in which periods of a player's life there is a surplus and in which a shortage of a particular resource, we superimpose a line graph of the calculated difference on the bar chart (a minimal sketch of such a chart follows this list).
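A minimal matplotlib sketch of such a chart; all numbers and the resource source/consumer names are invented for illustration and do not come from the project.

```python
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(1, 8)
income = {"quests": [50, 60, 55, 40, 35, 30, 25],
          "mining": [20, 25, 30, 30, 35, 35, 40]}
expense = {"crafting": [30, 40, 55, 60, 70, 70, 75],
           "upgrades": [10, 15, 20, 25, 25, 30, 30]}

fig, ax = plt.subplots()
bottom = np.zeros(len(days))
for name, values in income.items():          # income stacked above the axis
    ax.bar(days, values, bottom=bottom, label=f"income: {name}")
    bottom += np.array(values, dtype=float)

bottom = np.zeros(len(days))
for name, values in expense.items():         # expense stacked below the axis
    ax.bar(days, -np.array(values, dtype=float), bottom=bottom, label=f"expense: {name}")
    bottom -= np.array(values, dtype=float)

# Net difference: positive means surplus, negative means shortage.
net = sum(np.array(v, dtype=float) for v in income.values()) - \
      sum(np.array(v, dtype=float) for v in expense.values())
ax.plot(days, net, color="black", marker="o", label="net (surplus/deficit)")
ax.axhline(0, color="grey", linewidth=0.8)
ax.set_xlabel("game day")
ax.legend()
plt.show()
```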

An example of an identified item surplus (line graph above the bars).

An example of the life cycle of a resource and its transition from a surplus to a scarce category (line graph below the horizontal axis).

As a result, our diagram shows the inflow and outflow of resources on different game days and demonstrates the presence of a shortage or surplus of resources.

We also wanted to see how resource movement is affected by game balance adjustments. To do this, we supplemented the dashboard with a diagram of changes in the structure of resources over time, with the calendar date now on the horizontal axis. In this way we ensured monitoring of the resource life cycle in two characteristic sections: over the player's life cycle and over calendar time.

Periodicity

Most research findings become irrelevant over time. Some data must be updated daily, some monitored monthly, and so on. It is necessary to classify the indicators clearly from the start in terms of update frequency:

  • requiring continuous monitoring (standard indicators of retention, conversion, etc.);
  • data whose dynamics should be assessed periodically (results of A/B tests, dynamics of a set of levels, etc.).

Again, there is no magic pill in this task, because the signs that need to be tracked are very specific to the subject area (the game genre, implemented mechanics, etc.).

Automating this process is still problematic: developing tools for continuous monitoring of absolutely all indicators will not lead to anything good. As we have already said, processing large volumes of data includes collecting and pre-processing the information, loading it into storage, analyzing the data and presenting the results in an easy-to-understand form. The requirement of real-time processing makes the task even more complex and resource-intensive.






