
Understanding Data

Understanding Data is as much a science as it is an art. The science provides the tools to query, slice-and-dice and visualize the data. The art lies in asking the right questions and forming the right hypotheses using domain knowledge. People generally make sense of data much more easily when it’s presented visually than when it’s just a bunch of numbers, formulas and complex equations. Understanding Data is the first step of the larger Machine Learning process: it’s very hard to move forward with any analysis without a solid comprehension of the data.

Understanding Low-Dimensional Data

The first step to understand the data is to compute some Descriptive Statistics, such as the mean, median, variance and standard deviation. These statistics describe where the data is centered and how much it varies. With these measures in hand, it’s time for Data Visualization. One of the simplest plots is the Scatter Plot. In general, each point in a Scatter Plot is one observation, with one variable on the horizontal axis and another on the vertical axis; for a single variable, the horizontal axis can simply be the data index. However, other plots summarize the data more directly. For instance, the Boxplot displays the quartiles, median and outliers of the data. One graph that is especially useful for Understanding Data is the Histogram. It divides the range of values into bins and counts the data points that fall in each one, which makes it possible to visualize the data’s distribution.

Histogram displaying Life Expectancy of multiple countries in 2007
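
The descriptive measures and the binning behind a Histogram can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library. The sample values, the function names describe and histogram, and the choice of equal-width bins are assumptions made for the example; they are not the data or the method behind the figure above.

```python
import statistics

# Hypothetical sample (illustrative values only), with one large value at the end.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9, 15]


def describe(data):
    """Return basic measures of position and variability."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(data),
        "q1": q1,
        "q3": q3,
    }


def histogram(data, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(data), max(data)
    if lo == hi:
        # All values are identical, so they all land in the first bin.
        return [len(data)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value belongs to the last bin, not to a new one.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return counts


print(describe(values))
print(histogram(values, 3))  # [8, 2, 1]
```

In this sample the mean is pulled above the median by the single large value, and the bin counts show the same skew at a glance, which is the kind of information a Histogram makes visible.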

Understanding High-Dimensional Data

Sometimes the data is very high-dimensional, which introduces some challenges. It’s very hard to visualize data in many dimensions, so a common way to handle this situation is to capture most of the information in the high-dimensional data and map it into fewer dimensions. The Principal Component Analysis (PCA) algorithm is very helpful for this task. The technique relies heavily on matrices and vectors, topics that are part of a branch of Mathematics called Linear Algebra. However, what the method really does is not so difficult to understand. It finds new axes, called principal components, along which the variation in the data is the highest, and keeps only the first few of them. Basically, PCA transforms high-dimensional data into lower-dimensional data while preserving as much of the original variation as possible. With PCA, the analysis can focus on the few components that explain most of the variation, and the contribution of each original variable to those components shows which variables have a significant impact and which matter less.

Plot of variables’ contributions on extracted principal components
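
The idea behind PCA can be sketched with NumPy. This is one common formulation (an eigendecomposition of the covariance matrix), shown under stated assumptions: the function name pca and the synthetic three-dimensional data are hypothetical and exist only for this example.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the directions of highest variance."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    cov = np.cov(X_centered, rowvar=False)  # columns are the variables
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]  # largest variance first
    components = eigenvectors[:, order]  # one column per principal component
    explained_ratio = eigenvalues[order] / eigenvalues.sum()
    return X_centered @ components, components, explained_ratio


# Hypothetical data: 200 points in 3 dimensions that vary mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

projected, components, ratio = pca(X, n_components=2)
print(projected.shape)    # (200, 2)
print(ratio)              # share of the total variance kept by each component
print(components[:, 0])   # weight of each original variable in the first component
```

Each column of components holds the weights of the original variables in one principal component, which is the kind of quantity a contributions plot displays. When the variables are measured in very different units, it is usual to standardize them before applying PCA; this sketch skips that step.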

Cluster Analysis

One crucial thing to analyze when dealing with high-dimensional data is the distance between points. These distances are very useful for dividing the data into groups, and this division makes it easier to understand and visualize the data and to get some insights from it. There are several algorithms that divide the data into clusters, and each one achieves that in a different way. The most intuitive example is K-means, an unsupervised learning algorithm that takes k as a parameter and tries to find k natural groups in the data, assigning each point to the group whose mean is closest. Cluster Analysis not only helps in Understanding Data, but is also a very good way to identify patterns within the data.

K-means algorithm with 3 clusters (k = 3)
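
The K-means procedure can be sketched as two alternating steps: assign every point to its nearest mean, then recompute each mean. The code below is a minimal NumPy version under stated assumptions: the function names, the random choice of starting points and the synthetic data are all hypothetical, and the result can depend on the starting points.

```python
import numpy as np


def assign(X, centroids):
    """Return, for every point, the index of the nearest centroid."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def k_means(X, k, n_iterations=100, seed=0):
    """Group the rows of X into k clusters around their means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumption of this sketch).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iterations):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its points; keep it in place if it has none.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # the means stopped moving
        centroids = new_centroids
    return assign(X, centroids), centroids


# Hypothetical data: three loose groups of 50 two-dimensional points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])

labels, centroids = k_means(X, k=3)
print(centroids)
print(np.bincount(labels, minlength=3))  # number of points in each cluster
```

Because the starting points are random, different seeds can lead to different groupings, so in practice the algorithm is often run several times and the best result is kept.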

4tune

At 4tune, we are passionate about solving clients’ problems to deliver profit. If you have a business idea or need help with anything related to data science or artificial intelligence, we are glad to help. Please contact us at victor@4tune.ai
