Business Operations Through IT Automation

Today’s businesses need to adapt easily to change in order to maintain business continuity. We must use our human and IT resources so that we can handle the situation. Automation is one way to make changes to our operations quickly. In the past, the IT and business units were two separate departments that went in different directions under different control. Most of the time, the business demands results now, while IT demands a view to the future and adaptability.

The division between business and IT (Information Technology) makes it harder to implement technology. Industry thinkers came up with a solution to marry business and technology. The first that we have seen so far is the DevOps revolution. So what is the DevOps revolution? DevOps is a term coined for merging the Development team and the Systems Operations team in the IT department.

Developers mostly write code for the business requirements. They think that they should just deliver on time, without considering the infrastructure. On the other side, systems administrators will not let the developers’ code hit production right away. The DevOps revolution was introduced so that the two teams work together, bringing understanding and harmony to the deployment of systems.

For this to succeed, systems operations staff write code hand in hand with the developers, which gives the developers a voice, simplifies the interaction, and shortens the time to release production systems. Businesses then reap the rewards of having their operations run smoothly.

The agile methodology was then adopted in order to make meetings fast and easy. Automation came into play so that systems can be reproduced easily and so that even developers can deploy their own systems from templated servers written by systems administrators. DevOps merges two teams that were separate before, which creates the need for an agile coach or scrum master. With that, practices such as pair programming, which comes from extreme programming, have been used to aid the exchange of knowledge and the development of skills across the two teams.

Chef (formerly Opscode Chef), Ansible, Puppet, Salt, and many more are some of the IT automation tools that help DevOps engineers or systems administrators deliver automated systems. With the help of developers and the quality assurance team, the environment used to test and run the code can be properly maintained.

Example implementation:
Chef Development Kit in Windows Server 2012 R2 using

Code at:

Finding the Unicorn Data Scientist

Nowadays, as more career shifters are around, there are many people who declare themselves to be Data Scientists. This abundance is what we can call raw skills. They may possess skills at some level, but it is not yet enough. To beat the uncertainty, let us see what it takes to be a data scientist.

To be a Data Scientist, there are 3 major skills: Computer Science / Hacking skills, Mathematics & Statistics skills, and Substantive knowledge of a domain such as research, business, medicine, or biostatistics. Machine learning, which is part of Artificial Intelligence, also comes into play; it is where we model the world in order to forecast and predict.

For the technical skills, knowledge of programming (especially functional programming), database systems, and big data is necessary. Statistics and Mathematics skills are needed so that the Data Scientist has the expertise to look at and solve problems and express them in figures and numbers. With these, the data scientist can model the world in numbers, quantifying the world we live in in order to draw inferences. Substantive knowledge, on the other hand, comes into play in research, business, biostatistics, medicine, and other fields where some knowledge of the problem being observed and analyzed through data is needed.

Another skill a Data Scientist needs is the ability to present the work to non-technical people and to visualize the findings, relaying the information in simpler forms so that everyone can grasp what is being communicated.

Unicorn Data Scientists are those who have accidentally gained these different skills.
In my case, I am a certified IT engineer, have been skilled in Marketing and Digital Marketing, took up a Master’s Degree in Business Administration (MBA), and have been doing Solutions Architecture (Cloud) and other kinds of research, such as biotechnology research on BioPhotonics, bioinformatics, and more.

A vast array of experience, exposure to different fields, and the ability to integrate or look across different stages of industry, business, the medical field, sciences, arts, and technology is needed in order to be a Unicorn Data Scientist. They are somewhat hard to find, and most of them hold either a Master’s Degree or a Doctorate (PhD). They have a mixed background in marketing and economics too, and have been around Research and Development projects. Some have gone through several career shifts, and others are life-long learners who have immersed themselves in Computer Science, Math & Stats, and Substantive Knowledge.

Unicorn Data Scientists are products of experience, sometimes not of a well-lined career path but of a path on which they picked up skills in mixed and different fields. Most of them are fast learners and have even looked into systems or processes, which is one of the applications of data science. We can apply data science and engineering to business processes too, not just to data. If you are lucky enough to find a Unicorn Data Scientist, be sure to take care of them, as they are rare and hard to find.

Getting and Cleaning Data – A Data Scientist’s Perspective

Getting and cleaning data is easy with the tools available to us right now and with the technology available for sharing datasets. We have tools such as git, svn, zip files, and more. However, although these tools are available, they have to be installed on the machine before you can read the data.

Most resources that we have today are compatible with one another, but if you need data that is large or that falls in the realm of security, you will have to authenticate to access it. Once you have the data, you will need to check whether the corresponding data dictionary is available, or, if it is a csv file, whether the data is labeled with a header. There are cases where the data is labeled per item and per field. This is easy to read in a quick linear view: without having to look at a header, you know what each value is. The hard part, which will take its toll later, is that you will have to clean out the labels. For example, this line:
name: bob pet_type: cat age: 3

You will have to clean it up by first removing the labels. There are also cases where you will have to transform the cleaned data to make use of it at a later stage of analysis.
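
As a minimal sketch of that clean-up in Python, the labels can be split away from the values with a regular expression. The helper names parse_labeled_line and clean_record are made up for illustration, and the sketch assumes that every label is a single token followed by a colon and that no value contains a colon of its own.

```python
import re

# Assumption: a label is one token of letters, digits, or underscores
# followed by a colon, e.g. "pet_type:". Values must not contain colons.
LABEL_PATTERN = re.compile(r"(\w+):\s*")


def parse_labeled_line(line):
    """Turn 'label: value label: value ...' into a dict of strings."""
    # With one capture group, re.split returns
    # [text_before_first_label, label1, value1, label2, value2, ...].
    parts = LABEL_PATTERN.split(line.strip())
    labels = parts[1::2]
    values = [value.strip() for value in parts[2::2]]
    return dict(zip(labels, values))


def clean_record(record):
    """Transform cleaned fields into types that are useful for analysis."""
    cleaned = dict(record)
    if "age" in cleaned:
        cleaned["age"] = int(cleaned["age"])  # the age arrives as text
    return cleaned


raw_line = "name: bob pet_type: cat age: 3"
record = clean_record(parse_labeled_line(raw_line))
print(record)
```

Keeping the parsing and the type conversion in separate functions makes it easy to reuse the first step when a new field shows up.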

If you write functions to process data, make sure to use low-level tools and functions. This will greatly reduce the time it takes to process the data. It is also better to have a separate script or tool dedicated to the cleaning part, run before analyzing the data, to save time. After that, you can create another intermediate dataset for use in the visualization part.

At the end of your analysis, you should be able to present your analytics report in a form available to your audience, such as a PDF or a presentation. When writing reports, assume that the audience is not technical and needs an easy-to-understand format, and make sure there is adequate data for them to see the same insight that you want to communicate.

ScienceOps or SciOps Tasks

Helping other businesses grow in terms of technology is mostly a need, both in start-ups and in growing businesses that want business continuity. Technology nowadays lets services be created and delivered at a faster rate thanks to offerings such as Cloud services. Amazon Web Services, OpenStack-based clouds, and many more offer infrastructure as a service to help technology businesses deliver high-quality applications.

Users of the apps are now mobile: they are on smartphones, laptops, tablets, and many other gadgets ready to run apps. Apple devices are around as well, and all those user apps are best paired with data on some server, the cloud. The cloud works as the authentication service, the knowledge base, and even the data storage of this infrastructure. Regardless of the application, a good system and network infrastructure is best to have.

These infrastructures are built on top of on-premise computers/servers and also on the cloud. Cloud services reduce costs in such a way that you don’t have to spend time waiting for servers to arrive, you can scale up and down, and you can even scale automatically. Yes, automatic scaling. But what is automatic scaling? Your infra, if properly managed, can be designed to grow its number of instances/servers when there is a high need for resources and to shrink that number when only a few instances are needed. All of these are beneficial.
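
To make the idea concrete, here is a small Python sketch of the decision behind automatic scaling. It is not the API of any cloud provider. The function name desired_instance_count, the CPU thresholds, and the instance limits are all assumptions picked for illustration.

```python
def desired_instance_count(current, cpu_percent,
                           min_instances=1, max_instances=10,
                           scale_up_at=75.0, scale_down_at=25.0):
    """Return how many instances we want after one scaling decision.

    All thresholds and limits are illustrative assumptions.
    """
    if cpu_percent > scale_up_at:
        target = current + 1   # high demand: grow by one instance
    elif cpu_percent < scale_down_at:
        target = current - 1   # low demand: shrink by one instance
    else:
        target = current       # demand is in the comfortable range
    # Never go below the minimum or above the maximum.
    return max(min_instances, min(max_instances, target))


# Simulate a burst of load followed by a quiet period.
instances = 2
for cpu in [80.0, 90.0, 50.0, 10.0, 10.0, 10.0]:
    instances = desired_instance_count(instances, cpu)
    print(f"cpu={cpu:.0f}% -> instances={instances}")
```

Real autoscaling services add cool-down periods and look at averages over several minutes, but the grow-and-shrink rule is the same in spirit.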

With all of these things comes infra security. Securing and maintaining your cloud infra and keeping it up to date with the latest software, fixes, and patches are needed. However, we cannot be sure what we are doing right or what we are missing if we don’t audit.

Designing a good, secure infra needs some more. We have to monitor things in order to know about downtime, and we also have to measure. To measure infra performance, we create metrics that check the status or measurement of every parameter of the server. We then have to collect these and increase our capability to analyze them.

The data scientists for this work are ScienceOps. They are mostly building good infrastructure, but that is not all. Even after seeing what’s good in your infra, you will need to know what’s in it for your business. ScienceOps might need some background in business, marketing, and strategic planning, and also in the apps that you are targeting for measurement. SciOps will then need to analyze your users’ data, aside from the server logs. With the users’ data, they can show you how you are faring in the market. They can show you whether you are growing over time or just have stagnant users who no longer interact with your app.
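
As a rough Python sketch of that kind of user analysis, we can count active users per month and list the users who have gone quiet. The event format (a user id and a date string in YYYY-MM-DD form), the sample data, and the function names are assumptions made for this example.

```python
from collections import defaultdict


def monthly_active_users(events):
    """Count distinct users per month from (user_id, 'YYYY-MM-DD') events."""
    users_by_month = defaultdict(set)
    for user_id, day in events:
        users_by_month[day[:7]].add(user_id)  # 'YYYY-MM' is the month key
    return {month: len(users) for month, users in sorted(users_by_month.items())}


def stagnant_users(events, latest_month):
    """Users seen at some point but with no activity in latest_month."""
    seen = {user_id for user_id, _ in events}
    active = {user_id for user_id, day in events if day[:7] == latest_month}
    return seen - active


# Made-up sample data for illustration only.
events = [
    ("ana", "2015-01-03"),
    ("ben", "2015-01-10"),
    ("ana", "2015-02-01"),
    ("cara", "2015-02-14"),
    ("cara", "2015-03-02"),
]

print(monthly_active_users(events))
print(sorted(stagnant_users(events, "2015-03")))
```

A falling monthly count together with a growing list of quiet users is the stagnation signal that the management team would want to see early.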

Your marketing success can be measured, along with a lot more about the choices of products and services and the behavior of your customers or app users. With proper metrics, the right analysis, and more insight gained, the management team and product team can design, update, and steer your product / service in the right direction, one that satisfies your customers and service users.

5 D’s of Data Science

Here are the 5 Ds of Data Science:

  1. Data
  2. Digitalization
  3. Description
  4. Depiction
  5. Discovery



In data science, what is needed most is the data: the observations or examples. With data, we can describe how much, how strong, or what the value or measurement is for a situation or a thing. Data comes into existence when we describe an event or measure something. It is the most important building block that we need for Data Science tasks. With data, we are able to show quantity and quality, and this will be the basis of our equations and statistics. We observe, or sometimes use instruments or probes, in order to gather data for our analysis or research.



We cannot process raw data when it is not digitized, that is, put into a computer system or encoded into a form that can be processed. The format is not limited to text, graphics, spreadsheets, vectors, audio, or video; we can use any digital format that we like. Through digitization, we can speed up the analysis and the procedures applied to compute statistical measures. We can then infer from the findings and create more insight. Digitization also makes sharing information easier, as the data can be stored and retrieved for future use.



Through the tools that we have, mathematical equations and statistics, we can describe the data that we have. We can determine whether our assumptions are right or wrong by testing the hypotheses that we formulate. We can then deduce from what we have gathered, which helps us understand more and guides our next steps: what we can do with the data in order to solve a problem, understand a situation, or teach machines/computers. These machines in turn are put to practical use, aiding human ability in different aspects of our lives, including but not limited to traffic, medicine, marketing, economics, planning, production, operations, understanding behaviors, and many more.



In Data Science, where we do machine learning, we mine information and create training and testing sets, and we can then depict or predict the future. With visualization, we can also explain the insights that we have just found. We can share the information for consumption by a wide range of audiences, from academe, the professions, medicine, science, and the like. With depiction/visualization, we can help different people understand what we have just found out. This is where data science becomes an art, a place of creativity aimed at mass consumption.
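
As a small illustration of creating training and testing sets, here is a sketch in plain Python. The function name split_train_test, the 25% test fraction, and the fixed seed are assumptions for this example. Libraries such as scikit-learn offer their own, more complete helpers for this.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)                      # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)      # a fixed seed gives a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


observations = list(range(20))                 # stand-in for real observations
train, test = split_train_test(observations)
print(len(train), "training rows,", len(test), "testing rows")
```

The model learns only from the training rows, and the testing rows are held back to check how well its predictions carry over to data it has not seen.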



At the end of most of a Data Scientist’s research, a discovery is usually found from the different insights, or clarity comes through the process as the prize of hard work. The discovery from the tasks conducted can help to predict reality, give warnings, and inform people. Most stakeholders are pharmaceutical companies, doctors of medicine (through biostatistics and analysis), and businesses or enterprises. The information uncovered can be a great help in making future decisions on improving a medicine, process, product, or strategy, such as those used in marketing campaigns, designing educational materials, and providing new products/services for the benefit of the people.

Machine Learning, A Look in the Past

Before Big Data became popular, back in the days of Web 1.0, there was already the machine learning of the past, which utilized Market Basket Analysis. It was very dominant in advanced e-commerce stores and online shops. Job sites also utilized this technology. How did they implement it? Cookies: not those in your kitchen jar, but the small text files that remember your preferences, the sites you visited, and the things that you’ve clicked on the internet.

And what was that? Machine Learning, part of the work of the so-called Data Scientists of today. Facebook analyzes all of our likes, shares, and streams today, and Twitter can do it too; I have even tried doing sentiment analysis of tweets using Python. There is Google with its intelligent algorithms, and Yahoo, an early adopter of Hadoop and its HDFS file system (a Big Data system). Many SQL-based database management systems are also in widespread use. In those days, MATLAB was widely used, along with SPSS, SAS, and S-Plus, and now R. Nowadays there is Pig, which simplifies writing MapReduce jobs, the programming model used by Hadoop.

But who benefited from data science in the past? Amazon, the online bookstore, utilized data science, data mining, and data analysis in order to show you the most relevant products that you could buy. It is now a general online store and has even become a Cloud Service Provider. Its algorithms help upsell and show you items related to what you have already bought.
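
Related-item suggestions of this kind are what Market Basket Analysis is about. The Python sketch below shows the two basic measures, support and confidence, on made-up baskets. It is only an illustration of the idea and not how any particular store implements its recommendations. The function names and the sample data are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets with the antecedent, the share that also has the consequent."""
    with_antecedent = [basket for basket in baskets if antecedent in basket]
    if not with_antecedent:
        return 0.0
    both = sum(1 for basket in with_antecedent if consequent in basket)
    return both / len(with_antecedent)


# Made-up shopping baskets for illustration only.
baskets = [
    ["coffee", "sugar", "milk"],
    ["coffee", "sugar"],
    ["coffee", "cereal"],
    ["milk", "cereal"],
]

print(pair_support(baskets)[("coffee", "sugar")])
print(confidence(baskets, "coffee", "sugar"))
print(confidence(baskets, "sugar", "coffee"))
```

A pair with high support is bought together often, and a high confidence from one item to another is the hint a shop would use to suggest the second item to someone who has just bought the first.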

One of the most successful in utilizing Big Data and Data Science is Walmart. They know how much to display in store, they know how much inventory to carry, and they even know when you will buy your next coffee beans, sugar, and even the infant milk and cereals that you buy on your scheduled shopping. This is sales forecasting, and Walmart grew because of this so-called business intelligence. It is data science: they use algorithms, mathematical equations, and operations research tools in order to manage and understand consumer behavior.

So what we think of as the Data Scientist of today is really a thing of the past. But now, a successful e-scientist must have skills in diverse fields (multidisciplinary skills) like business / marketing, economics, mathematics, statistics, operations research, some IT skills, big data, and creativity. Yes, creativity. Without it there will be no spark of wisdom; it is mostly intuition, insight, and looking at the world/data from a different angle to predict, to deduce, and to induce.

Becoming a Data Scientist

Most probably, many of you are looking into becoming a data scientist or e-scientist. With the advent of technological advancement, the way we manage data is now digital: we use computers and large storage systems to store the data that we have. In this era of information, we are well aware that Big Data, or large data, is now handled by servers in the cloud or by a cluster of many computers storing and processing data.

Many of those in the fields of Biostatistics, Informatics, Statistics, and Mathematics have the edge in the core part of Data Science. Numbers, as we know, are quantitative descriptions of our environment and the world we live in. We also quantify qualitative data, as has been suggested and practiced in the past, so that we can apply mathematics to everything that we do. Being a data scientist is work in which you need skills in statistics, some basic mathematical foundations, and a love of insight and intuition.

The creative part of being a data scientist is in the insights, data exploration, and intuition. You cannot explore unknown data without being creative, and that is part of what data science is. In data science, you are a scientist dealing with data, with the goal of getting insights or ideas from a given set of information. The hard part is cleaning out the bad parts of the data and making it neat so that you can process the information further.

Programming skill is also needed. It helps you automate parts of your work, like applying functions, summations, or complex formulas to a million data points or to your big data. You also have to be familiar with Information Technology, since most of your tools in e-science or data science involve working with both commercial and open-source statistical software, programming languages, database systems, and other storage systems that handle massive amounts of information.

We aren’t done yet: you must also be versed in the system, topic, or field in which you are doing research. This includes, but is not limited to, genomics, linguistics, disease prediction, the medical field, and many other fields in which you are asked to predict. To predict properly, you must understand statistics and machine learning, which can give a system the power to be an artificial intelligence powerhouse. Much of the big data that we currently have can be used to power new robots connected to cloud-powered computers, building on the work of data scientists on supercomputers.

Business skills are also part of data science, as they relate to visualizations and also to the profit of the stakeholders supporting your work as a data scientist. Overall, data science is a diverse field in which a mixture of skills is needed.

This article is helpful for looking into yourself to see what type of data scientist you are, or whether you are a data scientist with a future because you have the basic skills needed to enter the world of data analysis, mining, and science.

My own skills fall evenly across those of a Data Scientist.

Related article:

Analyzing the Analyzers