What is a data engineer?
15 Apr 2020

Would it be true to say that a data engineer is an engineer who works with data and creates data products? Yes, but that answer would be vague. To figure out what data are and what a data engineer does, we talked to Raman Novik, Senior Solution Architect, and Hanna Petrashka, Senior Resource Development Lab Head.

Popular media such as The New York Times, The Economist, and WIRED have long made data a trend, calling them the new oil. Like crude oil, data are useless in their raw state. First, oil has to be refined into fuel, which is still of little use by itself. Second, you need an engine to get energy from the fuel. Your data product is the engine. 

What do we mean by “raw”? A system receives data as, say, a block of transactions or a dump of medical records. In this state, the data are very difficult to use. A data engineer does not just collect data but transforms them into information that an end user can understand and work with, using something as basic as Excel. But that is not a data product yet. Information should bring real benefits to its users, and even masterfully processed data add no real value without tools that put them to use. Only a tool and refined data together lead to actual results. 
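To make the step from raw data to usable information more concrete, here is a minimal Python sketch. It assumes a hypothetical raw dump of transactions stored as semicolon-separated lines (`date;customer;amount`); the sample data and the function names `summarize_transactions` and `to_csv` are illustrative only, not part of any real system. The sketch aggregates the dump into a total per customer and renders it as CSV, a format Excel can open.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw dump: one transaction per line, "date;customer;amount".
RAW_DUMP = """2020-04-01;alice;19.99
2020-04-01;bob;5.00
2020-04-02;alice;7.50
2020-04-02;carol;12.00
"""


def summarize_transactions(raw: str) -> dict[str, float]:
    """Aggregate raw transaction lines into a total amount per customer."""
    totals: dict[str, float] = defaultdict(float)
    for line in raw.strip().splitlines():
        _date, customer, amount = line.split(";")
        totals[customer] += float(amount)
    return dict(totals)


def to_csv(totals: dict[str, float]) -> str:
    """Render the per-customer totals as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["customer", "total_amount"])
    for customer, total in sorted(totals.items()):
        writer.writerow([customer, f"{total:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(summarize_transactions(RAW_DUMP)))
```

The result is information rather than a data product: it is easier to read, but it only starts to bring value once a tool or model uses it to support a decision.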

For example, suppose we’ve collected data on meteorological observations over a specific period, processed them, and put them into an Excel table. To build a data product on top of this information, we need to analyze temperature ranges and start making 24-hour weather forecasts that will inform users’ decisions. To make a profit, we must know how reliable this information is. Any model can produce incorrect results, so we should be aware of its margin of error. Say our weather forecast is seventy per cent accurate. Numbers like this allow users to evaluate the situation objectively and make the right decision. 
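What “seventy per cent accurate” means depends on how accuracy is measured. One common approach, assumed here for illustration, is to count a forecast as a hit when it falls within a chosen tolerance of the observed temperature and to report the share of hits, alongside the mean absolute error. The ten days of forecasts and observations below are made-up numbers, and the 2 °C tolerance is an arbitrary assumption.

```python
# Hypothetical 24-hour temperature forecasts and actual observations (°C).
FORECAST = [12, 14, 15, 11, 9, 10, 13, 16, 18, 17]
OBSERVED = [13, 14, 18, 11, 12, 10, 12, 16, 21, 17]


def _check(forecast, observed):
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")


def forecast_hit_rate(forecast, observed, tolerance):
    """Share of forecasts whose absolute error is within `tolerance` degrees."""
    _check(forecast, observed)
    hits = sum(1 for f, o in zip(forecast, observed) if abs(f - o) <= tolerance)
    return hits / len(forecast)


def mean_absolute_error(forecast, observed):
    """Average size of the forecast error, regardless of its sign."""
    _check(forecast, observed)
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


if __name__ == "__main__":
    print(f"hit rate: {forecast_hit_rate(FORECAST, OBSERVED, tolerance=2):.0%}")
    print(f"mean absolute error: {mean_absolute_error(FORECAST, OBSERVED):.1f} °C")
```

With this sample data, 7 of 10 forecasts land within 2 °C, so the hit rate is 70%, and the mean absolute error is 1.1 °C. Reporting both figures tells users how often the forecast is right and how far off it tends to be when it is not.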

A data product is the result of the transformation of data into quality information that brings value. 

Every data product has a lifecycle and a value chain. Data engineers work at every stage of this chain: data collection and analysis, product development, and support. Each stage has its own specifics that shape the nature of the work. Because data engineering covers many types of products and lifecycles, there are also many types of data engineers: 

  • Data engineer. They are responsible for collecting and processing data, launching processes, and building services to transform these data into a data product. 
  • Data platform engineer. They are responsible for the infrastructure, engineering, security, and monitoring of data platforms. 
  • Data quality engineer. They are responsible for the engineering, analysis, and testing of data. Data quality engineering emerged when it became clear that traditional testing (QA) could not provide proper control over the quality of data on data platforms. 
  • Data DevOps engineer. They are responsible for working with distributed systems and processing complex data in sophisticated environments. To analyze and solve problems, they deal with the many connections between system components. 
  • Data science engineer. They are responsible for structuring and analyzing large volumes of data and performing predictive analytics. 
  • Search engineer. Search engineering is seen as part of data engineering because modern search has become so smart that it is now closer to data science, and to working with data in general, than to any other discipline. 
  • Machine learning engineer. Machine learning (ML) is a special branch of data platform engineering. ML engineers not only work with data but also make the lifecycle of an ML product transparent and manageable. The complexity of this process requires an ML engineer to have deeper expertise. 
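The data quality engineer role focuses on testing the data themselves rather than application code. The Python sketch below shows what a few such checks might look like: duplicate IDs, missing values, out-of-range values, and malformed dates. The record layout (`id`, `date`, `temperature_c`), the sample records, and the temperature bounds are all hypothetical assumptions; in practice, teams often use dedicated data quality tools, which are not shown here.

```python
from datetime import date

# Hypothetical weather records; field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "date": "2020-04-01", "temperature_c": 12.5},
    {"id": 2, "date": "2020-04-02", "temperature_c": None},
    {"id": 2, "date": "2020-04-03", "temperature_c": 14.0},
    {"id": 4, "date": "2020-13-01", "temperature_c": 80.0},
]


def check_quality(records, min_temp=-90.0, max_temp=60.0):
    """Return human-readable problems found in the records (empty list if none).

    The default temperature bounds are assumed plausibility limits.
    """
    problems = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("id")
        # Uniqueness check: each record ID should appear only once.
        if rid in seen_ids:
            problems.append(f"duplicate id {rid}")
        seen_ids.add(rid)
        # Completeness and range checks on the measured value.
        temp = rec.get("temperature_c")
        if temp is None:
            problems.append(f"id {rid}: missing temperature")
        elif not (min_temp <= temp <= max_temp):
            problems.append(f"id {rid}: temperature {temp} out of range")
        # Validity check: the date must be a real ISO 8601 calendar date.
        try:
            date.fromisoformat(rec.get("date", ""))
        except (TypeError, ValueError):
            problems.append(f"id {rid}: invalid date {rec.get('date')!r}")
    return problems


if __name__ == "__main__":
    for problem in check_quality(RECORDS):
        print(problem)
```

Checks like these are cheap to run on every data load, which is the kind of ongoing control over data quality that traditional application testing does not cover.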

Career opportunities for data engineers 

Not all companies produce data products. It’s important to distinguish between web-centric companies and data-centric companies. For example, is Gmail a data product? No. It’s an email service developed to facilitate communication between people. The service incorporates various data products that are integrated so smoothly that users don’t notice them. For example, Gmail automatically sorts emails into important and unimportant ones. It uses data algorithms to do this, but the service could exist without them. In this case, data products are UX improvements. So, we can say that Gmail is a web product, not a data product. If a company’s management is not interested in using data to improve the quality of a product, there’s no need for data engineers. These specialists are welcome in companies that try to improve their products through personalization, recommendations, and other features that do not merely sit on top of data but are derived from them. 

Giants such as Google and Yandex will never stop hiring data engineers because working with data is the core of their businesses. The growing need for data products is a modern trend. Tough market competition makes companies develop their products by introducing unique features, automating processes, taking care of end users, and so on. Most of these improvements, such as personalization and recommendations, are created with the help of data products. 

Here’s a specific example. FedEx is a competitor of Amazon. Both companies are built around commerce and delivery and work with data, but Amazon is taking the lead thanks to AI. For example, Amazon is developing drone delivery, which needs almost no people and therefore reduces expenses. To keep up with trends and regain its market position, FedEx plans to digitalize its services and operating model and learn to make data-driven decisions. That’s why the demand for data engineers is far from declining. 

Where to study data engineering? 

  • Online courses. The internet offers more and more high-quality courses on data engineering from companies such as Google, Microsoft, and others that move the industry forward. However, these courses often focus on the providers’ own data products, which limits the learning experience. Great courses on data analysis are rare online, so you need to know where to look. 
  • Company-based courses. This is the preferred option. You get detailed theory classes and the opportunity to put that theory into practice. If you’re good enough, you may receive a job offer. 

EPAM experts are sure that online courses are a great start, but they’re not enough to get a job. Interviewers note that many candidates who have studied on their own lack structured knowledge and misunderstand the scope of a data engineer’s responsibilities. Newcomers have limited chances of getting a position because employers don’t want to take the risk. The only solution to this problem is an internship. Try to find a company with a solid background in data engineering that will take you under its wing and show you what production work is and how to do it. You should prove yourself good enough to keep working at the company until you reach a middle or senior level. At these levels of seniority, you’ll have a much easier time getting into interesting projects at any company. 

At EPAM, you can study data engineering in a blended format: 

Step 1. Online training. We provide videos and links to useful materials, lectures, articles, and YouTube channels. You study this content at home, complete tasks, communicate with our experts, take offline theory tests, and break down the most difficult tasks together with your mentor. 

Step 2. Data lab. If you pass the theory tests, you can continue your studies in our lab. At first, you will study theory and do practice tasks that apply your new knowledge at the same time. You’ll learn how to systematize information, build data stores, visualize data, and check data quality. At this stage, each student creates their own project. You’ll also learn, through games, what a project manager, a delivery manager, and a business analyst do on a project. At the last stage of the lab, all students are divided into teams to practice in a real project environment. You’ll learn more, improve your time management skills, and boost your confidence. 

But it takes hard work to achieve all of this. You’ll take part in daily scrum meetings with a project manager and a business analyst from the client’s side, who will provide you with a data set. Within a month, you should not only present a ready-made solution to the customer but also defend it. And that’s not the last challenge the lab has in store for you. Over a month and a half, mentors will send you various tasks and cases based on their own experience, for example, writing a letter to a customer or communicating with a customer’s representative. Be ready to learn by trial and error.

What is the most preferable background for a data engineer? 

Engineering is the ability to find the right solutions in non-standard situations. A person with an analytical mindset and a technical degree is more likely to become a back-end developer, since that role does not involve close communication with customers. A specialist with a background in economics and knowledge of business will quickly find their place in a team and understand the customer. But there are no strict limits, and it really depends on the individual and their personal qualities. Soft skills are important too. Sociable and motivated people are more likely to stay in a company. Remember that the more you know, the faster you’ll be noticed, get assigned to a project, and grow professionally. 

Advice for beginner data engineers 

  • Take your time to study the self-study materials linked on the registration page. This will be your first step towards answering the question of whether working with data is what you’re interested in. 
  • Evaluate your skills objectively and be honest in the interview. Only those who are truly motivated to study and work with data will move forward. 
  • Study for yourself, not for others. This is not university, where you can forget what you learned once you’ve passed the exam. Every piece of information you get during your studies will serve as the foundation of your career. You need all the knowledge you can get. 
  • Ask questions. If something is unclear, ask a mentor. Questions should be your key communication tool throughout your studies. It’s better to ask a question and fill a knowledge gap than to copy someone’s answer and remain ignorant. Remember that an interview is a one-to-one experience. 
  • Get rid of prejudice. Mentors are not university teachers but your future colleagues. They are eager to have a chat with you and spend some free time together. Don’t be afraid of them. If you face any difficulties and feel exhausted, unsure, and frustrated, share your feelings with the mentors and you’ll easily get help.