At the beginning of each project, you should know your desired outcomes. That rule is universal regardless of project type, whether it is a software application or a new steering wheel for a car. This piece presents some aspects of software requirements engineering (although parts of it apply to other kinds of projects as well). It focuses on the available standards you should know, and also covers suggested practices and technologies that might help you when defining requirements.
Two common challenges when dealing with data are how to send a dataset from one place to another and how to store it effectively. These problems are interconnected, as both require serialization of data (sometimes called data encoding). Generally, the most compact and fastest way to serialize data is to transform (or rather keep) data structures in binary form, as opposed to text formats like JSON or XML. As usual, there are many established approaches and standards for dealing with this issue.
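To make the text-versus-binary comparison concrete, here is a minimal sketch using only the Python standard library. The sample data and the wire layout (a 4-byte count followed by 8-byte doubles) are assumptions chosen for illustration; they do not correspond to any particular standard.

```python
import json
import struct

# Hypothetical sample data: 1,000 floating-point readings (illustrative only).
readings = [i / 7 for i in range(1000)]


def encode_text(values):
    """Serialize the values as UTF-8 encoded JSON text."""
    return json.dumps(values).encode("utf-8")


def encode_binary(values):
    """Serialize as a 4-byte little-endian count followed by 8-byte doubles."""
    return struct.pack(f"<I{len(values)}d", len(values), *values)


def decode_binary(payload):
    """Inverse of encode_binary: read the count, then that many doubles."""
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}d", payload, 4))


text_payload = encode_text(readings)
binary_payload = encode_binary(readings)

print(f"JSON:   {len(text_payload)} bytes")
print(f"binary: {len(binary_payload)} bytes")
```

For this kind of data the binary payload comes out noticeably smaller, because every value takes exactly 8 bytes instead of up to about 20 characters of text. The trade-off is that the binary form is not human-readable and both sides have to agree on the layout in advance.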
It is very common for organizations that work with big data to have two completely separate teams - one responsible for processing data (data scientists) and another for designing and writing applications for end-users (which, at least formally, should use that data). It is equally common that technical problems in one universe are incomprehensible to people in the other. Not surprisingly, this split leads to many problems.
This article presents some of the most popular use cases and technologies related to NoSQL databases. These types of databases have become more and more popular. Furthermore, each presented technology provides advantages for a specific use case (for example, graph databases can efficiently select objects based on constraints on their relationships). Therefore, it is becoming essential to know these technologies and use them.
Multithreading and multiprocessing are the two main components of parallel programming. Python (CPython) supports both natively, but some Python-specific limits have to be considered. The main one is the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time and therefore significantly reduces the number of use cases for threads in Python applications (mainly when creating asynchronous pipelines).
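The effect of the GIL can be observed with a small experiment. The sketch below is an illustration under assumptions: `count_primes` and `run_with` are hypothetical helpers, and the workload size is arbitrary. It runs the same CPU-bound function on a thread pool and on a process pool using `concurrent.futures`.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, workers=4):
    """Run count_primes over `limits` on the given pool type.

    Returns a tuple of (results, elapsed seconds).
    """
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The guard is needed so worker processes can import this module safely.
    limits = [50_000] * 4
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = run_with(cls, limits)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

On a standard CPython build with the GIL enabled, the thread pool typically shows little or no speed-up for this kind of pure-Python computation, while the process pool can use several cores at the cost of starting processes and pickling arguments. For I/O-bound work the picture is different, because threads release the GIL while they wait.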
A cache is a way to make your algorithm run faster. Basically, it is a memory that stores the outputs of your algorithm for specific inputs; which outputs are kept depends on the caching policy. When you ask your algorithm for a result, it first checks whether that result is already stored in the cache (memory), so that it can be returned without performing any computation. If the lookup succeeds (the information is in memory), the situation is called a cache hit. On the other hand, if the information is not in memory, it is called a cache miss.
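Hits, misses, and the role of the policy can be seen with the standard library's `functools.lru_cache`, which implements a least-recently-used policy. This is a minimal sketch: the `square` function and the tiny cache size of 2 are assumptions picked to make eviction easy to follow.

```python
from functools import lru_cache


@lru_cache(maxsize=2)  # keep only the 2 most recently used results
def square(n):
    """Stand-in for an expensive computation (hypothetical example)."""
    return n * n


square(2)  # miss: computed and stored
square(2)  # hit: returned from the cache
square(3)  # miss: computed and stored, cache is now full
square(4)  # miss: stored, evicts the least recently used entry (2)
square(2)  # miss again: 2 was evicted, so it is recomputed

info = square.cache_info()
print(info.hits, info.misses, info.currsize)
```

With a larger `maxsize` the last call would have been a hit, which is the trade-off every caching policy makes between memory use and hit rate.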
Python offers many helpful tools that are essential for effective development - some are less well known, others are far more popular. This article presents class methods and static methods (constructs of the Python language), conversion from a class instance to a dictionary (and vice versa), and metaclasses and their applications. All of these are vital for beginner and mid-senior Python engineers.
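The constructs listed above can be sketched together in one small example. All names here (`RegistryMeta`, `Point`, `from_dict`, `to_dict`, `is_valid_coordinate`) are hypothetical and only illustrate one possible use of each feature.

```python
class RegistryMeta(type):
    """Metaclass that records every class created with it."""

    registry = {}

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.registry[name] = cls  # runs once, when the class is defined
        return cls


class Point(metaclass=RegistryMeta):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_dict(cls, data):
        """Alternative constructor: receives the class, not an instance."""
        return cls(**data)

    def to_dict(self):
        """Convert the instance attributes to a plain dictionary (a copy)."""
        return dict(vars(self))

    @staticmethod
    def is_valid_coordinate(value):
        """Helper that needs neither the instance nor the class."""
        return isinstance(value, (int, float)) and not isinstance(value, bool)


point = Point.from_dict({"x": 1, "y": 2})
print(point.to_dict())
print(Point.is_valid_coordinate(1.5))
print(RegistryMeta.registry)
```

A class method is a natural fit for alternative constructors because it also works for subclasses, while a static method is just a function kept in the class namespace. The metaclass hook shown here is one common application (automatic registration); simpler tools such as class decorators or `__init_subclass__` often cover the same need.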