<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Stories of a Polyglot]]></title><description><![CDATA[Everyone has stories to tell. 'Stories of a Polyglot' is a pensive of my experiences of dealing with different technologies and my learnings from them.]]></description><link>https://blog.pratikms.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 09:33:53 GMT</lastBuildDate><atom:link href="https://blog.pratikms.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Demystifying Connection Pools: A Deep Dive]]></title><description><![CDATA[Connection pools are a critical aspect of software engineering that allows applications to efficiently manage connections to a database or any other system. If your application requires constant access to a system, establishing a new connection to th...]]></description><link>https://blog.pratikms.com/demystifying-connection-pools-a-deep-dive</link><guid isPermaLink="true">https://blog.pratikms.com/demystifying-connection-pools-a-deep-dive</guid><category><![CDATA[Databases]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Go Language]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[connection pooling]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Tue, 25 Apr 2023 01:30:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1682450212600/8d6987f2-2e28-49c2-8d41-9b1f18dd6ce8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Connection pools are a critical aspect of software engineering that allows applications to efficiently manage connections to a database or any other system. 
If your application requires constant access to a system, establishing a new connection to the system for every request can quickly become resource-intensive, causing your application to slow down or even crash. This is where connection pools come in.</p>
<p>As engineers, we often don't spend much time thinking about connections. A single connection is typically inexpensive, but as things scale up, the cost of creating and maintaining these connections increases accordingly. This is why I believe understanding the world of connection pooling is important: it enables us to build more performant and reliable applications, especially at scale.</p>
<h2 id="heading-typical-connections">Typical connections</h2>
<p>Before jumping to connection pooling, let us understand how an application typically connects to a system to perform any operation:</p>
<ol>
<li><p>The application attempts to open a connection.</p>
</li>
<li><p>A network socket is opened to connect the application to the system.</p>
</li>
<li><p>Authentication is performed.</p>
</li>
<li><p>Operation is performed.</p>
</li>
<li><p>Connection is closed.</p>
</li>
<li><p>Socket is closed.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682451577199/522e67b7-a91f-4249-b5ec-21f0ac152323.png" alt class="image--center mx-auto" /></p>
<p>As you can see, opening and closing the connection and the network socket is a multi-step process that consumes compute resources. However, keeping a connection open but idle consumes resources too. This is the trade-off that connection pooling addresses. While you will mostly see connection pooling used in database systems, the concept extends to any application that communicates with a remote system over a network.</p>
<h2 id="heading-what-are-connection-pools">What are Connection Pools?</h2>
<p>Connection pools are typically a cache of connections that can be reused by an application. Instead of creating a new connection each time an application needs to interact with the system, a connection is borrowed from the pool, and when it's no longer needed, it is returned to the pool to be reused later. This approach ensures that the application always has access to a ready-to-use connection, without the need to create new connections continuously.</p>
<p>Connection pooling reduces the cost of opening and closing connections by maintaining a “pool” of open connections that can be passed from one operation to another as needed. This way, we are spared the expense of having to open and close a brand new connection for each operation the system is asked to perform.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682452666071/3d11452b-35f1-47ff-a1d9-8af11f240533.png" alt class="image--center mx-auto" /></p>
<p>In this blog post, we'll demystify connection pools, explain how they work, how to implement them, and explore some of the common issues associated with connection pools. We'll also discuss connection pooling in the cloud and why it's important for modern-day applications. By the end of this blog post, you should have a good understanding of connection pools and how they can help you build more efficient and robust applications.</p>
<h2 id="heading-how-connection-pools-work">How Connection Pools Work</h2>
<p>The basic principle of connection pooling is to maintain a pool of connections that are ready for use, rather than creating and destroying connections as required. When a client requests a connection from the pool, the connection pool manager checks if there are any available connections in the pool. If an available connection exists, the connection pool manager returns the connection to the client. Otherwise, the connection pool manager creates a new connection, adds it to the pool, and returns the new connection to the client.</p>
<p>Connection pooling algorithms manage the pool of connections: they determine when to create new connections and when to reuse existing ones. The most common policies are LRU (Least Recently Used) and FIFO (First In, First Out), sometimes described as round-robin.</p>
<p>In LRU, the connection pool manager keeps track of the time that each connection was last used. When a new connection is required, the connection pool manager selects the least recently used connection from the pool and returns it to the user.</p>
<p>In FIFO, the connection pool manager manages connections in the order they were added to the pool. When a new connection is required, the connection pool manager selects the connection that has been in the pool the longest and returns it to the user.</p>
<p>Connection pooling configurations set the parameters for the pool. These include settings such as the minimum and maximum number of connections in the pool, the maximum time a connection can sit idle before it is closed, and the maximum lifetime of a connection before it is closed and replaced.</p>
<p>Overall, the basic principles of connection pooling involve creating a pool of database connections, managing the pool using algorithms and configurations, and reusing the connections as required to reduce overhead and improve performance.</p>
<h2 id="heading-implementing-our-own-connection-pool">Implementing Our Own Connection Pool</h2>
<p>To implement connection pooling in a specific programming language or framework, developers typically use connection pool libraries or built-in connection pool features. Code snippets and examples for implementing connection pools are often available in library documentation or online resources.</p>
<p>However, simply integrating an existing library into some dummy application teaches us little. As software engineers, implementing our own connection pool brings a wealth of learning. Done well, it can significantly improve the performance of our application by reducing the overhead of establishing new connections, and it can help prevent connection leaks and other issues that arise from improperly managed connections.</p>
<p>Moreover, it provides us with fine-grained control over connection creation, usage, and destruction, allowing us to optimize our application's resource utilization. By implementing our own connection pooling, we can gain a deeper understanding of how our application works and thereby improve its scalability and reliability.</p>
<h3 id="heading-building-blocks">Building Blocks</h3>
<p>For ease of demonstration, we will use an SQLite3 database and implement our own custom pooling for it. I'll be using Go here because of its simplicity; you can use any language of your choice.</p>
<p>To start with, our <code>ConnectionPool</code> struct will look something like this:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> ConnectionPool <span class="hljs-keyword">struct</span> {
    queue       <span class="hljs-keyword">chan</span> *sql.DB
    maxSize     <span class="hljs-keyword">int</span>
    currentSize <span class="hljs-keyword">int</span>
    lock        sync.Mutex
    isNotFull   *sync.Cond
    isNotEmpty  *sync.Cond
}
</code></pre>
<p>Here, the <code>ConnectionPool</code> struct contains the <code>queue</code>, <code>maxSize</code>, <code>currentSize</code>, <code>lock</code>, <code>isNotFull</code>, and <code>isNotEmpty</code> fields. The <code>queue</code> field is a channel that holds pointers to <code>sql.DB</code> connections. The <code>sql.DB</code> type belongs to Go's built-in <code>database/sql</code> package. The <a target="_blank" href="https://pkg.go.dev/database/sql"><code>database/sql</code></a> package provides a generic interface around SQL or SQL-like databases. This interface is implemented by the <a target="_blank" href="https://pkg.go.dev/github.com/mattn/go-sqlite3"><code>github.com/mattn/go-sqlite3</code></a> package, which we will use as our SQLite3 driver.</p>
<p>The <code>maxSize</code> field represents the maximum number of connections that the pool can have, and the <code>currentSize</code> field represents the current number of connections in the pool. The <code>lock</code> field is a <a target="_blank" href="https://en.wikipedia.org/wiki/Lock_(computer_science)">mutex</a> that ensures that concurrent access to shared memory is synchronized. The <code>isNotFull</code> and <code>isNotEmpty</code> fields are condition variables that allow for efficient waiting and are used to signal when the pool is not full and not empty, respectively.</p>
<p><code>sync.Cond</code> is a synchronization primitive in Go that allows multiple goroutines to wait for a shared condition to be satisfied. It is often used in conjunction with a mutex, which provides exclusive access to a shared resource (in this case the <code>queue</code>), to coordinate the execution of multiple goroutines.</p>
<p>Yes, channels can also be used for synchronization, but they come with some overhead in terms of memory usage and complexity. In this case, <code>sync.Cond</code> provides a simpler and more lightweight alternative, as it allows efficient signaling of waiting goroutines.</p>
<p>By using <code>sync.Cond</code>, the implementation can ensure that goroutines waiting on the condition are woken up only when the condition is actually met, rather than relying on a buffered channel that might hold stale data. This improves overall performance and reduces the likelihood of race conditions or deadlocks.</p>
<h3 id="heading-getting-connection-object-from-the-pool">Getting Connection Object from the Pool</h3>
<p>Next, we will implement a <code>Get</code> method which will return a database object from an existing <code>ConnectionPool</code>:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cp *ConnectionPool)</span> <span class="hljs-title">Get</span><span class="hljs-params">()</span> <span class="hljs-params">(*sql.DB, error)</span></span> {
    cp.lock.Lock()
    <span class="hljs-keyword">defer</span> cp.lock.Unlock()

    <span class="hljs-comment">// If queue is empty, wait</span>
    <span class="hljs-keyword">for</span> cp.currentSize == <span class="hljs-number">0</span> {
        fmt.Println(<span class="hljs-string">"Waiting for connection to be added back in the pool"</span>)
        cp.isNotEmpty.Wait()
    }

    fmt.Println(<span class="hljs-string">"Got connection!! Releasing"</span>)
    db := &lt;-cp.queue
    cp.currentSize--
    cp.isNotFull.Signal()

    err := db.Ping()
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    <span class="hljs-keyword">return</span> db, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>This function, <code>Get()</code>, retrieves a connection from the pool. First, it acquires the lock to ensure exclusive access to the shared state of the connection pool. If the pool is currently empty, the function waits until a connection is added back to the pool.</p>
<p>Once a connection is available, the function dequeues it from the <code>queue</code>, decrements <code>currentSize</code>, and signals that the pool is not full. It then checks whether the connection is still valid by calling <code>Ping()</code>. If the connection is not valid, an error is returned, and the connection is not returned to the caller. If the connection is valid, it is returned to the caller. (Note that in this simplified implementation, a connection that fails the ping is simply dropped, so the pool shrinks by one; a production pool would replace it with a fresh connection.)</p>
<h3 id="heading-adding-connection-object-to-the-pool">Adding Connection Object to the Pool</h3>
<p>Moving on, we add an <code>Add</code> method whose responsibility will be to add the connection object to the pool once it has been used:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cp *ConnectionPool)</span> <span class="hljs-title">Add</span><span class="hljs-params">(db *sql.DB)</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">if</span> db == <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> errors.New(<span class="hljs-string">"database not yet initiated. Please create a new connection pool"</span>)
    }

    cp.lock.Lock()
    <span class="hljs-keyword">defer</span> cp.lock.Unlock()

    <span class="hljs-keyword">for</span> cp.currentSize == cp.maxSize {
        fmt.Println(<span class="hljs-string">"Waiting for connection to be released"</span>)
        cp.isNotFull.Wait()
    }

    cp.queue &lt;- db
    cp.currentSize++
    cp.isNotEmpty.Signal()

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<p>This function, <code>Add()</code>, adds a connection to the pool. It first checks whether the connection is <code>nil</code> and returns an error if it is. Then, it acquires the lock to ensure exclusive access to the shared state of the connection pool. If the pool is currently full, the function waits until a connection is released from the pool.</p>
<p>Once there is space in the pool, the function enqueues the connection onto the <code>queue</code>, increments <code>currentSize</code>, and signals that the pool is not empty. The function returns <code>nil</code> to indicate success.</p>
<h3 id="heading-closing-the-connection-pool">Closing the Connection Pool</h3>
<p>As the name suggests, we will implement a <code>Close</code> function responsible for closing all database connections in the pool. It starts by acquiring a lock, then iterates through all the connections in the pool and closes them one by one. After closing each connection, it decrements the <code>currentSize</code> counter and signals any waiting goroutines that space is now available in the pool.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cp *ConnectionPool)</span> <span class="hljs-title">Close</span><span class="hljs-params">()</span></span> {
    cp.lock.Lock()
    <span class="hljs-keyword">defer</span> cp.lock.Unlock()

    <span class="hljs-keyword">for</span> cp.currentSize &gt; <span class="hljs-number">0</span> {
        db := &lt;-cp.queue
        db.Close()
        cp.currentSize--
        cp.isNotFull.Signal()
    }

    <span class="hljs-built_in">close</span>(cp.queue)
}
</code></pre>
<h3 id="heading-initializing-the-connection-pool">Initializing the Connection Pool</h3>
<p>We will implement a <code>NewConnectionPool</code> function as a constructor for a new connection pool. It takes the <code>driver</code>, <code>dataSource</code>, and <code>maxSize</code> arguments and returns a pointer to a new <code>ConnectionPool</code> instance. It first validates the provided <code>driver</code> and <code>dataSource</code> arguments by calling <code>sql.Open</code> (note that <code>sql.Open</code> only validates its arguments; it does not actually establish a connection until one is needed). It then initializes a new connection pool with the provided <code>maxSize</code>, creates a new channel of <code>*sql.DB</code> objects, and pre-populates it with <code>maxSize</code> database connections, one per loop iteration. Finally, it returns the new <code>ConnectionPool</code> instance.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewConnectionPool</span><span class="hljs-params">(driver, dataSource <span class="hljs-keyword">string</span>, maxSize <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(*ConnectionPool, error)</span></span> {

    <span class="hljs-comment">// Validate driver and data source. Note that sql.Open only validates</span>
    <span class="hljs-comment">// its arguments; it does not establish a connection yet.</span>
    db, err := sql.Open(driver, dataSource)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    db.Close() <span class="hljs-comment">// close the validation handle so it does not leak</span>

    cp := &amp;ConnectionPool{
        queue:       <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> *sql.DB, maxSize),
        maxSize:     maxSize,
        currentSize: <span class="hljs-number">0</span>,
    }

    cp.isNotEmpty = sync.NewCond(&amp;cp.lock)
    cp.isNotFull = sync.NewCond(&amp;cp.lock)

    <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; maxSize; i++ {
        conn, err := sql.Open(driver, dataSource)
        <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
        cp.queue &lt;- conn
        cp.currentSize++
    }

    <span class="hljs-keyword">return</span> cp, <span class="hljs-literal">nil</span>
}
</code></pre>
<h3 id="heading-putting-it-all-together">Putting it All Together</h3>
<p>This is what our final custom Connection Pool implementation looks like:</p>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> pool

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"database/sql"</span>
    <span class="hljs-string">"errors"</span>
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"sync"</span>

    _ <span class="hljs-string">"github.com/mattn/go-sqlite3"</span>
)

<span class="hljs-keyword">type</span> ConnectionPool <span class="hljs-keyword">struct</span> {
    queue       <span class="hljs-keyword">chan</span> *sql.DB
    maxSize     <span class="hljs-keyword">int</span>
    currentSize <span class="hljs-keyword">int</span>
    lock        sync.Mutex
    isNotFull   *sync.Cond
    isNotEmpty  *sync.Cond
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cp *ConnectionPool)</span> <span class="hljs-title">Get</span><span class="hljs-params">()</span> <span class="hljs-params">(*sql.DB, error)</span></span> {
    cp.lock.Lock()
    <span class="hljs-keyword">defer</span> cp.lock.Unlock()

    <span class="hljs-comment">// If queue is empty, wait</span>
    <span class="hljs-keyword">for</span> cp.currentSize == <span class="hljs-number">0</span> {
        fmt.Println(<span class="hljs-string">"Waiting for connection to be added back in the pool"</span>)
        cp.isNotEmpty.Wait()
    }

    fmt.Println(<span class="hljs-string">"Got connection!! Releasing"</span>)
    db := &lt;-cp.queue
    cp.currentSize--
    cp.isNotFull.Signal()

    err := db.Ping()
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    <span class="hljs-keyword">return</span> db, <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cp *ConnectionPool)</span> <span class="hljs-title">Add</span><span class="hljs-params">(db *sql.DB)</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">if</span> db == <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> errors.New(<span class="hljs-string">"database not yet initiated. Please create a new connection pool"</span>)
    }

    cp.lock.Lock()
    <span class="hljs-keyword">defer</span> cp.lock.Unlock()

    <span class="hljs-keyword">for</span> cp.currentSize == cp.maxSize {
        fmt.Println(<span class="hljs-string">"Waiting for connection to be released"</span>)
        cp.isNotFull.Wait()
    }

    cp.queue &lt;- db
    cp.currentSize++
    cp.isNotEmpty.Signal()

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(cp *ConnectionPool)</span> <span class="hljs-title">Close</span><span class="hljs-params">()</span></span> {
    cp.lock.Lock()
    <span class="hljs-keyword">defer</span> cp.lock.Unlock()

    <span class="hljs-keyword">for</span> cp.currentSize &gt; <span class="hljs-number">0</span> {
        db := &lt;-cp.queue
        db.Close()
        cp.currentSize--
        cp.isNotFull.Signal()
    }

    <span class="hljs-built_in">close</span>(cp.queue)
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewConnectionPool</span><span class="hljs-params">(driver, dataSource <span class="hljs-keyword">string</span>, maxSize <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(*ConnectionPool, error)</span></span> {

    <span class="hljs-comment">// Validate driver and data source. Note that sql.Open only validates</span>
    <span class="hljs-comment">// its arguments; it does not establish a connection yet.</span>
    db, err := sql.Open(driver, dataSource)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    db.Close() <span class="hljs-comment">// close the validation handle so it does not leak</span>

    cp := &amp;ConnectionPool{
        queue:       <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> *sql.DB, maxSize),
        maxSize:     maxSize,
        currentSize: <span class="hljs-number">0</span>,
    }

    cp.isNotEmpty = sync.NewCond(&amp;cp.lock)
    cp.isNotFull = sync.NewCond(&amp;cp.lock)

    <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; maxSize; i++ {
        conn, err := sql.Open(driver, dataSource)
        <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
        cp.queue &lt;- conn
        cp.currentSize++
    }

    <span class="hljs-keyword">return</span> cp, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>Of course, this implementation can be improved upon in many ways. Most connection pool implementations use a <a target="_blank" href="https://www.oreilly.com/library/view/design-patterns-and/9781786463593/2ff33f7c-aab8-4a4d-bacc-c475c3d1c928.xhtml">Bounded-queue</a> as the underlying data structure, and you can use any variation of it to implement your own pool.</p>
<p>The complete implementation, along with its usage, is open-sourced <a target="_blank" href="https://github.com/pratikms/dbconnectionpool">here</a> in case you wish to play around. I suggest running it in debug mode to watch the signaling magic of <code>sync.Cond</code> unfold.</p>
<h2 id="heading-common-connection-pooling-issues">Common Connection Pooling Issues</h2>
<p>While connection pooling can bring many benefits to an application, it is not without its challenges. Here are some common issues that can arise with connection pooling:</p>
<ul>
<li><p><strong>Overuse of Connection Pools</strong>: Connection pools should be used judiciously: overuse can decrease application performance, because the pool itself can become a bottleneck if too many connections are being opened and closed, causing delays in database transactions.</p>
</li>
<li><p><strong>Pool Size Configuration Errors</strong>: Connection pool size is an important consideration when implementing connection pooling. If the pool size is too small, there may not be enough connections available to handle peak traffic, resulting in errors or delays. On the other hand, if the pool size is too large, it can lead to unnecessary resource consumption and potential performance issues.</p>
</li>
<li><p><strong>Connection Leaks</strong>: Connection leaks occur when a connection is not properly closed and returned to the pool after it has been used. This can lead to resource exhaustion, as unused connections will remain open and tie up valuable resources. Over time, this can result in degraded application performance and, in extreme cases, cause the application to crash.</p>
</li>
</ul>
<p>To avoid these issues, it is important to monitor connection pool usage and performance regularly. Best practices such as setting appropriate pool size, tuning timeout and idle settings, and configuring automatic leak detection and recovery can help minimize the impact of these issues. Additionally, logging and alerting mechanisms can be put in place to help identify and remediate any issues that do occur.</p>
<h2 id="heading-connection-pooling-in-cloud-environments">Connection Pooling in Cloud Environments</h2>
<p>Connection pooling is an important consideration when designing applications for the cloud. Cloud environments offer several unique challenges, such as elastic scalability and dynamic resource allocation. Connection pooling can help address some of these challenges, but there are additional considerations to take into account.</p>
<p>In a cloud environment, applications may be running on multiple instances or virtual machines. This means that a single connection pool may not be sufficient to handle the load from all of these instances. Instead, it may be necessary to implement multiple connection pools, each handling a subset of the total workload.</p>
<p>Another consideration is the dynamic nature of cloud environments. Instances can be added or removed from the environment at any time, which means that the size of the connection pool may need to be adjusted accordingly. This can be achieved through automation tools or by implementing dynamic scaling rules based on metrics such as CPU usage or network traffic.</p>
<p>Security is also an important consideration when implementing connection pooling in the cloud. In a shared environment, it is important to ensure that connections are secure and cannot be accessed by unauthorized parties. This may involve implementing encryption or access control measures, such as IP filtering.</p>
<p>Finally, it is important to ensure that connection pooling is properly configured for the specific cloud environment being used. Each cloud provider may have its own specific requirements and recommendations for connection pooling, such as maximum pool size or connection timeouts. It is important to consult the provider's documentation and best practices guides to ensure that connection pooling is properly configured for optimal performance and reliability.</p>
<p>In summary, connection pooling can be a valuable tool for optimizing performance and managing resources in cloud environments. However, there are additional considerations that must be taken into account to ensure that connection pooling is properly implemented and configured for the specific cloud environment being used.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>In conclusion, connection pooling is a crucial concept in modern software development that can help to improve application performance and scalability while reducing resource usage. By caching and reusing database connections, connection pooling can reduce the overhead of creating and destroying connections, leading to faster application response times and increased throughput.</p>
<p>However, connection pooling is not a silver bullet and must be used carefully and thoughtfully. Common issues such as overuse of connection pools, pool size configuration errors, and connection leaks can cause performance degradation and even application crashes.</p>
<p>When using connection pooling in cloud environments, additional considerations must be taken into account, such as the network latency between the application and the database, and the dynamic nature of cloud resources.</p>
<p>To sum up, connection pooling is an important tool for improving database performance in modern software applications. By understanding how connection pooling works, common issues to look out for, and best practices for implementation, software engineers can harness the power of connection pooling to build more performant, scalable, and reliable applications.</p>
]]></content:encoded></item><item><title><![CDATA[Revolutionizing Data Security by Design]]></title><description><![CDATA[For decades, we have benefited from modern cryptography to protect our sensitive data during transmission and storage. However, we have never been able to keep the data protected while it is being processed.
Nearly 4 billion data records were stolen ...]]></description><link>https://blog.pratikms.com/revolutionizing-data-security-by-design</link><guid isPermaLink="true">https://blog.pratikms.com/revolutionizing-data-security-by-design</guid><category><![CDATA[encryption]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Sun, 16 Aug 2020 16:01:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1597593655494/9GlKuYRHN.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For decades, we have benefited from modern cryptography to protect our sensitive data during transmission and storage. However, we have never been able to keep the data protected while it is being processed.</p>
<p>Nearly 4 billion data records were stolen in 2016, and each one cost the record holder nearly $158. If we do the simple math, in 2016 alone the total cost amounted to a whopping $632 billion. The scale, sophistication, and cost of cyber-attacks escalate every year, and today's technologies will not be able to keep pace. In such times, we need an encryption technology that disorients and discourages bad actors.</p>
<p>For example, many years from now, a fault-tolerant, universal quantum computer with millions of qubits could quickly sift through the probabilities and decrypt even the strongest common encryption, rendering the foundational security methodology we rely on today obsolete.</p>
<p>This is where <strong>Homomorphic Encryption</strong> comes in. Homomorphic encryption helps us solve many problems that today's <a target="_blank" href="https://en.wikipedia.org/wiki/Elliptic-curve_cryptography">elliptic-curve cryptography (ECC)</a> algorithms fail to address in our cloud infrastructure security.</p>
<h2 id="shortcomings-of-todays-encryption-techniques">Shortcomings of today's encryption techniques</h2>
<p>When it comes to cloud security, our data is encrypted in two states: in transit and at rest.</p>
<p>In transit, the encryption techniques we use today suffer from a problem known as TLS/SSL termination. Interestingly, this very problem is proudly marketed as a feature by reverse proxies such as <a target="_blank" href="https://docs.nginx.com/nginx/admin-guide/security-controls/terminating-ssl-http/">Nginx</a>,  <a target="_blank" href="https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl#:~:text=Envoy%20supports%20both%20TLS%20termination,have%20advanced%20TLS%20requirements%20(TLS1.">Envoy</a>, etc.</p>
<p>With TLS termination, a reverse proxy handles incoming connections, decrypts the TLS, and passes the unencrypted request on to the appropriate server. This is exactly the infrastructure limitation that attackers take advantage of: the whole threat model revolves around exploiting the availability of unencrypted data past the TLS termination point.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597583622719/j5Mow3Lc3.jpeg" alt="tls-termination.jpg" /></p>
<p>At rest, we do things in one of two ways today. We either store the data in our databases unencrypted, in plain text, or, in some cases, apply some form of encryption. With cloud providers like GCP, AWS, and Azure, this encryption is done using a Key Management Service (KMS). Even then, while the data may be stored encrypted, there always comes a time when the application needs to decrypt the data to perform any operation on it.</p>
<p>Every service that we know of today runs on unencrypted data. The trends that Twitter shows cannot be computed on encrypted data. The recommendation system on YouTube, the news feed on Facebook, the predictions of every application out there: all of them operate on unencrypted data.</p>
<p>It is these very shortcomings that homomorphic encryption aims to address.</p>
<h2 id="homomorphic-encryption">Homomorphic encryption</h2>
<blockquote>
<p>Imagine if you could compute on encrypted data without ever decrypting it.
What would you do?</p>
<p>― Flavio Bergamaschi</p>
</blockquote>
<p><a target="_blank" href="https://en.wikipedia.org/wiki/Lattice-based_cryptography">Lattice-based cryptography</a> proves its worth by using very hard math problems to hide data. By the time computers are strong enough to crack today's encryption, the world can be prepared with lattice cryptography. Lattice cryptography, as of this day and to the best of our knowledge, is quantum resistant: no known quantum algorithm can break it. Lattice cryptography is also the basis of <a target="_blank" href="https://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_Homomorphic_Encryption">Fully Homomorphic Encryption (FHE)</a>.</p>
<p>Homomorphic encryption is the ability to perform arithmetic operations on encrypted data. None of our mainstream encryption techniques allow us to do that. Because of this ability, we really don't need to decrypt our data, ever! It quite conveniently addresses the shortcomings of our existing encryption techniques. In transit, the TLS termination problem never occurs, as the reverse proxy need not decrypt the data: it can perform all its operations on the encrypted data itself and make all the necessary decisions without ever terminating the TLS. Even in a persistent store, database queries can very well be performed on encrypted data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597587658631/RcF0m9Gih.jpeg" alt="fhe.jpg" /></p>
<p>Fully Homomorphic Encryption (FHE) protects us from honest-but-curious threat models. An honest-but-curious (HBC) adversary is a legitimate participant in a communication protocol who will not deviate from the defined protocol but will attempt to learn all possible information from legitimately received messages. To get an idea of what this means, a comparison helps a great deal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597591541754/6On13kfJ9.jpeg" alt="todays-threat-model.jpg" /></p>
<p>With the way we do things today, the usual flow is that Alice encrypts some data and sends it as input to Bob. Bob can decrypt that data, process it, and store it at his end. Just like Alice, Bob can encrypt some data and send it over to Alice, who can decrypt and process it at her end. Such a mechanism protects us against <a target="_blank" href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack#:~:text=In%20cryptography%20and%20computer%20security,directly%20communicating%20with%20each%20other.">man-in-the-middle (MITM) attacks</a>, which is why Eve can't eavesdrop on any communication between Alice and Bob. But Bob, on the other hand, has access to all this unencrypted data. Here, Bob is the honest-but-curious actor.</p>
<p>For the sake of convenience, we are assuming Bob to be merely an honest-but-curious actor without any malicious intent. For threat models where Bob sits inside our cloud infrastructure with malicious intent and free access to all this unencrypted data, there are other protocols that we can use in combination with homomorphic encryption to counter such scenarios.</p>
<p>Interestingly, with homomorphic encryption, along with protection against eavesdropping and MITM, we get the added protection of not letting Bob sit on a gold mine of unencrypted data, because everything that gets stored is encrypted. This, however, does not take away Bob's ability to perform operations on the data as he used to. One of the core benefits of homomorphic encryption is that, unlike all the encryption techniques we've seen so far, we need not decrypt the data: we can perform all the operations on the encrypted data itself.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597591933675/vo5miUiCS.jpeg" alt="fhe-threat-model.jpg" /></p>
<h3 id="applications-of-homomorphic-encryption">Applications of Homomorphic encryption</h3>
<p>Right off the bat, some of the use-cases that we can consider for such an encryption technique are:</p>
<ul>
<li><strong>Oblivious queries.</strong> Searching without revealing intent. For example, today, requesting weather info means revealing our location to the cloud provider. With homomorphic encryption, since our location too remains encrypted, we need not reveal nearly as much of our data.</li>
<li><strong>Set intersections.</strong> Today, in order to determine an overlap, both sets must be shared completely. Using homomorphic encryption, we can determine the overlap without disclosing the entire sets.</li>
<li><strong>Extracting value from private data.</strong> We can run machine learning models (regression, neural networks, and so on) on our private data without ever exposing it.</li>
<li><strong>Secure outsourcing.</strong> Even today, quite a few enterprises maintain on-prem infrastructure due to a lack of trust in cloud providers. Homomorphic encryption, with its data privacy by design, can encourage wider cloud adoption.</li>
</ul>
<h2 id="proof-of-concept">Proof of Concept</h2>
<p>Without making this article sound like an ad, let us get our hands dirty and see how homomorphic encryption can actually be implemented. Microsoft maintains the <a target="_blank" href="https://github.com/microsoft/SEAL">SEAL library</a>, which supports homomorphic encryption. IBM, too, recently released a <a target="_blank" href="https://github.com/IBM/fhe-toolkit-linux">Fully Homomorphic Encryption toolkit for Linux</a>. Since IBM's FHE toolkit is Docker-based, we will use it for our POC for the sake of simplicity.</p>
<p>First, we need to clone the  <a target="_blank" href="https://github.com/IBM/fhe-toolkit-linux">repo</a>:</p>
<pre><code>$ git clone https://github.com/IBM/fhe-toolkit-linux.git
</code></pre><p>Once cloned, we need to run the <code>FetchDockerImage.sh</code> shell script, providing the container OS as an argument. For simplicity, we will use Ubuntu:</p>
<pre><code>$ cd fhe-toolkit-linux
$ ./FetchDockerImage.sh ubuntu
</code></pre><p>The download and setup of the toolkit will take some time depending on your bandwidth and hardware.</p>
<p>Next, we need to run the IBMCOM pre-built toolkit from Docker Hub:</p>
<pre><code>$ ./RunToolkit.sh -p ubuntu
</code></pre><p>The output of the above command should be something similar to:</p>
<pre><code>$ ./RunToolkit.sh -p ubuntu
WARNING: No swap limit support
INFO:    Using system default persistent storage path...
INFO:    Persistent data storage: "/home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace"
INFO:    CMake: Deleting cached built settings and reconfigure
INFO:    Launching FHE tookit: 


         docker run -d --name fhe-toolkit-ubuntu  -v /home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace:/opt/IBM/FHE-Workspace  -p 8443:8443 ibmcom/fhe-toolkit-ubuntu


8fdcd97b1d203f0e71e4602ce6d24a76cd768c5fc2f8c5ee6b99ed7acb1a7886

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS                  PORTS                    NAMES
8fdcd97b1d20        ibmcom/fhe-toolkit-ubuntu   "code-server --bind-…"   6 seconds ago       Up Less than a second   0.0.0.0:8443-&gt;8443/tcp   fhe-toolkit-ubuntu

FHE Development is open for business: https://127.0.0.1:8443/
</code></pre><p>We now have a web server running at https://127.0.0.1:8443/. All subsequent operations will happen in the browser.</p>
<p>On opening the browser and accepting the prompt for the self-signed certificate, a VS Code interface opens in the browser. Soon, it will ask us to select a kit; make sure to select the option that says <em>GCC for x86_64-linux-gnu 9.3.0</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597578399743/QmrL-X0CJ.png" alt="Select kit.png" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597578438470/bmaGRD4bi.png" alt="configure project.png" /></p>
<p>Next, click <em>Build</em> in the CMake Tools status bar to build the selected target.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597578627732/76_dmFwo4.png" alt="build.png" /></p>
<p>If you look into the <code>examples/BGV_country_db_lookup</code> directory, you can find the <code>countries_dataset.csv</code> file. It is a list of countries and their capital cities from the continent of Europe. When we run the toolkit, it uses the <code>BGV_country_db_lookup.cpp</code> file to encrypt the contents of the CSV. That file also contains code that lets us search over the encrypted data: given a country name as input, it looks through the encrypted list of countries and outputs the matching capital.</p>
<p>Let's proceed to run the toolkit:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597579470004/4I2Pqt-52.png" alt="run.png" /></p>
<p>Following the text instructions, if we go ahead and enter any country, it searches the encrypted database and outputs that country's capital.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1597580313555/gtrC3-4nO.png" alt="search.png" /></p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>Though homomorphic encryption is a great and extremely promising technology, is it ready for out-of-the-box use? Absolutely not. This is very much evident from the POC we just did: searching an encrypted database with around 47 entries took almost 2-3 minutes. There is no denying that this is an awesome start and definitely a step in the right direction, but we still have a long way to go. Having said that, homomorphic encryption can very well be the next big breakthrough in the computer science industry. We can only imagine the endless possibilities when the first FHE-enabled database is implemented, or the first FHE-supported proxy. Nonetheless, we're surely in for some exciting times ahead!</p>
]]></content:encoded></item><item><title><![CDATA[Evolution of Microservices]]></title><description><![CDATA[The central idea behind microservices is that some types of applications become easier to build and maintain when they are broken down into smaller, composable pieces which work together. Each component is continuously developed and separately mainta...]]></description><link>https://blog.pratikms.com/evolution-of-microservices</link><guid isPermaLink="true">https://blog.pratikms.com/evolution-of-microservices</guid><category><![CDATA[Microservices]]></category><category><![CDATA[architecture]]></category><category><![CDATA[serverless]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[patterns]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Sun, 19 Jul 2020 15:51:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595269543783/Qw-bLaawu.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The central idea behind microservices is that some types of applications become easier to build and maintain when they are broken down into smaller, composable pieces which work together. Each component is continuously developed and separately maintained, and the application is then simply the sum of its constituent components. This is in contrast to a traditional, <em>monolithic</em> application which is developed all in one piece.</p>
<p>Applications built as a set of modular components are easier to understand, easier to test, and most importantly easier to maintain over the life of the application. It enables organizations to achieve much higher agility and be able to vastly improve the time it takes to get working improvements to production. This approach has proven to be superior, especially for large enterprise applications which are developed by teams of geographically and culturally diverse developers.</p>
<p>There are some other benefits as well for a microservice architecture, which include:</p>
<ul>
<li><strong>Developer independence</strong>: Small teams work in parallel and can iterate faster than large teams.</li>
<li><strong>Isolation and resilience</strong>: If a component dies, you spin up another while the rest of the application continues to function.</li>
<li><strong>Scalability</strong>: Smaller components take up fewer resources and can be scaled to meet increasing demand of that component only.</li>
<li><strong>Lifecycle automation</strong>: Individual components are easier to fit into continuous delivery pipelines and into complex deployment scenarios that are not possible with monoliths.</li>
</ul>
<p>But how did we reach here? Believe it or not, we&#39;ve come a long way in the past decade to design microservices the way we do today. To understand why things are done the way they are in the microservice land, I believe it is important to understand how the microservice architecture evolved.</p>
<h2 id="origins">Origins</h2>
<p>Traditional application design is often called <em>monolithic</em> because the whole thing is developed in one piece. Even if the logic of the application is modular, it is deployed as one unit, like, for example, a Go application which, when built, gives us a single executable file. We can imagine this as if all of a college student&#39;s notes, across every subject, were compiled into one long stream.</p>
<p>This way of writing and deploying code is convenient because it all happens in one spot, but it incurs significant technical debt over time. That&#39;s because successful applications tend to get bigger and more complex as the product grows, which makes them harder and harder to run.</p>
<p>Because these systems were tightly coupled, any change made to the code could potentially endanger the performance of the entire application. The functionalities were too interdependent for a new technological age that demanded constant innovation and adaptation.</p>
<p>Another issue with monolithic architecture was its inability to scale individual functionalities. One crucial aspect of successful businesses is their ability to keep up with consumer demands. Naturally, these demands depend on various factors and fluctuate over time.</p>
<p>At some point, the product will need to scale only a certain function of its service to respond to a growing number of requests. With monolithic apps, you weren’t able to scale individual elements but rather had to scale the application as a whole.</p>
<p>Enter microservices. However, the idea of separating applications into smaller parts is not new. There are other programming paradigms which address this same concept, such as Service Oriented Architecture (SOA). However, recent technology advances coupled with an increasing expectation of integrated <em>digital experiences</em> have given rise to a new breed of development tools and techniques used to meet the needs of modern business applications.</p>
<p>But this initial microservice / SOA architecture, which simply took monoliths and broke them up into smaller units, had some problems of its own. After being broken down into smaller units, these microservices needed to communicate among themselves to function. The first natural choice to facilitate this communication was, and in many cases still remains, REST APIs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595159335166/8Zyvc0-nk.jpeg" alt="rest-apis.jpg"></p>
<p>This worked up to a point. But then, synchronous request-response communication led to tight point-to-point coupling, which brought us all the way back to where we started. The coupling became so tight that the problem earned its own name: <em>distributed monoliths</em>. You have microservices in name only, but you still have all the problems of monoliths, like having to coordinate across teams, dealing with big fat releases, and a lot of the fragility that comes with them. Some of the problems of such a distributed monolith architecture are:</p>
<h3 id="clients-knowing-a-bit-too-much">Clients knowing a bit too much</h3>
<p>Initially, the clients (a mobile app, a web app, or any client of that sort) used to get a big fat document describing the APIs to be integrated. This resulted in the client knowing more than it was supposed to, which became a bottleneck when it came to making changes to the microservices. Adding a new microservice now meant introducing changes to the client as well; so did modifying an existing microservice.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595160554232/PxiH-yZ7g.jpeg" alt="clients-know-too-much.jpg"></p>
<h3 id="unavoidable-redundancies">Unavoidable redundancies</h3>
<p>When breaking monoliths into smaller units, it becomes tricky to decide who will be responsible for which function of the system. If the system&#39;s architecture fails to address these issues properly, some redundancies become unavoidable. For example, if a microservice sends a request to another microservice and it fails to respond, the question of <em>what happens then</em> suddenly becomes of paramount importance. The microservice from which the request originated now had to take responsibility for doing something intelligent about the failure. This applied to every other microservice in the system, and before we knew it, it became a vicious cycle. In order to handle such cases, every team ended up solving the same common problems. Such shared infrastructure problems once again led to the same issues we were facing with monoliths.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595161238199/-N05-Io7Y.jpeg" alt="unavoidable-redundancies.jpg"></p>
<h3 id="making-changes-is-risky">Making changes is risky</h3>
<p>A microservice may not always know about the other microservices that communicate with it, since they only talk to each other through RESTful APIs. This makes it hard to determine which microservices may end up breaking if we introduce changes to our microservice. Even with good API contracts such as <a target='_blank' rel='noopener noreferrer'  href="https://www.openapis.org/">OpenAPI</a>, it is not an easy job: a lot of validation is required across all the microservices involved.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595161744214/Ks1HT8fFo.jpeg" alt="making-changes-not-easy.jpg"></p>
<h2 id="evolution">Evolution</h2>
<p>Now that we&#39;ve seen the challenges we initially faced with microservices (or rather, with the distributed monolith pattern that dominated the first few years after microservices were introduced as an architectural pattern), we can better understand the problems that were solved one by one, evolving the microservice architecture into what it is today.</p>
<h3 id="api-gateways">API Gateways</h3>
<p>Clients know a bit too much? Microservices end up with unavoidable redundancies while addressing common problems? Enter API gateways. As simple as it sounds, introducing an API gateway really does solve a lot of problems. For starters, it frees all microservices from having to worry about authentication, encryption, routing, etc. The client does not have to worry about changes that happen in the microservice land, as it only communicates with the API gateway. This hugely simplifies things on both the client and the server side.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595162360397/RwNvGV_l6.jpeg" alt="api-gateway.jpg"></p>
<p>Responsibilities of an API gateway: </p>
<ul>
<li><strong>Authentication</strong>: Microservices don&#39;t have to bear the overhead of authenticating requests again and again, as the API gateway only lets authenticated requests through</li>
<li><strong>Routing</strong>: Since the client only knows about the API gateway, it doesn&#39;t need to know the IPs or domains of all the microservices in the system. This also lets microservices change freely, as those internal changes are virtually transparent to the client</li>
<li><strong>Rate limiting</strong>: One important advantage of an API gateway is its ability to rate-limit incoming requests. This hugely helps in spam prevention and in mitigating DoS attacks.</li>
<li><strong>Logging and analytics</strong>: Since all requests go through a single entity, important analytics, such as who is accessing, what is being accessed, and which is the most used endpoint, can be easily obtained, and a lot of meaningful insights can be derived from them</li>
</ul>
<p>But wait a minute. Let us take a step back and analyse. Doesn&#39;t such a pattern introduce one of the most basic problems that any good architecture aims to avoid? No points for guessing the right answer: a <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/Single_point_of_failure">Single Point of Failure (SPOF)</a>. The API gateway suddenly becomes a big bottleneck and a heavy engineering dependency.</p>
<h3 id="service-mesh">Service mesh</h3>
<p>The service mesh has been around for more than a couple of years now. In simple terms, a service mesh can be imagined as a distributed, internal API gateway. An API gateway handles what we call north-south traffic: the traffic that flows from clients into our servers, vertically, top to bottom. In order to remove the SPOF introduced by a single API gateway, we want to take this north-south traffic and apply it as east-west traffic within our cluster. East-west traffic, in contrast, is the traffic that flows between the servers themselves: it can be imagined as traffic flowing horizontally, between the individual microservices.</p>
<p>A service mesh uses what is called the sidecar pattern. The sidecar pattern is a single-node pattern made up of two containers. The first is the application container, which contains the core logic for the application. Without this container, the application would not exist. In addition to the application container, there is a sidecar container. The role of the sidecar is to augment and improve the application container, often without the application container&#39;s knowledge. In its simplest form, a sidecar can add functionality to a container that might otherwise be difficult to improve.</p>
<p>The sidecar is usually language-agnostic. There could be sidecars for collecting logs, sidecars for monitoring, etc. To address the centralization problem caused by a single API gateway, we can attach sidecars to services as proxies. These proxies carry enough intelligence to handle routing to other microservices. They can also perform service discovery, so if any microservice&#39;s IP changes, the proxy automatically knows about it. Other features, such as rate limiting, also become possible because of sidecars. For example, retries can be dropped so that a struggling service is not drowned by a retry storm during unfortunate blips.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595162879208/ApMdj2IW6.jpeg" alt="sidecar-as-proxy.jpg"></p>
<h3 id="event-driven">Event driven</h3>
<p>To solve many of the issues that stemmed from using RESTful APIs for communication between microservices, we replace the request-response architecture with an event-driven one. In a request-driven architecture, microservices either tell others what to do (commands) or ask specific questions to get things done (queries), using RESTful APIs. In an event-driven architecture, microservices broadcast events to every other microservice. You can think of events not just as facts but also as triggers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595164003492/5q2cO89NU.jpeg" alt="request-driven.jpg">
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595164747402/Ru1VhiDgh.jpeg" alt="event-driven.jpg"></p>
<p>To understand the difference between the two, let us consider an example. Suppose a customer wants to buy an item online, and we have microservices that handle orders, shipping, and customers. If a customer places an order, a request is made to the order service. The order service places the order and coordinates with the shipping service to provision shipping of the product. The shipping service, in turn, calls the customer service to fetch customer details, and the returned details contain the address to which the item will be shipped.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595165539277/8iE8UyAM0.jpeg" alt="communication-using-rest.jpg"></p>
<p>However, there are some challenges that need to be solved with such an architectural pattern. What if the shipping service suddenly goes down? How long should the order service keep retrying for? Questions like these, and more, can be solved using the event-driven pattern.</p>
<p>In an event-driven architecture, when an order is received, the corresponding event, also called a fact, is written to a huge log of events. Other services read the events being written to the log and act on the ones relevant to them. The same happens for every microservice. For example, if a customer changes their address, the customer service publishes this fact to the event log; the shipping service, sensing a change of address, carries out the necessary actions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595168397491/pRPw3r4UO.jpeg" alt="communication-using-events.jpg"></p>
<p>Since the events are persisted in the log, in case if any service goes down, all that it needs to do is to read events from this stream whenever it comes back up, in order to catch up with the missed events.</p>
<p>It is important to note that all these services are stateful. Every microservice maintains a DB of its own. This need not be a full-blown DB; it can be something as simple as a key-value store. These DBs may or may not contain redundant data, but the bottom line is that each one contains the information relevant to its microservice. These DBs can also act as local caches, further reducing latency and thereby increasing performance.</p>
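<p>The mechanics above can be sketched with a toy append-only log in Go. Everything here (the <code>Event</code> shape, the service names) is made up for illustration; a production system would use a durable log such as Apache Kafka, where each consumer tracks its own read offset:</p>

```go
package main

import "fmt"

// Event is a fact appended to the shared log.
type Event struct {
	Kind string
	Data string
}

// EventLog is a toy in-memory stand-in for a durable event log.
type EventLog struct {
	events []Event
}

// Append publishes a new fact; events are never modified or deleted.
func (l *EventLog) Append(e Event) { l.events = append(l.events, e) }

// ReadFrom returns everything at or after a consumer's offset, which is
// how a service that was down catches up on the events it missed.
func (l *EventLog) ReadFrom(offset int) []Event { return l.events[offset:] }

func main() {
	log := &EventLog{}

	// The order and customer services broadcast facts, not commands.
	log.Append(Event{Kind: "OrderPlaced", Data: "order-42"})
	log.Append(Event{Kind: "AddressChanged", Data: "customer-7"})

	// The shipping service reads the stream and reacts only to the
	// facts that are relevant to it.
	for _, e := range log.ReadFrom(0) {
		if e.Kind == "AddressChanged" {
			fmt.Println("shipping: refreshing address for", e.Data)
		}
	}
}
```

<p>Because the log is append-only and replayable, a consumer that crashes simply resumes reading from its last offset; no events are lost.</p>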
<h3 id="serverless">Serverless</h3>
<p>Serverless is an architectural pattern where the cloud provider is responsible for executing a piece of code by dynamically allocating resources, which ultimately means fewer resources are used to run the code. The code typically runs inside stateless containers that can be triggered by a variety of events, including HTTP requests, database events, queuing services, etc. The code sent to the cloud provider for execution is usually in the form of a function; hence, serverless is also referred to as <em>Function-as-a-Service (FaaS)</em>, as opposed to the traditional <em>Backend-as-a-Service (BaaS)</em> pattern. Since everything happens on demand, these containers are ephemeral: they are dynamically spun up on receiving an event and conveniently destroyed after having served their purpose. This hugely helps with scaling.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1595169447883/RoxjfvL4C.jpeg" alt="function-as-a-service.jpg"></p>
<p>However, there&#39;s a catch. One essential thing is missing from this pattern: state! With the other architectural patterns, we discussed how every microservice maintains a database of its own, and how it serves as a local cache, reducing latency and increasing performance. With serverless, we do not maintain state for any of our containers, as the containers themselves are ephemeral.</p>
<h2 id="future">Future</h2>
<p>Having seen the origins of the SOA pattern and its evolution up to the serverless pattern, we can now see what the future holds. At this moment, we work around the problem of maintaining state in our serverless functions by using a cloud store. This does the job for now, but it is not ideal: maintaining a separate cloud store is expensive and introduces unnecessary overhead. We want something closer to the traditional setup, where every microservice maintained a database, and thus a state, of its own. Microsoft&#39;s Azure has taken a step in this direction with <a target='_blank' rel='noopener noreferrer'  href="https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp">Durable Functions</a>, which aim to solve this problem. Other problems we still need to solve include getting triggers and data from data stores into functions; various use cases demand this. A unified view of the current state, compiled from the states of all the microservices, could also help us in many ways. These problems are among the hardest and most interesting parts of serverless right now, and there is a lot of active research and development going on in this field. There is no doubt that serverless will be a big part of the future.</p>
]]></content:encoded></item><item><title><![CDATA[Demystifying Containers]]></title><description><![CDATA[Ever since Docker released its first version back in 2013, it triggered a major shift in the way the software industry works. Lightweight VMs suddenly caught the attention of the world and opened opportunities of unlimited possibilities. Containers p...]]></description><link>https://blog.pratikms.com/demystifying-containers</link><guid isPermaLink="true">https://blog.pratikms.com/demystifying-containers</guid><category><![CDATA[Docker]]></category><category><![CDATA[containers]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Go Language]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Wed, 17 Jun 2020 19:15:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595270422057/FPGROR16y.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ever since Docker released its first version back in 2013, it triggered a major shift in the way the software industry works. <em>Lightweight VMs</em> suddenly caught the attention of the world and opened opportunities of unlimited possibilities. Containers provided a way to get a grip on software. You can use Docker Containers to wrap up an application in such a way that its deployment and runtime issues— how to expose it on a network, how to manage its use of storage and memory and I/O, how to control access permissions, etc. — are handled outside of the application itself, and in a way that is consistent across all <em>containerized</em> apps.</p>
<p>Containers offer many other benefits besides handy encapsulation, isolation, portability, and control. Containers are small (megabytes). They start instantly. They have their own built-in mechanisms for versioning and component reuse. They can be easily shared via public or private repositories.</p>
<p>Today, containers are an essential component of the software development process, and many of us use them on a day-to-day basis. In spite of this, there is still a lot of <em>magic</em> involved for many who want to venture into the world of containers, and even today there is a lot of ambiguity in how exactly a container works. Today we will demystify a lot of that <em>magic</em>. But before that, I believe it is necessary for us to understand the process of evolution that led to what we know as containers today.</p>
<h1 id="the-world-before-containers">The world before Containers</h1>
<p>For many years now, enterprise software has typically been deployed either on <em>bare metal</em> (i.e. installed on an operating system that has complete control over the underlying hardware) or in a virtual machine (i.e. installed on an operating system that shares the underlying hardware with other <em>guest</em> operating systems). Naturally, installing on bare metal made the software painfully difficult to move around and difficult to update — two constraints that made it hard for IT to respond nimbly to changes in business needs.</p>
<p>Then virtualization came along. Virtualization platforms (also known as <em>hypervisors</em>) allowed multiple virtual machines to share a single physical system, each virtual machine emulating the behavior of an entire system, complete with its own operating system, storage, and I/O, in an isolated fashion. IT could now respond more effectively to changes in business requirements, because VMs could be cloned, copied, migrated, and spun up or down to meet demand or conserve resources.</p>
<p>Virtual machines also helped cut costs, because more VMs could be consolidated onto fewer physical machines. Legacy systems running older applications could be turned into VMs and physically decommissioned to save even more money.</p>
<p>But virtual machines still have their share of problems. Virtual machines are large (gigabytes), each one containing a full operating system. Only so many virtualized apps can be consolidated onto a single system. Provisioning a VM still takes a fair amount of time. Finally, the portability of VMs is limited. After a certain point, VMs are not able to deliver the kind of speed, agility, and savings that fast-moving businesses are demanding.</p>
<h1 id="containers">Containers</h1>
<p>Containers work a little like VMs, but in a far more specific and granular way. They isolate a single application and its dependencies — all of the external software libraries the app requires to run — both from the underlying operating system and from other containers. All of the containerized apps share a single, common operating system, but they are compartmentalized from one another and from the system at large.</p>
<p>Taking Docker as an example, in the image below you can see that my host OS has a hostname of its own and its own set of processes running. When I run an Ubuntu container, we can see that it has its own hostname and its own set of processes:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1592144463096/VcoHpx67d.png" alt="8. docker.png"></p>
<p>This means that our Ubuntu container is running in an isolated environment. The PID 1 confirms this fact. Similarly we can provide a mounted storage to our container, or allocate a particular number of processes or a certain amount of RAM to run with. But what exactly is all this? What exactly is process isolation? What is a containerized environment? What do metered resources mean? </p>
<p>We will try to make sense of all this jargon by replicating the behavior of <code>docker run &lt;image&gt;</code> as closely as possible. To make it all happen, we will be using Go. There is no specific reason behind the selection of Go; you can choose literally any language like Rust, Python, Node, etc., as long as it supports syscalls and namespaces. The reason I picked Go is just personal preference. The fact that Docker itself is built in Go also helps my case.</p>
<h1 id="building-a-container-from-scratch">Building a container from scratch</h1>
<p>As mentioned earlier, we will try to replicate something as close to Docker as possible. Just like <code>docker run &lt;image&gt; cmd args</code>, we will go for <code>go run main.go cmd args</code>. To start with, we will proceed with the basic snippet that the Go plugins of most major editors have to offer:</p>
<pre><code><span class="hljs-keyword">package</span> main

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {

}
</code></pre><p>Now we will add support for the execution of basic commands like <code>echo</code> and <code>cat</code>:</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">must</span><span class="hljs-params">(err error)</span></span> {
    <span class="hljs-comment">// If error exists, panic and exit</span>
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-built_in">panic</span>(err)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">run</span><span class="hljs-params">()</span></span> {
    fmt.Printf(<span class="hljs-string">"Running %v\n"</span>, os.Args[<span class="hljs-number">2</span>:])

    <span class="hljs-comment">// Execute the commands that follow 'go run main.go run'</span>
    cmd := exec.Command(os.Args[<span class="hljs-number">2</span>], os.Args[<span class="hljs-number">3</span>:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(cmd.Run())
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Make sure that the first argument after 'go run main.go' is 'run'</span>
    <span class="hljs-keyword">switch</span> os.Args[<span class="hljs-number">1</span>] {
    <span class="hljs-keyword">case</span> <span class="hljs-string">"run"</span>:
        run()
    <span class="hljs-keyword">default</span>:
        <span class="hljs-built_in">panic</span>(<span class="hljs-string">"I'm sorry, what?"</span>)
    }
}
</code></pre><p>Let&#39;s see what that boils down to:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591528131542/JQBS7gSaK.png" alt="1. echo hello world.png"></p>
<p>Now that we can run simple commands with our script, we will try running a bash shell. Since it can get confusing as we are already in a shell, we will try to run <code>ps</code> before and after running our script.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591529527051/8oKYyUhOl.png" alt="2. running bash.png"></p>
<p>It is still difficult to say anything. To confirm if we have isolation like an actual container, let us try by simply changing the <code>hostname</code> from within our <code>bash</code> shell launched using our script. To modify <code>hostname</code>, we need to be root:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591530062086/TXRWJ6Ork.png" alt="3. changing hostname-1.png"></p>
<p>Just to summarize, we did the following in the specified order:</p>
<ul>
<li>Check the processes running on our host OS, by running the command <code>ps</code></li>
<li>Check hostname of our host OS by running the <code>hostname</code> command</li>
<li>Run our script to launch a <code>bash</code> shell</li>
<li>Check the processes running in our launched <code>bash</code> shell using the <code>ps</code> command</li>
<li>Check the <code>hostname</code> from within our launched <code>bash</code> shell</li>
<li>Try to modify the <code>hostname</code> and set it to arbitrary string</li>
<li>Verify if the <code>hostname</code> was modified successfully within our launched <code>bash</code> shell. It indeed was.</li>
<li>Exit to return to our host OS shell</li>
<li>Check <code>hostname</code> in our host OS</li>
<li>The <code>hostname</code> change made within our <code>bash</code> shell unfortunately persisted, causing the <code>hostname</code> of our host OS to change as well</li>
</ul>
<p>This means that we do not have isolation as of yet. To address this, we need the help of namespaces.</p>
<h2 id="namespaces">Namespaces</h2>
<p>Namespaces provide the isolation needed to run multiple containers on one machine while giving each what appears to be its own environment. There are six namespaces. Each can be independently requested, and amounts to giving a process (and its children) a view of a subset of the resources of the machine.</p>
<h3 id="pid">PID</h3>
<p>The PID namespace gives a process and its children their own view of a subset of the processes in the system. This is analogous to a mapping table. When a process in a PID namespace asks the kernel for a list of processes, the kernel looks in the mapping table. If the process exists in the table, the mapped ID is used instead of the real ID. If it doesn&#39;t exist in the mapping table, the kernel pretends it doesn&#39;t exist at all. The PID namespace makes the first process created within it PID 1 (by mapping whatever its host ID is to 1), giving the appearance of an isolated process tree in the container. This is a really interesting concept. </p>
<h3 id="mnt">MNT</h3>
<p>In a way, this one is the most important. The mount namespace gives the processes contained within it their own mount table. This means they can mount and unmount directories without affecting other namespaces, including the host namespace. More importantly, in combination with the <code>pivot_root</code> syscall, it allows a process to have its own filesystem. This is how we can have a process think it&#39;s running on Ubuntu, CentOS, Alpine, etc — by swapping out the filesystem that the container sees.</p>
<h3 id="net">NET</h3>
<p>The network namespace gives the processes that use it their own network stack. In general only the main network namespace (the one that the processes that start when you start your computer use) will actually have any real physical network cards attached. But we can create virtual ethernet pairs — linked ethernet cards where one end can be placed in one network namespace and one in another creating a virtual link between the network namespaces. Kind of like having multiple IP stacks talking to each other on one host. With a bit of routing magic this allows each container to talk to the real world while isolating each to its own network stack.</p>
<h3 id="uts">UTS</h3>
<p>The UTS namespace gives its processes their own view of the system’s hostname and domain name. After entering a UTS namespace, setting the hostname or the domain name will not affect other processes.</p>
<h3 id="ipc">IPC</h3>
<p>The IPC namespace isolates various inter-process communication mechanisms, such as message queues. This particular namespace deserves a blog post of its own; there&#39;s much more to IPC than I can cover here. Which is why I will encourage you to check out the  <a target='_blank' rel='noopener noreferrer'  href="https://www.man7.org/linux/man-pages/man7/namespaces.7.html">namespace docs</a>  for more details.</p>
<h3 id="user">USER</h3>
<p>The user namespace was the most recently added, and is likely the most powerful from a security perspective. The user namespace maps the UIDs (and GIDs) a process sees to a different set of UIDs (and GIDs) on the host. This is extremely useful. Using a user namespace, we can map the container&#39;s root user ID (i.e. 0) to an arbitrary and unprivileged UID on the host. This means we can let a container think it has root access without actually giving it any privileges in the root namespace. The container is free to run processes as UID 0, which would normally be synonymous with having root permissions, but the kernel is actually mapping that UID under the covers to an unprivileged real UID belonging to the host OS.</p>
<p>Most container technologies place a user&#39;s process into all of the above namespaces and initialize the namespaces to provide a standard environment. This amounts to, for example, creating an initial ethernet card in the isolated network namespace of the container with connectivity to a real network on the host. In our case, to satisfy our immediate requirement, we will add the UTS namespace to our script so that we can modify the hostname.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">run</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Stuff that we previously went over</span>

    cmd.SysProcAttr = &amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS,
    }

    must(cmd.Run())
}
</code></pre><p>Running it returns:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591532363023/lFEM94IUc.png" alt="3. changing hostname-2.png"></p>
<p>Awesome! We now have the ability to modify hostname in our container-like environment without letting the host environment change. But, if we observe closely, our process IDs within the container are still the same. We&#39;re able to see the processes running in our host OS even from within our container. To fix this, we need to use the PID namespace. As discussed above, the PID namespace will allow us process isolation.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">run</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Stuff that we previously went over</span>

    cmd.SysProcAttr = &amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
    }

    must(cmd.Run())
}
</code></pre><p>However, unlike the case of the UTS namespace, simply adding the PID namespace like this won&#39;t help. We will have to create another copy of our process so that it can run with PID 1.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">run</span><span class="hljs-params">()</span></span> {
    cmd := exec.Command(<span class="hljs-string">"/proc/self/exe"</span>, <span class="hljs-built_in">append</span>([]<span class="hljs-keyword">string</span>{<span class="hljs-string">"child"</span>}, os.Args[<span class="hljs-number">2</span>:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.SysProcAttr = &amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
    }

    must(cmd.Run())
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">child</span><span class="hljs-params">()</span></span> {
    fmt.Printf(<span class="hljs-string">"Running %v as PID %d\n"</span>, os.Args[<span class="hljs-number">2</span>:], os.Getpid())

    cmd := exec.Command(os.Args[<span class="hljs-number">2</span>], os.Args[<span class="hljs-number">3</span>:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(cmd.Run())
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">switch</span> os.Args[<span class="hljs-number">1</span>] {
    <span class="hljs-keyword">case</span> <span class="hljs-string">"run"</span>:
        run()
    <span class="hljs-keyword">case</span> <span class="hljs-string">"child"</span>:
        child()
    <span class="hljs-keyword">default</span>:
        <span class="hljs-built_in">panic</span>(<span class="hljs-string">"I'm sorry, what?"</span>)
    }
}
</code></pre><p>What we&#39;re basically doing is that whenever we run <code>go run main.go run bash</code>, our <code>main()</code> function will be called. As the value of <code>os.Args[1]</code> will be &#39;run&#39; at this instance, it will call our <code>run()</code> function. Within <code>run()</code>, we are using <code>/proc/self/exe</code> to create a copy of our current process. We are essentially creating a copy and calling it again by prepending the string &#39;child&#39; to the rest of the arguments that we received in <code>run()</code>. When we do this, our <code>main()</code> function will be invoked again, the difference being that the value of <code>os.Args[1]</code> will be &#39;child&#39; this time. From there on, the rest of the script executes as we saw before.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591534948440/y2aXbM48j.png" alt="4. ps-1.png"></p>
<p>Unfortunately, even after doing all this, the results that we get are not that different. To understand why, we need to know what exactly goes on behind the scenes when we run the <code>ps</code> command. It turns out that <code>ps</code> looks at the <code>/proc</code> directory to find out what processes are currently running on the host. Let us observe the contents of the <code>/proc</code> directory from our host and from our container.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591802829525/9nyKi5-_d.png" alt="4. ls proc - 1.png"></p>
<p>As we can see, the contents of the <code>/proc</code> directory observed from the host and from the container are one and the same. To overcome this, we want the <code>ps</code> of our container to look at a <code>/proc</code> directory of its own. In other words, we need to provide our container a filesystem of its own. This brings us to an important concept of containers: layered filesystems.</p>
<h2 id="layered-filesystems">Layered Filesystems</h2>
<p>Layered filesystems are how we can efficiently move whole machine images around. They&#39;re the reason why the ship floats and does not sink. At a basic level, layered filesystems amount to optimizing the call to create a copy of the root filesystem for each container. There are numerous ways of doing this.  <a target='_blank' rel='noopener noreferrer'  href="https://btrfs.wiki.kernel.org/">Btrfs</a> uses copy-on-write (COW) at the filesystem layer.  <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/Aufs">Aufs</a> uses “union mounts”. Since there are so many ways to achieve this step, we will just use something horribly simple: we&#39;ll do a plain copy of the filesystem. It&#39;s slow, but it works.</p>
<p>To do this, I have a copy of the Lubuntu filesystem at the path specified below. As can be seen in the screenshot, I have <code>touch</code>ed two files, HOST_FS in the root of the host and CONTAINER_FS within the copy of our Lubuntu FS.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591804027767/_6RRfBLsv.png" alt="4. file system.png"></p>
<p>We will now have to let our container know about this filesystem and ask it to change its root to this copied filesystem. We will also have to ask the container to change its directory to <code>/</code> once it is launched.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">child</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Stuff that we previously went over</span>

    must(syscall.Chroot(<span class="hljs-string">"/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"</span>))
    must(syscall.Chdir(<span class="hljs-string">"/"</span>))
    must(cmd.Run())
}
</code></pre><p>Running this we get our intended FS. We can confirm it as we can see CONTAINER_FS, the file that we created in our container:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591804634404/3ME3ipp1P.png" alt="4. ls proc - 2.png"></p>
<p>However, once again, in spite of all these efforts, <code>ps</code> still remains a problem.</p>
<p>This is because while we provided a new filesystem for our container using <code>chroot</code>, we forgot that <code>/proc</code>, in itself, is a special type of virtual filesystem. <code>/proc</code> is sometimes referred to as a process information pseudo-file system. It doesn&#39;t contain &#39;real&#39; files but runtime system information like system memory, devices mounted, hardware configuration, etc. For this reason it can be regarded as a control and information center for the kernel. In fact, quite a lot of system utilities are simply calls to files in this directory. For example, <code>lsmod</code> is the same as <code>cat /proc/modules</code>. By altering files located in this directory you can even read/change kernel parameters like <code>sysctl</code> while the system is still running.</p>
<p>Hence, we need to mount <code>/proc</code> for our <code>ps</code> command to be able to work.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">child</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Stuff that we previously went over</span>

    must(syscall.Chroot(<span class="hljs-string">"/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"</span>))
    must(syscall.Chdir(<span class="hljs-string">"/"</span>))
    <span class="hljs-comment">// Parameters to this syscall.Mount() are:</span>
    <span class="hljs-comment">// source FS, target FS, type of the FS, flags and data to be written in the FS</span>
    must(syscall.Mount(<span class="hljs-string">"proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-number">0</span>, <span class="hljs-string">""</span>))

    must(cmd.Run())

    <span class="hljs-comment">// Very important to unmount in the end before exiting</span>
    must(syscall.Unmount(<span class="hljs-string">"/proc"</span>, <span class="hljs-number">0</span>))
}
</code></pre><p>You can think of <code>syscall.Mount()</code> and <code>syscall.Unmount()</code> as the functions that are called when you plug in and safely remove a pen drive. In the same analogy, we <code>mount</code> and <code>unmount</code> our <code>/proc</code> filesystem in our container.</p>
<p>Now if we run <code>ps</code> from our container:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591807744347/sNB8IALVQ.png" alt="4. ps-2.png"></p>
<p>There! After all these efforts, we finally have PID 1! We have finally achieved process isolation. We can see our <code>/proc</code> filesystem has been mounted by doing <code>ls /proc</code> which lists the current process information of our container. </p>
<p>One small thing we still need to check is the mount points of <code>proc</code>. We will do that by first running <code>mount | grep proc</code> from our host OS. We will then launch our container and run the same command from within it. With our container still running, we will once again run <code>mount | grep proc</code> from the host to check the mount points of <code>proc</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591812483241/K-EK6CS_f.png" alt="5. mount proc-1.png"></p>
<p>As we can see, if we run <code>mount | grep proc</code> from our host OS with our container running, the host OS can see where <code>proc</code> is mounted in our container. This should not be the case; ideally, our containers should be as invisible to the host OS as possible. To fix this, all we need to do is add the MNT namespace to our script:</p>
<pre><code>func run() {
    <span class="hljs-regexp">//</span> Stuff we previously went over

    cmd.SysProcAttr = &amp;<span class="hljs-keyword">syscall</span>.SysProcAttr{
        Cloneflags:   <span class="hljs-keyword">syscall</span>.CLONE_NEWUTS | <span class="hljs-keyword">syscall</span>.CLONE_NEWPID | <span class="hljs-keyword">syscall</span>.CLONE_NEWNS,
        Unshareflags: <span class="hljs-keyword">syscall</span>.CLONE_NEWNS,
    }

    must(cmd.Run())
}
</code></pre><p>Now if we observe the mount points from our host OS with the container running, we get:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591812662401/iNjq-bnDN.png" alt="5. mount proc-2.png"></p>
<p>There! With this, we can now say that we have a truly isolated environment. Just so that there is a better distinction between our host and containerized environments, we can assign our container an arbitrary hostname:</p>
<pre><code> <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">child</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Stuff that we previously went over</span>

    must(syscall.Sethostname([]<span class="hljs-keyword">byte</span>(<span class="hljs-string">"container"</span>)))
    must(syscall.Chroot(<span class="hljs-string">"/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"</span>))
    must(syscall.Chdir(<span class="hljs-string">"/"</span>))
    must(syscall.Mount(<span class="hljs-string">"proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-number">0</span>, <span class="hljs-string">""</span>))

    must(cmd.Run())

    must(syscall.Unmount(<span class="hljs-string">"/proc"</span>, <span class="hljs-number">0</span>))
}
</code></pre><p>Running it gives:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591885392043/rSya4T7qC.png" alt="6. hostname change.png"></p>
<p>This gives us a fully running, fully functioning container!</p>
<p>There is, however, one more important concept that we haven&#39;t yet covered. While namespaces provide isolation, and layered filesystems provide us with a root filesystem for our container, we need cgroups to control resource sharing.</p>
<h2 id="cgroups">Cgroups</h2>
<p>Cgroups, also known as control groups, and previously known as process containers, are perhaps one of Google&#39;s most prominent contributions to the software world. Fundamentally, cgroups collect a set of process or task IDs together and apply limits to them. Where namespaces isolate a process, cgroups enforce resource limits and sharing between processes.</p>
<p>Just like <code>/proc</code>, cgroups, too, are exposed by the kernel as a special filesystem that we can mount. We add a process or thread to a cgroup by simply adding its ID to a tasks file, and then read and configure various values by editing files in that directory.</p>
<pre><code><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">cg</span><span class="hljs-params">()</span></span> {
    <span class="hljs-comment">// Location of the Cgroups filesystem</span>
    cgroups := <span class="hljs-string">"/sys/fs/cgroup/"</span>
    pids := filepath.Join(cgroups, <span class="hljs-string">"pids"</span>)

    <span class="hljs-comment">// Creating a directory named 'pratikms' inside '/sys/fs/cgroup/pids'</span>
    <span class="hljs-comment">// We will use this directory to configure various parameters for resource sharing by our container</span>
    err := os.Mkdir(filepath.Join(pids, <span class="hljs-string">"pratikms"</span>), <span class="hljs-number">0755</span>)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> &amp;&amp; !os.IsExist(err) {
        <span class="hljs-built_in">panic</span>(err)
    }

    <span class="hljs-comment">// Allow a maximum of 20 processes to be run in our container</span>
    must(ioutil.WriteFile(filepath.Join(pids, <span class="hljs-string">"pratikms/pids.max"</span>), []<span class="hljs-keyword">byte</span>(<span class="hljs-string">"20"</span>), <span class="hljs-number">0700</span>))

    <span class="hljs-comment">// Remove the new cgroup after container exits</span>
    must(ioutil.WriteFile(filepath.Join(pids, <span class="hljs-string">"pratikms/notify_on_release"</span>), []<span class="hljs-keyword">byte</span>(<span class="hljs-string">"1"</span>), <span class="hljs-number">0700</span>))

    <span class="hljs-comment">// Add our current PID to cgroup processes</span>
    must(ioutil.WriteFile(filepath.Join(pids, <span class="hljs-string">"pratikms/cgroup.procs"</span>), []<span class="hljs-keyword">byte</span>(strconv.Itoa(os.Getpid())), <span class="hljs-number">0700</span>))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">child</span><span class="hljs-params">()</span></span> {
    fmt.Printf(<span class="hljs-string">"Running %v as PID %d\n"</span>, os.Args[<span class="hljs-number">2</span>:], os.Getpid())

    <span class="hljs-comment">// Invoke cgroups</span>
    cg()

    cmd := exec.Command(os.Args[<span class="hljs-number">2</span>], os.Args[<span class="hljs-number">3</span>:]...)
    <span class="hljs-comment">// Stuff that we previously went over</span>

    must(syscall.Unmount(<span class="hljs-string">"/proc"</span>, <span class="hljs-number">0</span>))
}
</code></pre><p>On running our container, we can see the directory &#39;pratikms&#39; created inside <code>/sys/fs/cgroup/pids</code> from our host. It has all the necessary files in it to control resource sharing within our container.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591892753704/DXfJqy6x9.png" alt="7. cgroups-1.png"></p>
<p>When we <code>cat pids.max</code> from our host, we can see that our container is limited to running a maximum of 20 processes at a time. If we <code>cat pids.current</code>, we can see the number of processes currently running in our container. Now, we need to test the resource limitation that we applied to our container.</p>
<h2 id="-">:() { : | : &amp; }; :</h2>
<p>No, this is not a typo, and neither did you read it wrong. It&#39;s a fork bomb. A fork bomb is a denial-of-service attack wherein a process continuously replicates itself to deplete available system resources, slowing down or crashing the system due to resource starvation. To make more sense of it, you can literally replace the <code>:</code> in it with anything. For example, <code>:() { : | : &amp; }; :</code> can also be written as <code>forkBomb() { forkBomb | forkBomb &amp;}; forkBomb</code>. It means that we&#39;re declaring a function <code>forkBomb()</code> whose body recursively calls itself with <code>forkBomb | forkBomb</code> and runs it in the background using <code>&amp;</code>. Finally, we call it using <code>forkBomb</code>. While this works, a fork bomb is conventionally written as <code>:() { : | : &amp; }; :</code>, and that is what we will proceed with:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591892775962/7nfc4cbwY.png" alt="7. cgroups-2.png"></p>
<p>As we can see, the number of processes running within our container was 6. After we triggered the fork bomb, the number of running processes increased to 20 and remained stable there, capped by our cgroup. We can confirm the forks by observing the output of <code>ps fax</code>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1591892789018/iWNfP3xKU.png" alt="7. cgroups-3.png"></p>
<h1 id="putting-it-all-together">Putting it all together</h1>
<p>So here it is, a super simple container, in less than 100 lines of code. Obviously, this is intentionally simple. If you use it in production, you are crazy and, more importantly, on your own. But I think seeing something simple and hacky gives us a really useful picture of what’s going on.</p>
<pre><code><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"io/ioutil"</span>
    <span class="hljs-string">"os"</span>
    <span class="hljs-string">"os/exec"</span>
    <span class="hljs-string">"path/filepath"</span>
    <span class="hljs-string">"strconv"</span>
    <span class="hljs-string">"syscall"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">must</span><span class="hljs-params">(err error)</span></span> {
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-built_in">panic</span>(err)
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">cg</span><span class="hljs-params">()</span></span> {
    cgroups := <span class="hljs-string">"/sys/fs/cgroup/"</span>
    pids := filepath.Join(cgroups, <span class="hljs-string">"pids"</span>)
    err := os.Mkdir(filepath.Join(pids, <span class="hljs-string">"pratikms"</span>), <span class="hljs-number">0755</span>)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> &amp;&amp; !os.IsExist(err) {
        <span class="hljs-built_in">panic</span>(err)
    }
    must(ioutil.WriteFile(filepath.Join(pids, <span class="hljs-string">"pratikms/pids.max"</span>), []<span class="hljs-keyword">byte</span>(<span class="hljs-string">"20"</span>), <span class="hljs-number">0700</span>))
    <span class="hljs-comment">// Remove the new cgroup after container exits</span>
    must(ioutil.WriteFile(filepath.Join(pids, <span class="hljs-string">"pratikms/notify_on_release"</span>), []<span class="hljs-keyword">byte</span>(<span class="hljs-string">"1"</span>), <span class="hljs-number">0700</span>))
    must(ioutil.WriteFile(filepath.Join(pids, <span class="hljs-string">"pratikms/cgroup.procs"</span>), []<span class="hljs-keyword">byte</span>(strconv.Itoa(os.Getpid())), <span class="hljs-number">0700</span>))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">child</span><span class="hljs-params">()</span></span> {
    fmt.Printf(<span class="hljs-string">"Running %v as PID %d\n"</span>, os.Args[<span class="hljs-number">2</span>:], os.Getpid())

    cg()

    cmd := exec.Command(os.Args[<span class="hljs-number">2</span>], os.Args[<span class="hljs-number">3</span>:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(syscall.Sethostname([]<span class="hljs-keyword">byte</span>(<span class="hljs-string">"container"</span>)))
    must(syscall.Chroot(<span class="hljs-string">"/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"</span>))
    must(syscall.Chdir(<span class="hljs-string">"/"</span>))
    must(syscall.Mount(<span class="hljs-string">"proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-number">0</span>, <span class="hljs-string">""</span>))

    must(cmd.Run())

    must(syscall.Unmount(<span class="hljs-string">"/proc"</span>, <span class="hljs-number">0</span>))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">run</span><span class="hljs-params">()</span></span> {
    cmd := exec.Command(<span class="hljs-string">"/proc/self/exe"</span>, <span class="hljs-built_in">append</span>([]<span class="hljs-keyword">string</span>{<span class="hljs-string">"child"</span>}, os.Args[<span class="hljs-number">2</span>:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.SysProcAttr = &amp;syscall.SysProcAttr{
        Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        Unshareflags: syscall.CLONE_NEWNS,
    }

    must(cmd.Run())
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">switch</span> os.Args[<span class="hljs-number">1</span>] {
    <span class="hljs-keyword">case</span> <span class="hljs-string">"run"</span>:
        run()
    <span class="hljs-keyword">case</span> <span class="hljs-string">"child"</span>:
        child()
    <span class="hljs-keyword">default</span>:
        <span class="hljs-built_in">panic</span>(<span class="hljs-string">"I'm sorry, what?"</span>)
    }
}
</code></pre><p>Again, as stated before, this is in no way production-ready code. I do have some hard-coded values in it, for example the path to the filesystem, and the hostname of the container. If you wish to play around with the code, you can get it from my <a target='_blank' rel='noopener noreferrer' href="https://github.com/pratikms/making-sense-of-containers">GitHub repo</a>. But, at the same time, I do believe this is a wonderful exercise to understand what goes on behind the scenes when we run that <code>docker run &lt;image&gt;</code> command in our terminal. It introduces us to some of the important OS concepts that containers, in general, leverage: namespaces, layered filesystems, cgroups, etc. Containers are important, and their prevalence in the job market is incredible. With Cloud, Docker, and Kubernetes becoming more linked every day, that demand will only grow. Going forward, it is imperative to understand the inner workings of a container. And this was my small attempt at doing the same.</p>
]]></content:encoded></item><item><title><![CDATA[You Don't Know Deno?]]></title><description><![CDATA[When  Brendan Eich, during his time at  Netscape created JavaScript in 1995, I doubt that he seldom had any idea of what the language will grow out to be in the coming future. When Netscape partnered with Sun to take on their competitor Microsoft, Br...]]></description><link>https://blog.pratikms.com/you-dont-know-deno</link><guid isPermaLink="true">https://blog.pratikms.com/you-dont-know-deno</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[webdev]]></category><category><![CDATA[Deno]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Tue, 19 May 2020 18:33:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595270493660/r_75ARuE6.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When  <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/Brendan_Eich">Brendan Eich</a>, during his time at  <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/Netscape">Netscape</a> created JavaScript in 1995, I doubt that he seldom had any idea of what the language will grow out to be in the coming future. When Netscape partnered with Sun to take on their competitor Microsoft, Brendan Eich decided to surf the tidal wave of hype surrounding Java. He found this reason compelling enough to rename Mocha - the language that he created to turn the web into a full-blown application platform - to JavaScript. He envisioned JavaScript to be marketed as a companion language to Java, in the same was as Visual Basic was to C++. So the name was a straightforward marketing ploy to gain acceptance.</p>
<p>By the 2000s, when <a target='_blank' rel='noopener noreferrer' href="https://en.wikipedia.org/wiki/Douglas_Crockford">Douglas Crockford</a> invented the JSON data format using a subset of JavaScript syntax, a critical mass of developers emerged who started viewing JavaScript as a serious language. However, some early design choices, like automatic semicolon insertion (ASI), the event loop, the lack of classes, unusual prototypical inheritance, and type coercion, turned out to be tools for developers to laugh at the language and to ridicule those who were using it. This cycle still continues.</p>
<p>It was only a few years ago, with &quot;Web 2.0&quot; applications such as Flickr, Gmail, etc., that the world realized what a modern experience on the web could be like. It was also due to a still-ongoing healthy competition among browsers, each trying to offer users a better experience and better performance, that JavaScript engines started becoming considerably better. Development teams behind major browsers worked hard to offer better support for JavaScript and to find ways to make JavaScript run faster. This triggered significant improvements in a particular JavaScript engine called V8 (also known as Chrome V8, for being the open-source JavaScript engine of The Chromium Project).</p>
<p>It was in 2009 that <a target='_blank' rel='noopener noreferrer' href="https://en.wikipedia.org/wiki/Ryan_Dahl">Ryan Dahl</a> paid special attention to this V8 engine to create <a target='_blank' rel='noopener noreferrer' href="https://nodejs.org/en/">Node.js</a>. His focus was initially on building event-driven HTTP servers, whose main aim is resolving the <a target='_blank' rel='noopener noreferrer' href="https://en.wikipedia.org/wiki/C10k_problem">C10k problem</a>. Simply put, the event-driven architecture provides relatively better performance while consuming fewer resources at the same time. It achieves this by avoiding spawning additional threads and the overhead caused by thread context-switching; it instead uses a single process to handle every event on a callback. This effort of Ryan Dahl&#39;s turned out to be crucial for the popularity that server-side JavaScript enjoys today.</p>
<p>Node.js, since then, has proved to be a very successful software platform. People have found it useful for building web development tooling, standalone web servers, and a myriad of other use-cases. Node, however, was designed in 2009, when JavaScript was a much different language. Out of necessity, Node had to invent concepts which were later taken up by the standards organizations and added to the language differently. Having said that, there have also been a few design decisions that Node suffers from. These design mistakes compelled Ryan to step down from the Node.js project. He has since been working on another runtime which aims at solving these issues: <a target='_blank' rel='noopener noreferrer' href="https://deno.land/">Deno</a>. In this blog post, we will look at the two major JavaScript runtimes that enable server-side JavaScript: Node.js and Deno. We will have a look at the problems with Node, and at how Deno aims to resolve them.</p>
<h2 id="design-mistakes-in-node">Design mistakes in Node</h2>
<p>A lot of the discussion that is about to follow is inspired by a <a target='_blank' rel='noopener noreferrer' href="https://www.youtube.com/watch?v=M3BM9TB-8yA">talk</a> that Ryan Dahl delivered at a JSConf, in which he discusses the problems that Node has. This doesn&#39;t mean that all Node projects should be abandoned this very instant. It is important to note that Node is not going anywhere and that it is here to stay. The problems stem from the not-so-rich JavaScript that was available at the time of Node&#39;s design, compounded by features and functionalities later added on top, which turned it into a huge monolith and made things hard to change.</p>
<h3 id="event-emitters">Event-emitters</h3>
<p>At the time Node was designed, JavaScript did not have the concept of Promises or async / await. Node&#39;s early promises promised to do some work and then had separate callbacks that would be executed for success and failure, as well as for handling timeouts; another way to think of them was as emitters that could emit only two events: success and error. Node&#39;s counterpart to promises became the EventEmitter, which important APIs are based around, namely sockets and HTTP. Async / await was later introduced to the language, largely as syntactic sugar over Promises. When implemented the right way, Promises are a great boon for the event-driven architecture.</p>
<p>Node&#39;s implementation using EventEmitter, though, has a problem: missing &#39;back-pressure&#39;. Take a TCP socket, for example. The socket emits &quot;data&quot; events when it receives incoming packets, and these &quot;data&quot; callbacks fire in an unconstrained manner, flooding the process with events. Because Node keeps receiving new data events, the underlying TCP socket exerts no proper back-pressure, so the remote sender has no idea the server is overloaded and continues to send data.</p>
<h3 id="security">Security</h3>
<p>The V8 engine, by itself, is a very good security sandbox. However, Node failed to capitalize on this. In its earlier days, there was no way of telling what a package could do with the underlying file system unless someone actually looked into its code. The trust comes purely from community usage.</p>
<h3 id="build-system">Build system</h3>
<p>Build systems are very difficult and very important at the same time. Node uses <a target='_blank' rel='noopener noreferrer' href="https://gyp.gsrc.io/">GYP</a> as its build system. GYP is intended to support large projects that need to be built on multiple platforms (e.g., Mac, Windows, Linux), where it is important that the project can be built using the IDEs that are popular on each platform, as if the project were a “native” one. If a Node module links to a C library, GYP is used to compile that C library and link it to Node. GYP was what Chrome used at the time Node was designed. Chrome eventually, for various reasons, abandoned GYP for <a target='_blank' rel='noopener noreferrer' href="https://chromium.googlesource.com/chromium/src/tools/gn/+/48062805e19b4697c5fbd926dc649c78b6aaa138/README.md">GN</a>, leaving Node as the sole GYP user.</p>
<h3 id="node-modules">Node modules</h3>
<p>When <a target='_blank' rel='noopener noreferrer' href="https://nodejs.org/en/blog/npm/npm-1-0-released/">npm version 1 was released</a> by <a target='_blank' rel='noopener noreferrer' href="https://www.linkedin.com/in/isaacschlueter">Isaac Schlueter</a>, it soon became the de facto standard. It solved problems like &#39;<a target='_blank' rel='noopener noreferrer' href="https://www.reddit.com/r/ProgrammerHumor/comments/75txp4/nodejs_dependency_hell_visualized_for_the_first/">dependency hell</a>&#39;: before npm, trying to install two versions of a package within the same folder would break the app. Thanks to npm, dependencies are now stored within the node_modules folder. But an unintended side-effect was that every project now has a node_modules directory in it, consuming ever more disk space. In addition, it added overhead to the <a target='_blank' rel='noopener noreferrer' href="https://www.typescriptlang.org/docs/handbook/module-resolution.html">Module Resolution Algorithm</a>: Node first looks in the local folders, then in the project&#39;s node_modules, and, failing that, in the global node_modules. More complexity was added by the fact that module specifiers carried no file extensions, so the module loader has to query the file system at multiple locations, trying to guess what the user intended.</p>
<p>Having said all this, it is important to mention that there are no inherent breaking faults in Node. Node.js is a time-tested and proven runtime that recently <a target='_blank' rel='noopener noreferrer' href="https://nodejs.dev/a-brief-history-of-nodejs">completed ten years</a> of existence. The awesome community has been instrumental in the humongous success that Node enjoys today, and npm is one of the biggest package repositories ever. But as a developer who cannot unsee the bugs that he himself introduced in the system, Ryan couldn&#39;t help but move on to a different endeavor. The above reasons motivated him to work on <a target='_blank' rel='noopener noreferrer' href="https://deno.land/">Deno</a>: a secure runtime for JavaScript and TypeScript.</p>
<h2 id="deno">Deno</h2>
<p>The name Deno is actually an anagram of Node. It is best described by its website:</p>
<blockquote>
<p>Deno is a simple, modern and secure runtime for JavaScript and TypeScript that uses V8 and is built in Rust.</p>
</blockquote>
<p>There are a lot of things to pay attention to in this simple description. Let&#39;s go over them one-by-one:</p>
<h3 id="security">Security</h3>
<p>Security is one of the biggest USPs of Deno. Deno aims to mimic the browser, and just like in any browser, the JavaScript running in it does not have any access to the underlying file system, etc., by default. Deno, in the same way, provides a secure sandbox for JavaScript to run in. By default, the JavaScript running within the runtime has no permissions; the user has to explicitly grant each individual permission that their app requires.</p>
<h3 id="module-system">Module system</h3>
<p>At the moment, there is no package.json in Deno, nor is there any intention to bring anything like it anytime soon. Imports will always be via relative or absolute URLs only. At the time of this writing, Deno does not support npm packages. During the early stages of its design, it was made clear that there were no plans to support Node modules due to the complexities involved. However, there have been <a target='_blank' rel='noopener noreferrer' href="https://github.com/denoland/deno/issues/2644">some discussions</a> making the rounds about this, but no conclusion has been reached yet.</p>
<h3 id="typescript-support">TypeScript Support</h3>
<p>Deno&#39;s standard modules are all written in TypeScript, and the TypeScript compiler is compiled directly into Deno. Initially, this caused the startup time to be around a minute, but the problem was quickly addressed thanks to <a target='_blank' rel='noopener noreferrer' href="https://v8.dev/blog/custom-startup-snapshots">V8 snapshots</a>, which greatly brought down startup times and let scripts start very quickly. TypeScript is treated as a first-class language: users can directly import TypeScript code (with the .ts extension) and run it immediately.</p>
<h3 id="rust">Rust</h3>
<p>In its early days, Deno was prototyped in Go. Now, however, for various reasons, Deno has been turned into a solid Rust project. Unlike Node, Deno is not a huge monolith, but rather a collection of Rust <a target='_blank' rel='noopener noreferrer' href="https://crates.io/">crates</a>. This was done to facilitate opt-in functionality: users who do not want the entire Deno executable packaged into one can instead pick a selection of modules and build their own executables.</p>
<h3 id="limitations">Limitations</h3>
<p>It should be noted that Deno is not a fork of Node. While Node is over a decade old, Deno has been in development for only about two years. At the time of this writing, <a target='_blank' rel='noopener noreferrer' href="https://github.com/denoland/deno/releases">Deno v1.0.0 was released only a few days ago, on the 13th of May, 2020</a>. Deno may not be suitable for many use-cases today, as it still has some limitations:</p>
<ul>
<li>at this moment, Deno is not compatible with Node (npm) packages</li>
<li>accessing native systems beyond what Deno provides is difficult; hence it has a very nascent plugins / extensions system at the moment</li>
<li>the TypeScript compiler may prove to be a bottleneck in some cases; plans are in place to port TSC to Rust</li>
<li>the HTTP server performance is not yet on par with that of Node (25k requests served by Deno vs 34k served by Node for a hello-world application)</li>
</ul>
<h2 id="final-thoughts">Final Thoughts</h2>
<p>The history of JavaScript has been long and full of bumps. Today, it is one of the most popular and fastest-growing languages, and the community is as active as ever. Node.js, V8, and other projects have brought JavaScript to places it was never designed for. With Deno, another important chapter is being written in the history of JavaScript. As of now, in my view, Deno cannot be looked at as a replacement for Node. It can definitely be considered an alternative to Node, but even for that, we may have to wait for some future releases of Deno to mature. Having said that, this is a great time to be alive as a JavaScript developer. With the ecosystem thriving, a JavaScript developer can today work at any layer of the system, be it front-end, back-end, database, etc. With the release of Deno, we can easily bet on runtimes enabling JavaScript to run on servers for many years to come.</p>
]]></content:encoded></item><item><title><![CDATA[So you want to be a Python expert?]]></title><description><![CDATA[With the global pandemic resulting in a world-wide quarantine, we're seeing a lot interest peeking in Python as language. Social-media, developer blogs and other platforms are flooding with blogs, videos, podcasts, etc. related to either getting star...]]></description><link>https://blog.pratikms.com/so-you-want-to-be-a-python-expert</link><guid isPermaLink="true">https://blog.pratikms.com/so-you-want-to-be-a-python-expert</guid><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[beginner]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Sun, 03 May 2020 15:31:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595271019896/NN8EA7Fbl.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With the global pandemic resulting in a world-wide quarantine, we&#39;re seeing a lot interest peeking in Python as language. Social-media, developer blogs and other platforms are flooding with blogs, videos, podcasts, etc. related to either getting started with Python or upskilling yourself from being a beginner to becoming an intermediate / expert Python developer. With such an amount of ever-growing interest in the Python community, I figured this to be a good time to touch on some key concepts by bringing to everyone&#39;s attention  <a target='_blank' rel='noopener noreferrer'  href="https://www.youtube.com/watch?v=cKPlPJyQrt4">one of the best PyData talks</a>  that I ever came across delivered by James Powell back in 2017. I believe it is important to revisit such masterpieces time and again to keep it alive and pass it on to all the curious cats out there. This blog post is account of taking a brief look at all the key takeaways from the talk.</p>
<h2 id="what-it-takes-to-be-good-at-python-">What it takes to be good at Python?</h2>
<p>Python originated as a scripting language. Its main purpose was to let developers write simple scripts to orchestrate lower-level languages such as C, to patch together different constructs, and to quickly get things done. It has, however, since evolved into a full-fledged general-purpose programming language. Along the way, Python has grown opinionated, with its own way of thinking about some core programming concepts.</p>
<p>To be good at Python, one must have a good understanding of a couple of things that the language comes with: the built-in data types, built-in functions, a bit of what&#39;s available in the standard library, etc. Pretty basic stuff. However, to really become an &#39;expert&#39; at Python, one must understand what the &#39;next step&#39; after this is. What does it really take to be effective at Python rather than just good at it?</p>
<h2 id="data-models">Data-models</h2>
<p>All the data in Python is represented as <em>objects</em>, and multiple objects can have relationships between them. This is in conformance with <a target='_blank' rel='noopener noreferrer' href="https://en.wikipedia.org/wiki/Von_Neumann_architecture">Von Neumann&#39;s</a> model of a <a target='_blank' rel='noopener noreferrer' href="https://en.wikipedia.org/wiki/Stored-program_computer">&quot;stored-program computer&quot;</a>, in which code is also represented as objects. Python, at its core, is unbelievably consistent. After working with Python for a while, it becomes naturally intuitive, and you are able to start making well-informed guesses about features that are new to you.</p>
<blockquote>
<p>Guido’s sense of the aesthetics of language design is amazing. I’ve met many fine language designers who could build theoretically beautiful languages that no one would ever use, but Guido is one of those rare people who can build a language that is just slightly less theoretically beautiful but thereby is a joy to write programs in.</p>
<ul>
<li>Jim Hugunin, Creator of Jython, cocreator of AspectJ, architect of the .Net DLR</li>
</ul>
</blockquote>
<p>The idea of data models is that, by implementing the <em>dunder</em> / <em>special</em> / <em>magic</em> methods, our objects can behave like the built-in types, enabling the expressive coding style that the community considers Pythonic. Python and Ruby are the same in this regard: both empower their users with a rich metaobject protocol, a.k.a. magic methods, that enables users to leverage the same tools that are available to the core developers. This is interestingly in contrast to a language like JavaScript. Objects in JavaScript do have features that are magic, but historically you could not leverage them in user-defined objects. For example, having read-only attributes in a user-defined object was not possible, in contrast to some built-in objects which always had read-only attributes. It was not until ECMAScript 5 came out in 2009 that users gained the ability to define read-only attributes for their user-defined objects. Having said that, the metaobject protocol of JavaScript is evolving, but historically it has been more limited than that of Python or Ruby. To put it simply, data models are nothing but an API for core language constructs.</p>
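<p>To make this concrete, here is a minimal sketch of my own (a hypothetical <code>Vector</code> class, not taken from the talk): by implementing a few dunder methods, a user-defined object plugs straight into built-in syntax such as <code>+</code>, <code>abs()</code>, and truth-testing.</p>

```python
# Hypothetical example: a 2-D vector that hooks into built-in syntax
# by implementing dunder ("double underscore") methods.
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Called by repr() and the interactive prompt
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):
        # Enables the + operator between two Vectors
        return Vector(self.x + other.x, self.y + other.y)

    def __abs__(self):
        # Enables the built-in abs(): here, the Euclidean magnitude
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __bool__(self):
        # Enables truth-testing: a zero-length vector is falsy
        return bool(abs(self))

v = Vector(3, 0) + Vector(0, 4)
print(v)       # Vector(3, 4)
print(abs(v))  # 5.0
```

<p>Nothing here is special-cased by the interpreter: <code>+</code>, <code>abs()</code>, and <code>if v:</code> simply dispatch to the dunder methods, which is exactly the &quot;API for core language constructs&quot; idea.</p>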
<h2 id="decorators">Decorators</h2>
<p>At the implementation level, Python decorators do not resemble the classic <a target='_blank' rel='noopener noreferrer' href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator design pattern</a>, but an analogy can be made. The Decorator design pattern allows behavior to be added to individual objects dynamically, without affecting the behavior of other objects from the same class. Quoting directly from <em>Design Patterns: Elements of Reusable Object-Oriented Software</em>:</p>
<blockquote>
<p>The decorator conforms to the interface of the component it decorates so that its presence is transparent to the component’s clients. The decorator forwards requests to the component and may perform additional actions (such as drawing a border) before or after forwarding. Transparency lets you nest decorators recursively, thereby allowing an unlimited number of added responsibilities.</p>
</blockquote>
<p>In Python, the decorator function plays the role of the concrete Decorator subclass, and the inner function it returns is the decorator instance. The returned function wraps the function to be decorated, which is analogous to the component in the design pattern. The returned function is transparent because it conforms to the interface of the component by accepting the same arguments. It forwards calls to the component and may perform additional operations either before or after it. This also provides the ability to recursively add nested decorators enabling additional responsibilities.</p>
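<p>As a small illustration of the mechanics described above (a hypothetical <code>timed</code> decorator of my own, not from the talk): the decorator function returns an inner function that conforms to the wrapped function&#39;s interface and forwards the call after performing extra work.</p>

```python
import functools
import time

def timed(func):
    """Report how long each call to the wrapped function takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # forward the call to the "component"
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper

@timed
def add(a, b):
    return a + b

print(add(1, 2))  # prints the timing line, then 3
```

<p>Here <code>wrapper</code> is transparent to callers: it accepts the same arguments as <code>add</code> and returns its result, while adding a responsibility (timing) before and after the forwarded call.</p>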
<h2 id="generators">Generators</h2>
<p>Iteration is fundamental to data processing. And when scanning datasets that don’t fit in memory, we need a way to fetch the items lazily, that is, one at a time and on demand. This is what the Iterator pattern is about. Python does not have macros like Lisp, so abstracting away the Iterator pattern required changing the language: the <em>yield</em> keyword was added in Python 2.2 (2001). The yield keyword allows the construction of generators, which work as iterators.</p>
<p>Every generator is an iterator. Generators fully implement the iterator interface. But an iterator — as defined in the <em>Gang of Four</em> book — retrieves items from a collection, while a generator can produce items “out of thin air.” Python 3 uses generators in many places. Even the <code>range()</code> built-in now returns a generator-like object instead of a full-blown list like before.</p>
<p>Any Python function that has the yield keyword in its body is a generator function: a function which, when called, returns a generator object. In other words, a generator function is a generator factory. The only syntax distinguishing a plain function from a generator function is the fact that the latter has a yield keyword somewhere in its body. Some argued that a new keyword like <em>gen</em> should be used for generator functions instead of <em>def</em>, but Guido did not agree. His arguments are in <a target='_blank' rel='noopener noreferrer' href="https://www.python.org/dev/peps/pep-0255/">PEP 255</a>.</p>
<p>A generator function builds a generator object that wraps the body of the function. When we invoke <code>next(...)</code> on the generator object, execution advances to the next <code>yield</code> in the function body, and the <code>next(...)</code> call evaluates to the value yielded when the function body is suspended. Finally, when the function body returns, the enclosing generator object raises <code>StopIteration</code> , in accordance with the <code>Iterator</code> protocol.</p>
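<p>The protocol above can be sketched in a few lines (a toy <code>countdown</code> generator of my own, not from the talk):</p>

```python
def countdown(n):
    """Generator function: calling it returns a generator object."""
    while n > 0:
        yield n  # execution suspends here until the next next() call
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(list(gen))  # [1] -- list() drains the rest and absorbs StopIteration
```

<p>After the body returns, any further <code>next(gen)</code> raises <code>StopIteration</code>; the <code>for</code> machinery and constructors like <code>list()</code> handle that exception for you, which is why exhausted generators simply end a loop rather than crash it.</p>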
<h2 id="context-managers">Context Managers</h2>
<p>Context manager objects exist to control a <code>with</code> statement, just like iterators exist to control a <code>for</code> statement. The <code>with</code> statement was designed to simplify the <code>try</code>/<code>finally</code> pattern, which guarantees that some operation is performed after a block of code, even if the block is aborted because of an exception, a <code>return</code> or <code>sys.exit()</code> call. The code in the <code>finally</code> clause usually releases a critical resource or restores some previous state that was temporarily changed. The context manager protocol consists of the <code>__enter__</code> and <code>__exit__</code> methods. At the start of the <code>with</code> block, <code>__enter__</code> is invoked on the context manager object. The role of the <code>finally</code> clause is played by a call to <code>__exit__</code> on the context manager object at the end of the <code>with</code> block.</p>
<p>The <code>@contextmanager</code> decorator reduces the boilerplate of creating a context manager. Instead of writing a whole class with <code>__enter__/__exit__</code> methods, you just implement a generator with a single <code>yield</code> that should produce whatever you want the <code>__enter__</code> method to return.</p>
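<p>The two styles can be sketched side by side; the <code>Timer</code> class and <code>timer()</code> function below are my own hypothetical illustrations, not from the talk:</p>

```python
import time
from contextlib import contextmanager

class Timer:
    """Class-based context manager: explicit __enter__ / __exit__."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self  # bound to the `as` target

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not suppress exceptions

@contextmanager
def timer():
    """Generator-based equivalent: code before the yield acts as
    __enter__, and the finally clause plays the role of __exit__."""
    state = {}
    start = time.perf_counter()
    try:
        yield state  # the yielded value is bound to the `as` target
    finally:
        state["elapsed"] = time.perf_counter() - start

with Timer() as t:
    sum(range(100_000))
print(f"class-based: {t.elapsed:.6f}s")

with timer() as state:
    sum(range(100_000))
print(f"generator-based: {state['elapsed']:.6f}s")
```

<p>Both variants guarantee the elapsed time is recorded even if the body raises, since <code>__exit__</code> (or the <code>finally</code> clause) always runs when the <code>with</code> block ends.</p>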
<h2 id="the-ultimate-example">The Ultimate Example</h2>
<p>The following example is taken directly from the PyData talk. James uses it well to connect all the dots in the end, showing how to apply all of the concepts discussed above in practice.</p>
<p>Let&#39;s say that we want to connect to a database, create a table, insert some entries, print those entries and finally, drop the table. The most basic way of doing it would be:</p>
<pre><code>from sqlite3 import connect

with connect('test.db') as conn:

    cur = conn.cursor()

    cur.execute('create table points(x int, y int)')

    cur.execute('insert into points (x, y) values (1, 2)')
    cur.execute('insert into points (x, y) values (3, 4)')
    cur.execute('insert into points (x, y) values (5, 6)')

    for row in cur.execute('select x, y from points'):
        print(row)

    cur.execute('drop table points')
</code></pre><p>For the sake of illustration, let&#39;s assume that SQLite does not support transactions, and that we need to implement the basic set-up and tear-down actions for the database ourselves, irrespective of any errors that we may encounter in between. We can do this by implementing a custom context manager:</p>
<pre><code><span class="hljs-keyword">from</span> sqlite3 <span class="hljs-keyword">import</span> connect

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">contextmanager</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, cur)</span>:</span>
        self.cur = cur

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__enter__</span><span class="hljs-params">(self)</span>:</span>
        self.cur.execute(<span class="hljs-string">'create table points(x int, y int)'</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__exit__</span><span class="hljs-params">(self, *args)</span>:</span>
        self.cur.execute(<span class="hljs-string">'drop table points'</span>)

<span class="hljs-keyword">with</span> connect(<span class="hljs-string">'test.db'</span>) <span class="hljs-keyword">as</span> conn:

    cur = conn.cursor()

    <span class="hljs-keyword">with</span> contextmanager(cur):

        cur.execute(<span class="hljs-string">'insert into points (x, y) values (1, 2)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (3, 4)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (5, 6)'</span>)

        <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> cur.execute(<span class="hljs-string">'select x, y from points'</span>):
            print(row)
</code></pre><p>Here, we made use of Python&#39;s data model to implement a custom context manager which sets up our database table and drops it at the end. One thing worth noticing is how the <code>__enter__</code> and <code>__exit__</code> methods are called. By adding something as simple as <code>print()</code> statements to our methods, we can see that <code>__enter__</code> is always called before <code>__exit__</code>. This suggests a specific chronology, a specific sequence. We can enforce this sequence with the help of generator functions:</p>
<pre><code><span class="hljs-keyword">from</span> sqlite3 <span class="hljs-keyword">import</span> connect

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">contextmanager</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, cur)</span>:</span>
        self.cur = cur

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__enter__</span><span class="hljs-params">(self)</span>:</span>
        self.gen = temptable(self.cur)
        next(self.gen)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__exit__</span><span class="hljs-params">(self, *args)</span>:</span>
        next(self.gen, <span class="hljs-keyword">None</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">temptable</span><span class="hljs-params">(cur)</span>:</span>
    cur.execute(<span class="hljs-string">'create table points(x int, y int)'</span>)
    <span class="hljs-keyword">yield</span>
    cur.execute(<span class="hljs-string">'drop table points'</span>)

<span class="hljs-keyword">with</span> connect(<span class="hljs-string">'test.db'</span>) <span class="hljs-keyword">as</span> conn:

    cur = conn.cursor()

    <span class="hljs-keyword">with</span> contextmanager(cur):

        cur.execute(<span class="hljs-string">'insert into points (x, y) values (1, 2)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (3, 4)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (5, 6)'</span>)

        <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> cur.execute(<span class="hljs-string">'select x, y from points'</span>):
            print(row)
</code></pre><p>The only potential problem with the above code snippet is that it is not generic: we have hard-coded the generator function that is called from <code>__enter__</code>. We can fix this by introducing another data-model method, making our program purely generic:</p>
<pre><code><span class="hljs-keyword">from</span> sqlite3 <span class="hljs-keyword">import</span> connect

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">contextmanager</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, gen)</span>:</span>
        self.gen = gen

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__call__</span><span class="hljs-params">(self, *args, **kwargs)</span>:</span>
        self.args, self.kwargs = args, kwargs
        <span class="hljs-keyword">return</span> self

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__enter__</span><span class="hljs-params">(self)</span>:</span>
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__exit__</span><span class="hljs-params">(self, *args)</span>:</span>
        next(self.gen_inst, <span class="hljs-keyword">None</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">temptable</span><span class="hljs-params">(cur)</span>:</span>
    cur.execute(<span class="hljs-string">'create table points(x int, y int)'</span>)
    <span class="hljs-keyword">yield</span>
    cur.execute(<span class="hljs-string">'drop table points'</span>)

<span class="hljs-keyword">with</span> connect(<span class="hljs-string">'test.db'</span>) <span class="hljs-keyword">as</span> conn:

    cur = conn.cursor()

    <span class="hljs-keyword">with</span> contextmanager(temptable)(cur):

        cur.execute(<span class="hljs-string">'insert into points (x, y) values (1, 2)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (3, 4)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (5, 6)'</span>)

        <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> cur.execute(<span class="hljs-string">'select x, y from points'</span>):
            print(row)
</code></pre><p>This is much better, except that the call to our custom context manager looks a bit cluttered. We can refactor it a bit, like:</p>
<pre><code><span class="hljs-keyword">from</span> sqlite3 <span class="hljs-keyword">import</span> connect

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">contextmanager</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, gen)</span>:</span>
        self.gen = gen

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__call__</span><span class="hljs-params">(self, *args, **kwargs)</span>:</span>
        self.args, self.kwargs = args, kwargs
        <span class="hljs-keyword">return</span> self

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__enter__</span><span class="hljs-params">(self)</span>:</span>
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__exit__</span><span class="hljs-params">(self, *args)</span>:</span>
        next(self.gen_inst, <span class="hljs-keyword">None</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">temptable</span><span class="hljs-params">(cur)</span>:</span>
    cur.execute(<span class="hljs-string">'create table points(x int, y int)'</span>)
    <span class="hljs-keyword">yield</span>
    cur.execute(<span class="hljs-string">'drop table points'</span>)

tmptable = contextmanager(temptable)

<span class="hljs-keyword">with</span> connect(<span class="hljs-string">'test.db'</span>) <span class="hljs-keyword">as</span> conn:

    cur = conn.cursor()

    <span class="hljs-keyword">with</span> tmptable(cur):

        cur.execute(<span class="hljs-string">'insert into points (x, y) values (1, 2)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (3, 4)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (5, 6)'</span>)

        <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> cur.execute(<span class="hljs-string">'select x, y from points'</span>):
            print(row)
</code></pre><p>All that we did here was take the <code>temptable()</code> generator and wrap it in our context manager. Since we&#39;re talking about wrapping functions, it should naturally and immediately remind us of decorators. We can have a nice little decorator around <code>temptable()</code>, like:</p>
<pre><code><span class="hljs-keyword">from</span> sqlite3 <span class="hljs-keyword">import</span> connect

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">contextmanager</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span><span class="hljs-params">(self, gen)</span>:</span>
        self.gen = gen

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__call__</span><span class="hljs-params">(self, *args, **kwargs)</span>:</span>
        self.args, self.kwargs = args, kwargs
        <span class="hljs-keyword">return</span> self

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__enter__</span><span class="hljs-params">(self)</span>:</span>
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__exit__</span><span class="hljs-params">(self, *args)</span>:</span>
        next(self.gen_inst, <span class="hljs-keyword">None</span>)

<span class="hljs-meta">@contextmanager</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">temptable</span><span class="hljs-params">(cur)</span>:</span>
    cur.execute(<span class="hljs-string">'create table points(x int, y int)'</span>)
    <span class="hljs-keyword">yield</span>
    cur.execute(<span class="hljs-string">'drop table points'</span>)

<span class="hljs-keyword">with</span> connect(<span class="hljs-string">'test.db'</span>) <span class="hljs-keyword">as</span> conn:

    cur = conn.cursor()

    <span class="hljs-keyword">with</span> temptable(cur):

        cur.execute(<span class="hljs-string">'insert into points (x, y) values (1, 2)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (3, 4)'</span>)
        cur.execute(<span class="hljs-string">'insert into points (x, y) values (5, 6)'</span>)

        <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> cur.execute(<span class="hljs-string">'select x, y from points'</span>):
            print(row)
</code></pre><p>There! Right here we have an example which leverages all of the core concepts of Python discussed in this blog post: data models, generators, context managers and finally, decorators. Fundamentally, in this example, we have these four features implemented with a very clear conceptual meaning. A context manager is merely some piece of code that pairs set-up and tear-down actions. A generator is merely a particular form of syntax that allows us to enforce sequencing and interleaving. Finally, we take this generator and dynamically wrap it in our context manager using a decorator. And we do all of this using data models. We take these core features and assemble them together to write what can more or less be called expert-level Python code.</p>
<h2 id="final-thoughts">Final Thoughts</h2>
<p>What we must take away from this is that expert-level Python code is not code that uses every single feature. In fact, it is not even code that uses some number of features. It is code that has a certain clarity about where and when a feature must be used. It is code that wastes the time of neither the person writing it nor the person reading it. It is code that doesn&#39;t carry a lot of additional mechanism with it. It doesn&#39;t have people creating their own protocols. It doesn&#39;t have people creating their own frameworks, because the language itself provides the core tools to all developers. One merely has to understand what those core pieces are, what they mean, what they need and how to assemble them. Of course, the syntax, argument sequencing, the sequence of data-model dispatch, etc. do matter, but they all come secondary to an actual understanding of the core concepts. All that matters to achieve expert level in Python is to remember what the core features are. The syntax of various features is bound to change. Many implementations will get fixed through bug-fixes and enhancements. But these are some of the core features of Python which have been there from the very beginning and will continue to last no matter what. The core meaning behind these features is what is important, and that is what will guide us in writing expert-level Python code.</p>
]]></content:encoded></item><item><title><![CDATA[A Modern Alternative To Traditional Web Stacks]]></title><description><![CDATA[Traditionally, when people initially started building for the web, they used to follow a particular pattern. There used to be a simple client-server architecture. The browser, acting as a client, used to request for stuff. The server, on the other ha...]]></description><link>https://blog.pratikms.com/a-modern-alternative-to-traditional-web-stacks</link><guid isPermaLink="true">https://blog.pratikms.com/a-modern-alternative-to-traditional-web-stacks</guid><category><![CDATA[JAMstack]]></category><category><![CDATA[deployment]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[APIs]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Mon, 23 Mar 2020 15:30:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595270589578/FWR3aIYDF.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Traditionally, when people first started building for the web, they followed a particular pattern: a simple client-server architecture. The browser, acting as a client, would request stuff. The server, on the other hand, would deliver stuff in response to the client&#39;s request. People would create assets and put them somewhere on some server. Then, when someone came along with a client and asked for those resources, the server would simply deliver them. Quite straight and simple. This model was an early embodiment of the  <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/KISS_principle">KISS principle</a>. Unfortunately, this didn&#39;t last long. We soon started running into quite some limitations. The resources uploaded on the server were quite static, which rendered the overall user experience quite static as well.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584966471905/yowtaZgF6.png" alt="Jamstack (1).png"></p>
<p>This led to the requirement of adding some dynamism at the server level. So now, instead of just responding to requests from clients, the servers were able to &#39;do stuff&#39; as well. This logical component, now added at the server level, enabled servers to execute scripts for every request received and generate a view, which was then returned as a response to the client. This meant that requests could now be served on the fly.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584966705404/pjc-X1Ley.png" alt="do-stuff.png"></p>
<p>One of the unintended side effects of the above evolution was that it became mandatory for servers to have enough capacity and enough horsepower to meet the ever-increasing demands of rapidly developing clients. It meant separating the server into different types of components, such as load balancers, database servers, etc. Each of these components had individual responsibilities. The load balancer was entrusted with routing the incoming load, in the sense of traffic, to multiple highly available, highly redundant servers so that each request was served at the earliest. These highly redundant, highly available servers could in turn query the database servers which held the data and helped in constructing the views.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584968831783/G4S0uz_nX.png" alt="legacy-servers (1).png"></p>
<p>We then discovered that some assets are mostly static, and that we do not need to generate them again and again for every request. This led to the rise of Content Delivery Networks, a.k.a. CDNs. These CDNs were strategically geo-located so that those assets could be served directly from them, without querying our traditional infrastructure for every request.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584969334961/fPdJa52lL.png" alt="legacy-with-cdn (1).png"></p>
<p>All this infrastructure, though very sophisticated, got a bit complicated over time. And not just that: our browsers also got more capable, which means that we can now use them to great effect. Also, as a consequence of building so many sites for such a long time, we as an industry have matured our processes, and we&#39;ve really got a lot better at the way we build sites and the way we deploy them. This maturing of processes paved the way for better tooling, and better tooling enabled us to generate, manage and deploy our code in new and improved ways.</p>
<p>This gave rise to the concept of having a collection of technologies which we can collectively use, called &#39;stacks&#39;. In simple words, a stack is nothing but layers of technologies which help us deliver our applications. These &#39;layers&#39; of technologies are nothing but a set of tools that help us ensure that our site is delivered as we desire.</p>
<p>Over time, we&#39;ve developed quite a number of stacks such as <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/LAMP_%28software_bundle%29">LAMP</a>, <a target='_blank' rel='noopener noreferrer'  href="https://lemp.io/">LEMP</a>, <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/MEAN_%28solution_stack%29">MEAN</a>, PERN, <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/LYME_%28software_bundle%29">LYME / LYCE</a>, etc. All these stacks use different sets of tools and different sets of technologies.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584970768505/0JjvJflSS.png" alt="legacy-stacks.png"></p>
<p>In all these stacks, there are a few common components. Usually, there is an operating system to build our application on; it could be Linux, Windows, etc. There is another layer for data access and the services associated with it; it can be MySQL, MongoDB, CouchDB, etc. In addition to this, there is a layer for pre-processing, which contains the actual scripting to handle the logic and assemble the views; this layer can be PHP, Node.js, Erlang, etc. Finally, there is a layer to handle HTTP routing and serving; it can be Apache, Nginx, etc.</p>
<h1 id="jamstack">Jamstack</h1>
<p>I believe <a target='_blank' rel='noopener noreferrer'  href="https://jamstack.org/">Jamstack</a> is best described by its own website:</p>
<blockquote>
<p>Fast and secure sites and apps delivered by pre-rendering files and serving them directly from a CDN, removing the requirement to manage or run web servers.</p>
</blockquote>
<p>To put it simply, Jamstack is nothing but an approach to delivering applications. The name Jamstack comes from JavaScript, APIs and Markup.</p>
<p>Unlike the stacks that we saw above, Jamstack looks quite a bit different. The responsibility of data access is taken care of by APIs. The database is now abstracted away, and we do not need to deal with it directly. We call these APIs using JavaScript, which interacts with markup. Both of these, i.e. JavaScript and markup, co-exist in the browser and constitute our runtime. Together they take care of the pre-processing responsibility of our traditional stack. The HTTP routing and serving responsibility is taken care of by static servers and CDNs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584972131789/QnCaOu0Bx.png" alt="Jamstack (4).png"></p>
<p>To sum up, Jamstack is about things that are pre-rendered. It is about leveraging the browser&#39;s capabilities. It is about delivering applications and sites without having to deal with a web server and its various configurations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584972593970/CA6KdtVxA.png" alt="lamp-jamstack-head-on.png"></p>
<p>If we look at how this architecture would look now in the context of client and server, it would seem as if we are going back to where we started from: the simple client-server architecture. But there&#39;s more to it than meets the eye. The return to this simplicity actually began a long time ago, when the legendary  <a target='_blank' rel='noopener noreferrer'  href="https://en.wikipedia.org/wiki/Aaron_Swartz">Aaron Swartz</a> wrote a blog post titled <a target='_blank' rel='noopener noreferrer'  href="http://www.aaronsw.com/weblog/000404">Bake, Don&#39;t Fry</a> as early as 2002. In it, he hints at a simplistic architecture where we do not have to build our response for every request that we receive, but instead have our response pre-rendered ahead of time, so that it&#39;s ready to be returned as-is every time we receive a request. Hence, <em>baking</em> the response ahead of time as opposed to <em>frying</em> it on demand for every request.</p>
<h2 id="motives">Motives</h2>
<p>The primary motivation for pre-rendering is to lighten the load on servers by doing the work beforehand. This means having pre-generated assets which are ready to deploy. It eliminates the complexities of all the moving parts involved in the client-server architecture that we&#39;ve grown to evolve. In the case of our traditional architecture, during a deployment we need to make sure that all the components are updated accordingly, namely the load balancers, the database servers, the web servers, the CDNs, etc. This requires an action to be taken for each of these components; the more components, the more overhead in terms of taking actions for each one involved in the deployment process. This is extremely simplified in the case of Jamstack: since all the assets are pre-generated, we can deploy the whole application or website directly to CDNs.</p>
<h2 id="security">Security</h2>
<p>One of the main advantages of Jamstack is its security. It greatly reduces the attack surface area. The more components we have to deal with, the more care we have to take in securing each of those components in our infrastructure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584974324561/xR53LcCjc.png" alt="security-head-on (1).png"></p>
<h2 id="performance">Performance</h2>
<p>Different stacks deal with performance in different ways. It is well known that traditional stacks add static layers in order to improve performance. Static layers are nothing but caching mechanisms introduced at various components of the stack. Every level of the stack attempts to reduce dynamic responses by predicting the most commonly used assets or resources and caching them. This logical separation adds overhead and complexity: we have to manage the deployment and decide what needs to be cached and what needs to be left dynamic.</p>
<p>This is in contrast to Jamstack. Every time we do an update on the CDN, we are effectively updating the entire cache. Since our whole deployment consists of pre-generated assets, we do not need to take a call on what needs to be cached and what needs to be dynamic.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584975178573/FWNG2nlue.png" alt="caching.png"></p>
<h2 id="scaling">Scaling</h2>
<p>In the case of traditional stacks, more often than not, more infrastructure is added in order to scale. This occurs in the form of either horizontal or vertical scaling. Horizontal scaling means that we scale by adding more machines to our pool of existing resources, while vertical scaling means that we scale by adding more power, in terms of CPU, RAM, etc., to an existing machine. Either way, more resources and more servers mean more cost. Our deployments get more complex, which increases the overhead of replicating everything across our scaled environment.</p>
<p>In the case of Jamstack, it&#39;s a different scenario: our assets are already pre-generated, and cached by design and by default. There are no on-demand requests, hence no on-demand work needs to be done. And since everything is delivered from Content Delivery Networks (CDNs), our responses are quick and efficient. CDNs, from their very inception, were designed to handle high load. This hugely benefits us in the case of Jamstack.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584975735522/tZ5QzWkfy.png" alt="scaling.png"></p>
<h1 id="conclusion">Conclusion</h1>
<p>If we pay close attention, we might realize that the architecture we&#39;re trying to achieve here makes it appear as if we&#39;re going back to where we started from:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1584966471905/yowtaZgF6.png" alt="Jamstack (1).png"></p>
<p>This can raise a few eyebrows, and fairly so. The primary difference between the two is that we&#39;re going back to that initial simplistic architecture <strong>after</strong> we&#39;ve evolved. We&#39;re not just going back to how it was before; we now have a host of tools to our advantage which we&#39;ve been learning and building over time. Things aren&#39;t static in the same way anymore. There are various enablers, such as: static site generators which pre-generate our assets at build time rather than request time; tooling and automation of our deployment processes; browser capabilities; and services and the API ecosystem as a whole.</p>
<p>Having said that, while Jamstack is <strong>great</strong> for quite a number of scenarios, like building an informational website or an application serving case studies, blog posts, etc., it is in no way the holy grail. There still exist plenty of scenarios, such as sites serving dynamic content like e-commerce applications, in which building an application or a website using a traditional stack is still clearly the far better option. Jamstack is more suitable for flexible static sites, and for those who don&#39;t want the overhead of maintaining complex applications such as a CMS. There are already quite a few use-cases where people are considering  <a target='_blank' rel='noopener noreferrer'  href="https://medium.com/stories-from-upstatement/is-a-jamstack-right-for-your-site-3108bcb186bf">migrating to Jamstack over traditional CMSes like WordPress, etc.</a></p>
<p>In the end, like with every other technology, one should carefully consider which technology stack would best serve one&#39;s application and adapt accordingly. This is the reason why this post is not titled &#39;A Modern <em>Replacement</em> To Traditional Web Stacks&#39; but rather &#39;A Modern <em>Alternative</em> To Traditional Web Stacks&#39;.</p>
]]></content:encoded></item><item><title><![CDATA[Securing Cloud Infrastructure with IDS]]></title><description><![CDATA[With recent security disasters, such as  NordVPN falling a victim to credit stuffing attacks affecting around 2000 users,   Mastercard's data leak affecting 90,000 users,  ixigo falling a victim to intrusions affecting as much as 17 million users, an...]]></description><link>https://blog.pratikms.com/securing-cloud-infrastructure-with-ids</link><guid isPermaLink="true">https://blog.pratikms.com/securing-cloud-infrastructure-with-ids</guid><category><![CDATA[Security]]></category><category><![CDATA[network]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Mon, 18 Nov 2019 13:46:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595270855579/JIHm9pHpr.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With recent security disasters, such as  <a target='_blank' rel='noopener noreferrer'  href="https://arstechnica.com/information-technology/2019/11/nordvpn-users-passwords-exposed-in-mass-credential-stuffing-attacks/">NordVPN falling a victim to credit stuffing attacks affecting around 2000 users</a>,   <a target='_blank' rel='noopener noreferrer'  href="https://www.spiegel.de/netzwelt/web/mastercard-datenleck-bei-bonusprogramm-a-1282697.html">Mastercard&#39;s data leak affecting 90,000 users</a>,  <a target='_blank' rel='noopener noreferrer'  href="https://techcrunch.com/2019/02/14/hacker-strikes-again/">ixigo falling a victim to intrusions affecting as much as 17 million users</a>, and  <a target='_blank' rel='noopener noreferrer'  href="https://haveibeenpwned.com/PwnedWebsites">many more similar cases</a>; data-security has never been more important.</p>
<p>A common pattern is that affected companies blame their data-center providers for not hardening the security of the infrastructure they provision. In their defense, the cloud providers argue that their job is merely to provision hardware per the customer&#39;s needs, and that securing what runs on it is not their responsibility. Neither view is entirely right: it is neither the cloud provider&#39;s responsibility alone nor the customer&#39;s alone to secure an environment. Security is a shared responsibility, and both parties have to do their part.</p>
<h1 id="intrusion-detection-systems-ids-">Intrusion Detection Systems (IDS)</h1>
<p>The Intrusion Detection System (IDS) is not something new to the security scene. The core function of an IDS, as the name suggests, is to identify and detect intrusion attempts, and to alert or notify the concerned stakeholders. It does this by analyzing and monitoring network traffic, system resources, or files for signs that attackers are using known threats to infiltrate the network or steal data from the resources within.</p>
<p>One may ask: why would we need an IDS if we already have a firewall? The need for an IDS stems from the fact that a traditional firewall only analyzes network- and transport-layer headers. These headers include information such as source IP, destination IP, metadata about the protocol used (IPv4, IPv6, ICMP, etc.), source port, and destination port. With advances in technology and cyber-security awareness, attackers today do not target only the network or transport layers; they prefer to focus on exploiting vulnerabilities in the OS, applications, protocols, and so on. Traditional firewalls are blind to such attacks. While the importance and necessity of a traditional firewall cannot be denied, we still need an IDS in place to protect ourselves against them.</p>
<p>Different variations of an IDS have been quietly existing and being implemented in various forms, such as:</p>
<h2 id="network-based-intrusion-detection-system-nids-">Network-based Intrusion Detection System (NIDS)</h2>
<p>The placement of a Network-based Intrusion Detection System (NIDS) in the data center&#39;s network is critical. An NIDS is typically placed on the inside of the traditional firewall. This allows the NIDS to monitor all the network traffic to and from all the devices. It is able to do so by working in promiscuous mode, in which the network device can intercept and read each arriving network packet in its entirety. This is usually achieved using a Switch Port Analyzer (SPAN) configuration or similar switch features. A SPAN allows traffic sent or received on one interface to be copied to another, monitoring interface. Using this technique, an NIDS analyzes traffic by comparing it against a database of known attacks, also known as signatures, or by detecting anomalies in traffic patterns. When an attack is identified, an NIDS event is generated, logged, and reported to the stakeholders through a management system.</p>
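<p>The signature-matching idea can be sketched in a few lines of Python. This is only a toy illustration, and the signature names and byte patterns below are made-up placeholders; real engines such as Snort or Suricata use heavily optimized multi-pattern matchers over decoded protocol fields, not a naive substring scan.</p>

```python
# Toy sketch of signature-based detection. The signatures here are
# illustrative placeholders, not real attack patterns.
SIGNATURES = {
    "nop-sled": b"\x90\x90\x90\x90",       # classic x86 NOP-sled fragment
    "php-remote-include": b"php://input",  # PHP stream-wrapper abuse
}

def match_signatures(payload: bytes) -> list[str]:
    """Return the names of all signatures found in a packet payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

print(match_signatures(b"GET /index.php?page=php://input HTTP/1.1"))
# prints ['php-remote-include']
```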
<h3 id="advantages-of-an-nids">Advantages of an NIDS</h3>
<ul>
<li>Monitors network traffic</li>
<li>Does not impact network performance as all operations are carried out on the copy of the packet</li>
<li>No impact on network availability as it does not interfere with network traffic</li>
</ul>
<h3 id="limitations-of-an-nids-">Limitations of an NIDS</h3>
<ul>
<li>Cannot analyze encrypted traffic</li>
<li>Requires up-to-date signatures</li>
<li>Unable to block attacks</li>
</ul>
<h2 id="host-based-intrusion-detection-system-hids-">Host-based Intrusion Detection System (HIDS)</h2>
<p>As the name suggests, a Host-based Intrusion Detection System (HIDS) monitors both the state and behavior of a host and the network traffic passing through it. It does so by running an agent on the host and forwarding events to the stakeholders through a management system. A HIDS typically monitors running processes, the resources they access, system-specific logs, and unauthorized changes in the file system.</p>
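<p>The file-integrity part of a HIDS can be illustrated with a minimal Python sketch: record a hash baseline for the monitored files, then periodically re-hash them and report any mismatch. Real agents such as OSSEC track far more (permissions, ownership, registry keys, near-real-time hooks), so treat this purely as a conceptual sketch.</p>

```python
import hashlib

def snapshot(paths):
    """Record a SHA-256 baseline for each monitored file."""
    baseline = {}
    for path in paths:
        with open(path, "rb") as f:
            baseline[path] = hashlib.sha256(f.read()).hexdigest()
    return baseline

def detect_changes(baseline):
    """Return the paths whose current hash no longer matches the baseline."""
    changed = []
    for path, digest in baseline.items():
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                changed.append(path)
    return changed
```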
<h3 id="advantages-of-a-hids">Advantages of a HIDS</h3>
<ul>
<li>Monitors activities on a host</li>
<li>Detects changes in files, the system, and applications</li>
<li>Detects attacks that an NIDS fails to identify, for example changes made directly from the system console</li>
</ul>
<h3 id="limitations-of-a-hids">Limitations of a HIDS</h3>
<ul>
<li>An agent needs to be deployed on every host that is to be monitored</li>
<li>Does not detect network-level attacks</li>
<li>The host itself remains vulnerable to attacks and failures</li>
</ul>
<h2 id="nids-or-hids-">NIDS or HIDS?</h2>
<p>As can be seen, both NIDS and HIDS have their own sets of advantages and limitations, so neither is sufficient by itself to protect against intrusions. While an NIDS can detect attacks over a network, a HIDS detects those on a particular host on that network. While an NIDS is blind to encrypted traffic, a HIDS can analyze that traffic once it has been decrypted at the host. Therefore, a combination of both NIDS and HIDS should be considered while securing a cloud infrastructure against intrusion attempts.</p>
<h1 id="open-source-offerings">Open source offerings</h1>
<p>Open source offers a vast variety of solutions for both NIDS and HIDS. Each solution caters to different needs and use cases. We will now compare some of the popular open-source software (OSS) out there to see which can be a good bet:</p>
<h2 id="nids-oss-">NIDS OSS</h2>
<h3 id="snort">Snort</h3>
<p> <a target='_blank' rel='noopener noreferrer'  href="https://www.snort.org/">Snort</a>  is a popular NIDS and has been in the market since 1998. It was developed and maintained by Sourcefire, which was later acquired by Cisco. The main advantage of using Snort is its ability to perform real-time traffic analysis and packet logging on networks. It uses a set of rules to check for hostile packets in a network and then generates alerts to the respective stakeholders or the network administrators. Its engine combines the benefits of both signature- and anomaly-based detection techniques.</p>
<h3 id="zeek">Zeek</h3>
<p> <a target='_blank' rel='noopener noreferrer'  href="https://www.zeek.org/">Zeek</a>, formerly known as Bro, passively monitors network traffic for suspicious activities and attacks. Just like Snort, Zeek combines signature- and anomaly-based detection. Its analysis engine converts the captured traffic into a series of high-level events, and it uses policy-based intrusion detection.</p>
<h3 id="suricata">Suricata</h3>
<p>Suricata is a signature-based NIDS developed by the Open Information Security Foundation (OISF). It uses existing rule sets to monitor network traffic and provides alerts on detecting suspicious events. Suricata offers high speed and efficiency in network traffic analysis due to its multi-threaded design.</p>
<h3 id="head-to-head-comparison">Head-to-head comparison</h3>
<table>
<thead>
<tr>
<td>Parameters</td><td>Snort</td><td>Zeek</td><td>Suricata</td></tr>
</thead>
<tbody>
<tr>
<td>Multi-thread</td><td>No</td><td>No</td><td>Yes</td></tr>
<tr>
<td>OS compatibility</td><td>Any</td><td>Unix-like system</td><td>Any</td></tr>
<tr>
<td>Installation / deployment</td><td>Installation also available from packages</td><td>Manual Installation</td><td>Manual Installation</td></tr>
<tr>
<td>GUI Support</td><td>A lot</td><td>Few</td><td>Few</td></tr>
</tbody>
</table>
<h2 id="hids-oss">HIDS OSS</h2>
<h3 id="ossec">OSSEC</h3>
<p>OSSEC stands for Open Source Security Event Correlator. It performs log analysis, file integrity checking, policy monitoring, rootkit detection, and active response, using both signature- and anomaly-based detection. OSSEC employs a server-agent model, meaning a dedicated server provides aggregation and analysis for every host.</p>
<h3 id="samhain">Samhain</h3>
<p>Samhain is a HIDS with central management that helps with file integrity checking, log file monitoring, rootkit detection, port monitoring, and detection of hidden processes. It provides centralized and encrypted monitoring capabilities over TCP/IP communications. While the Samhain community is good, it is relatively difficult to install, especially when compared to other HIDS solutions.</p>
<h3 id="tripwire">Tripwire</h3>
<p>Tripwire is known for its strong data-integrity capabilities: it helps system administrators detect alterations to system files and notifies them if files are corrupted or tampered with. The open-source edition of Tripwire supports most distributions by offering package installations. It does, however, come with a few limitations. For example, it does not notify in real time, and one has to rely on logs instead.</p>
<h3 id="head-to-head-comparison">Head-to-head comparison</h3>
<table>
<thead>
<tr>
<td>Parameters</td><td>OSSEC</td><td>Samhain</td><td>Tripwire</td></tr>
</thead>
<tbody>
<tr>
<td>Ubuntu official repository</td><td>No</td><td>Yes</td><td>Yes</td></tr>
<tr>
<td>CentOS official repository</td><td>No</td><td>No</td><td>Yes</td></tr>
<tr>
<td>File monitoring</td><td>Yes</td><td>Yes</td><td>Yes</td></tr>
<tr>
<td>Network monitoring</td><td>Yes</td><td>No</td><td>No</td></tr>
<tr>
<td>Log monitoring</td><td>Yes</td><td>Partial</td><td>Yes</td></tr>
</tbody>
</table>
<h1 id="implementation-of-ids-in-a-cloud-infrastructure">Implementation of IDS in a Cloud Infrastructure</h1>
<p>As careful comparison suggests, Suricata and OSSEC come out as relatively better offerings for NIDS and HIDS respectively. Our ultimate combined offering for a cloud infrastructure would therefore integrate them with a Security Information and Event Management (SIEM) system. A SIEM is a combination of Security Event Management (SEM), which analyzes log and event data in real time to provide threat monitoring, event correlation, and incident response, with Security Information Management (SIM), which collects, analyzes, and reports on log data. One of the most comprehensive SIEM solutions is the Elastic Stack, or ELK Stack. It consists of the open-source products  <a target='_blank' rel='noopener noreferrer'  href="https://www.elastic.co/products/elasticsearch">Elasticsearch</a>, <a target='_blank' rel='noopener noreferrer'  href="https://www.elastic.co/products/logstash">Logstash</a>, <a target='_blank' rel='noopener noreferrer'  href="https://www.elastic.co/products/kibana">Kibana</a> and the <a target='_blank' rel='noopener noreferrer'  href="https://www.elastic.co/products/beats">Beats</a> family of log shippers.</p>
<p>Before we proceed to integrate all the above solutions, we have one more OSS that was not discussed above:</p>
<h2 id="wazuh">Wazuh</h2>
<p>Wazuh was originally a fork of OSSEC. As the official documentation indicates, it is built for more reliability and scalability. Wazuh uses anomaly- as well as signature-based detection methods to perform rootkit detection, log analysis, integrity checking, registry monitoring, and active response. In its latest version, Wazuh offers out-of-the-box integration with OSSEC, Suricata, and the Elastic Stack. It also provides file monitoring within Docker containers by focusing on the persistent volumes and bind mounts.</p>
<p>In an environment where thousands of nodes are involved, a deployment mechanism that will scale is often preferred. Here we will see how to deploy Wazuh using Ansible on CentOS / RHEL / Fedora platforms. For installation on other platforms, please refer to their excellent  <a target='_blank' rel='noopener noreferrer'  href="https://documentation.wazuh.com/3.10/installation-guide/index.html">documentation</a>.</p>
<h3 id="install-ansible">Install Ansible</h3>
<p>Install EPEL repository</p>
<pre><code>[wazuhmanager@centos ~]$ sudo yum -y install epel-release
</code></pre><p>Install Ansible</p>
<pre><code>[wazuhmanager@centos ~]$ sudo yum install ansible
</code></pre><p>Now, we will need to generate the SSH authentication key pair for the Ansible server using the ssh-keygen tool. SSH implements public key authentication using RSA or DSA. Version 1 of the SSH protocol only supports RSA, while version 2 supports both systems.</p>
<pre><code>[wazuhmanager@centos ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ansible/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ansible/.ssh/id_rsa.
Your public key has been saved in /home/ansible/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Z2nkI+fOVMa21NxP8YZaKpQWFqbm4cnAKXZezkuG/0g ansible@ansible
The key's randomart image is:
+---[RSA 2048]----+
|          o      |
|     . . o .     |
|    o = = +    . |
|   . + @ * = o oo|
|      o S % * = =|
|       + @ * = o.|
|        E + +   .|
|       . * .     |
|        . +      |
+----[SHA256]-----+
</code></pre><p>If you wish, you can include a passphrase.</p><h3 id="install-wazuh-manager">Install Wazuh Manager</h3>
<p>Obtain the necessary playbooks and roles for the installation of the Wazuh server components, Elastic Stack components, and Wazuh agents by cloning the repository into <code>/etc/ansible/roles</code>:</p>
<pre><code>[wazuhmanager@centos ~]$ cd /etc/ansible/roles/
[wazuhmanager@centos /etc/ansible/roles]$ sudo git clone --branch 3.10.2_7.3.2 https://github.com/wazuh/wazuh-ansible.git
[wazuhmanager@centos /etc/ansible/roles]$ ls
wazuh-ansible
</code></pre><p>Edit <code>/etc/ansible/roles/wazuh-ansible/playbooks/wazuh-manager.yml</code> as per your needs. It should look something like this:</p>
<pre><code>- hosts: 192.168.0.180
  roles:
    - role: /etc/ansible/roles/wazuh-ansible/roles/wazuh/ansible-wazuh-manager
    - { role: /etc/ansible/roles/wazuh-ansible/roles/wazuh/ansible-filebeat, filebeat_output_elasticsearch_hosts: '192.168.0.108:9200' }
</code></pre><p>Once done, run the playbook:</p>
<pre><code>[wazuhmanager@centos /etc/ansible/roles/wazuh-ansible/playbooks]$ ansible-playbook wazuh-manager.yml -b -K
</code></pre><p>Here, we use:</p>
<ul>
<li>the <code>-b</code> option to tell Ansible to run the operations as a superuser</li>
<li>the <code>-K</code> option to make Ansible ask for the privilege escalation password</li>
</ul>
<h3 id="install-elastic-stack-server">Install Elastic Stack Server</h3>
<p>Edit <code>/etc/ansible/wazuh-ansible/wazuh-elk.yml</code> so that it looks like the following:</p>
<pre><code>[wazuhmanager@centos /etc/ansible/wazuh-ansible]$ cat wazuh-elk.yml
- hosts: 192.168.0.108
  roles:
      - { role: /etc/ansible/roles/wazuh-ansible/roles/elastic-stack/ansible-elasticsearch, elasticsearch_network_host: 'localhost' }
      - { role: /etc/ansible/roles/wazuh-ansible/roles/elastic-stack/ansible-kibana, elasticsearch_network_host: 'localhost' }
</code></pre><p>Just like before, we run the playbook:</p>
<pre><code>[wazuhmanager@centos /etc/ansible/roles/wazuh-ansible/playbooks]$ ansible-playbook wazuh-elk.yml -b -K
</code></pre><h3 id="install-wazuh-agent">Install Wazuh Agent</h3>
<p>First, make sure that the manager&#39;s SSH key is added to <code>.ssh/authorized_keys</code> on each of the servers on which you want to install the agent:</p>
<pre><code>[wazuhmanager@centos ~]$ cat .ssh/id_rsa.pub | ssh centos@192.168.0.180 "cat &gt;&gt; .ssh/authorized_keys"
centos@192.168.0.180's password:
</code></pre><p>On the agent node that you wish to manage, open the SSH config</p>
<pre><code>[wazuhagent@centos ~]$ sudo vi /etc/ssh/sshd_config
</code></pre><p>Check the SSH config of your agent to make sure that the following options are set:</p>
<ul>
<li><code>PubkeyAuthentication yes</code></li>
<li><code>AuthorizedKeysFile .ssh/authorized_keys</code></li>
</ul>
<p>Once done, restart the SSH service</p>
<pre><code>[wazuhagent@centos ~]$ systemctl restart sshd
</code></pre><p>Edit <code>/etc/ansible/roles/wazuh-ansible/playbooks/wazuh-agent.yml</code> so that it looks similar to:</p>
<pre><code>- hosts: 192.168.0.102
  roles:
    - /etc/ansible/roles/wazuh-ansible/roles/wazuh/ansible-wazuh-agent
  vars:
    wazuh_managers:
      - address: 192.168.0.180
        port: 1514
        protocol: udp
        api_port: 55000
        api_proto: 'http'
        api_user: ansible
    wazuh_agent_authd:
      enable: true
      port: 1515
      ssl_agent_ca: null
      ssl_auto_negotiate: 'no'
</code></pre><p>Run the playbook:</p>
<pre><code>[wazuhagent@centos /etc/ansible/roles/wazuh-ansible/playbooks]$ ansible-playbook wazuh-agent.yml -b -K
</code></pre><p>Once all this is done, you will have a fully functioning IDS solution consisting of a HIDS, an NIDS, and a SIEM.</p>
<h1 id="conclusion">Conclusion</h1>
<p>An IDS is not a holy grail. Simply having one won&#39;t guarantee security, but it does leave you with a considerably hardened posture nonetheless. Beyond an IDS, we can also deploy an IPS, an NGFW, a UTM appliance, and so on. With careful planning and a plan for ongoing maintenance, you can build a secure network with these tools.</p>
]]></content:encoded></item><item><title><![CDATA[The Django Jargon]]></title><description><![CDATA[The internet contains tons of information out there about  Django: The Web framework for perfectionists with deadlines . Why do we need one more article to add to it? The motivation of writing this blog post was a good friend of mine with whom I was ...]]></description><link>https://blog.pratikms.com/the-django-jargon</link><guid isPermaLink="true">https://blog.pratikms.com/the-django-jargon</guid><category><![CDATA[Django]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Pratik Shivaraikar]]></dc:creator><pubDate>Sun, 10 Nov 2019 21:24:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1595270918394/6APnUFL-V.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The internet contains tons of information out there about  <a target='_blank' rel='noopener noreferrer'  href="https://www.djangoproject.com">Django: The Web framework for perfectionists with deadlines</a>. Why do we need one more article to add to it? The motivation for writing this blog post was a good friend of mine with whom I was about to start a project. We come from different technological backgrounds, and this was going to be his first time dealing with a Django application. It was then that I realized he was going through the same set of curious questions that I went through when I first came across Django. This blog post is meant to serve as a reference point, for myself as well as for you out there, to quickly grasp some of the jargon one typically encounters while learning to develop a Django project.</p>
<h2 id="project-structure">Project Structure</h2>
<p>At the time of writing this article, Django had released 2.2.6 and the Django REST Framework had version 3.10.3 out. Both of them follow the same directory structure:</p>
<pre><code>Project/
├── Project
│   ├── <span class="hljs-strong">__init__</span>.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── manage.py
</code></pre><p>Let&#39;s go through the basic boilerplate files created:</p>
<h2 id="-__init__-py-"><code>__init__.py</code></h2>
<p>According to the  <a target='_blank' rel='noopener noreferrer'  href="https://docs.python.org/3/reference/import.html#regular-packages">official documentation</a>, Python defines two types of packages,  <a target='_blank' rel='noopener noreferrer'  href="https://docs.python.org/3/reference/import.html#regular-packages">regular packages</a>  and  <a target='_blank' rel='noopener noreferrer'  href="https://docs.python.org/3/reference/import.html#namespace-packages">namespace packages</a>. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing a <code>__init__.py</code> file. When a regular package is imported, this <code>__init__.py</code> file is implicitly executed, and the objects it defines are bound to names in the package&#39;s namespace. The <code>__init__.py</code> file can contain the same Python code that any other module can contain, and Python will add any additional attributes to the module when it is imported.</p>
<h2 id="-settings-py-"><code>settings.py</code></h2>
<p>The <code>settings.py</code> file is the central configuration for a Django project. Although <code>settings.py</code> uses reasonable default values for practically all variables, when a Django application transitions into the real world, one must make a series of adjustments to run the application efficiently.</p>
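<p>To make this concrete, here are a few of the adjustments typically made in <code>settings.py</code> before going to production. The values below are deliberately placeholders, not recommendations for any particular deployment:</p>

```python
# Project/settings.py — illustrative production adjustments only.
DEBUG = False                     # never run production with DEBUG = True
ALLOWED_HOSTS = ["example.com"]   # hosts Django is allowed to serve (placeholder)
SECRET_KEY = "change-me"          # load from the environment in real deployments
```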
<h2 id="-urls-py-"><code>urls.py</code></h2>
<p><code>urls.py</code> is the main entry point of a Django application. In most circumstances, Django uses the <code>django.urls.path</code> method to perform URL path matching. However, Django also offers the <code>django.urls.re_path</code> method. The major difference between the two is that <code>path</code> is designed to match exact strings (with typed converters), whereas <code>re_path</code> is designed to match patterned strings based on regular expressions.</p>
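<p>The difference is easiest to see side by side. In the hypothetical URLconf below, both entries route the same four-digit-year URL to the same view; <code>views.year_archive</code> is a placeholder name:</p>

```python
from django.urls import path, re_path

from . import views  # hypothetical views module

urlpatterns = [
    # path(): a literal route with a typed converter
    path("articles/<int:year>/", views.year_archive),
    # re_path(): a regular-expression pattern achieving the same match
    re_path(r"^articles/(?P<year>[0-9]{4})/$", views.year_archive),
]
```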
<h2 id="-wsgi-py-"><code>wsgi.py</code></h2>
<p>WSGI is the Web Server Gateway Interface. It is a specification that describes how a web server communicates with web applications, and how web applications can be chained together to process one request. WSGI is a Python standard described in detail in  <a target='_blank' rel='noopener noreferrer'  href="https://www.python.org/dev/peps/pep-3333">PEP 3333</a>. The <code>wsgi.py</code> file contains the WSGI configuration for the Django project, and WSGI is the recommended approach to deploy a Django application to production.</p>
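<p>For reference, the entire WSGI contract fits in a handful of lines. The callable below is the shape of object a WSGI server invokes for every request; Django&#39;s <code>wsgi.py</code> exposes an equivalent <code>application</code> object built by <code>get_wsgi_application()</code>:</p>

```python
def application(environ, start_response):
    """A minimal WSGI application, per PEP 3333."""
    body = b"Hello, WSGI!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of byte strings

# It can be served with the standard library alone:
# from wsgiref.simple_server import make_server
# make_server("", 8000, application).serve_forever()
```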
<h2 id="-manage-py-"><code>manage.py</code></h2>
<p>The <code>manage.py</code> and <code>django-admin</code> commands can be used largely interchangeably, as both serve as Django&#39;s command-line utility. The main difference is that <code>manage.py</code> is a thin, project-specific wrapper that also sets the <code>DJANGO_SETTINGS_MODULE</code> environment variable to point at your project&#39;s settings, while <code>django-admin</code> is usually preferred for system-wide Django tasks.</p>
<blockquote>
<p><strong>Note: </strong> I&#39;m planning to update this documentation over time. Stay tuned.</p>
</blockquote>
]]></content:encoded></item></channel></rss>