Uncover Additional Value of $1 Billion with Advanced Analytics & Machine Learning

Edward Christian, Data Science Lead
November 24, 2019

Growing Popularity of Machine Learning in Oil & Gas

The fourth industrial revolution (Industry 4.0), in particular data analytics and machine learning, is gaining popularity outside the manufacturing industry. The oil and gas industry is one of the sectors currently seeing more applications of these breakthrough technologies at many large companies.

Combining operational data collected over many years with machine learning can enable oil and gas companies to find an additional $1 billion in value by increasing production, streamlining the supply chain, or reducing engineering time [1]. Applicable use cases of data analytics and machine learning in the oil and gas industry include:

  1. Predictive maintenance of production assets,
  2. Predicting reservoir porosity and permeability,
  3. Lithology identification using a clustering algorithm,
  4. Reservoir identification using seismic data,
  5. Data-driven virtual flow meter.

These are only a few of the use cases already applied in the industry. They have a notable impact on how oil and gas companies improve their operational efficiency and achieve real business results.

Sample Use Case: Virtual Flow Meter

What is a virtual flow meter?

A virtual flow meter (VFM) is a cheaper way to estimate oil and gas production rates than a physical flow meter. It mainly uses data that is already available, such as pressure and temperature at the bottomhole and the wellhead choke, to estimate the oil/gas flow rates. Because it requires no additional hardware installation or maintenance and carries no risk of production loss (unlike manual well testing), this approach is likely to become a good alternative. The two types of virtual metering are the physical model approach and the data-driven approach.

Data-driven approach

This method focuses on finding relationships between the system's input and output data with minimal knowledge of the underlying physics. It makes no assumptions about the governing system and avoids the numerical challenges that arise when no closed-form solution is available. Since data is the main source of truth for this approach, a data validation process is needed to avoid biases caused by incorrect or poor-quality data.
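As a rough illustration of the data validation step, the sketch below filters out sensor readings that are missing, NaN, or outside a plausible range before they reach a model. The `Reading` fields, the units, and the limit values are all hypothetical and would need to be replaced with limits that suit the actual well and instrumentation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    """One sensor sample; field names and units are hypothetical."""
    pressure_bar: Optional[float]
    temperature_c: Optional[float]
    flow_rate: Optional[float]  # measured reference rate, arbitrary units


# Hypothetical plausibility limits; real limits depend on the well and sensors.
PRESSURE_LIMITS_BAR: Tuple[float, float] = (0.0, 1000.0)
TEMPERATURE_LIMITS_C: Tuple[float, float] = (-50.0, 300.0)


def is_valid(reading: Reading) -> bool:
    """Return True if no field is missing or NaN and all values are within limits."""
    values = (reading.pressure_bar, reading.temperature_c, reading.flow_rate)
    if any(v is None or math.isnan(v) for v in values):
        return False
    p_lo, p_hi = PRESSURE_LIMITS_BAR
    t_lo, t_hi = TEMPERATURE_LIMITS_C
    return (
        p_lo <= reading.pressure_bar <= p_hi
        and t_lo <= reading.temperature_c <= t_hi
        and reading.flow_rate >= 0.0
    )


def validate(readings: List[Reading]) -> List[Reading]:
    """Keep only the readings that pass the plausibility checks."""
    return [r for r in readings if is_valid(r)]


# Made-up sample data for illustration only.
raw = [
    Reading(120.0, 65.0, 310.0),          # plausible
    Reading(None, 64.0, 305.0),           # missing pressure
    Reading(118.0, float("nan"), 300.0),  # NaN temperature
    Reading(-5.0, 66.0, 298.0),           # negative pressure
    Reading(121.0, 67.0, 315.0),          # plausible
]
clean = validate(raw)
```

Dropping invalid rows is only one option. Depending on how much data is affected, imputing the missing values may be preferable to discarding them.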

Solution development and deployment

This stage consists of three main processes: data wrangling, model fitting, and solution deployment. During the data wrangling step, data are validated and enriched using data cleaning, missing-value imputation, and feature engineering. Feature engineering derives new attributes from raw time-series data in the time domain (ratio, difference, rolling metric, non-linear transformation, scaling, etc.) and in the frequency domain (spectral analysis, etc.). For model fitting, several ML algorithms can be applied to this regression problem, e.g. Decision Tree, Random Forest, Support Vector Machine, and Neural Network. Cross-validation (holdout, k-fold, leave-one-out) should be applied during model assessment and selection to check how well a model generalizes to new data. In the last stage, the solution is deployed by automating both the data and model workflows and by visualizing the estimated flow rates and model performance over time.
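The feature engineering and cross-validation steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production workflow: the sample values are synthetic, all function names are hypothetical, and a one-variable least-squares line stands in for the algorithms named above (Decision Tree, Random Forest, and so on), which would normally come from an ML library.

```python
from statistics import mean
from typing import List, Tuple


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Trailing rolling mean; early positions average the samples seen so far."""
    return [mean(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]


def difference(a: List[float], b: List[float]) -> List[float]:
    """Element-wise difference feature, e.g. pressure drop between two sensors."""
    return [x - y for x, y in zip(a, b)]


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_fold_mae(x: List[float], y: List[float], k: int) -> float:
    """Mean absolute error averaged over k contiguous, unshuffled folds."""
    n = len(x)
    fold_errors = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        # Train on everything outside the held-out fold [lo, hi).
        slope, intercept = fit_line(x[:lo] + x[hi:], y[:lo] + y[hi:])
        errors = [abs(slope * xi + intercept - yi)
                  for xi, yi in zip(x[lo:hi], y[lo:hi])]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Synthetic pressures (made-up values, arbitrary units).
bottomhole = [200.0, 198.0, 201.0, 197.0, 203.0, 199.0, 202.0, 196.0, 204.0, 200.0]
wellhead = [50.0, 52.0, 49.0, 53.0, 48.0, 51.0, 50.0, 54.0, 47.0, 52.0]

# Time-domain features: a difference and a rolling metric.
pressure_drop = difference(bottomhole, wellhead)
smoothed_drop = rolling_mean(pressure_drop, window=3)

# Synthetic target that is exactly linear in the pressure drop, so the
# cross-validated error should be close to zero. Real data will not be.
flow_rate = [2.0 * dp + 5.0 for dp in pressure_drop]

cv_error = k_fold_mae(pressure_drop, flow_rate, k=5)
```

The fit uses only `pressure_drop` for brevity; `smoothed_drop` shows how a rolling metric would be derived as an additional feature. Because the folds are contiguous and unshuffled, a time-ordered split that trains only on earlier samples may be a better fit for time-series data.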

References:

  1. https://www.mckinsey.com/industries/oil-and-gas/our-insights/a-billion-dollar-digital-opportunity-for-oil-companies
  2. Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, Vol. 16, No. 3, 199-231.