
Computer Vision: Integrating Computer Vision into your web (or development) projects

Authored by Lwin Maung

So, what can a computer see? Do androids dream of electric sheep?
In general, a computer will see what you want it to see. "Vision" for a computer can come in multiple formats. At the end of the day, however, what a computer treats as "vision" is based on an individual frame at a given time. Come to think of it, humans process vision in a very similar way. As humans, we have a constant stream of input from our eyes, but what we perceive as vision is processed a single frame at a time. That is why we have pictures, paintings, and moving or streaming pictures, which we describe as movies, motion pictures, or videos. At the end of the day, a computer can see a single frame (an image) or a stream of video (live or replayed) through cameras or other means and process the data just like us.
At this point, I am going to stay basic. As much as I would love to jump into morals, ethics, and eventually the laws of robotics and cybernetics, let's crawl before we walk (and eventually run). As of this writing (April 30, 2019), a developer such as myself can use what is known as Cognitive Services to enable a computer to see. What does that even mean? Using a metric ton of images, Microsoft has created algorithms that can describe the contents of an image. That was the first stage in computer vision from Microsoft: tell me what is in a given image.
In the first few iterations of computer vision, the above image would result in certain objects being recognized:
  • Couch
  • Table
  • Table 2
  • Painting
  • Tree
  • Pillow
  • Green
  • Yellow
  • Gray
As additional training is given to the AI model, the system learns more and more from trained images and comes back with a result set such as the one below:
Objects recognized:
[ { "rectangle": { "x": 602, "y": 97, "w": 113, "h": 208 }, "object": "potted plant", "parent": { "object": "plant", "confidence": 0.837 }, "confidence": 0.673 }, { "rectangle": { "x": 170, "y": 184, "w": 435, "h": 139 }, "object": "couch", "parent": { "object": "seating", "confidence": 0.626 }, "confidence": 0.618 } ]
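Each entry in the object list carries a bounding rectangle, a parent category, and a confidence score, so it is easy to turn into something readable. The snippet below is only a rough sketch: summarizeObject is a made-up helper name (not part of any Microsoft library), and the data is copied from the result above.

```javascript
// Two entries copied from the "Objects recognized" result above.
const objects = [
  { rectangle: { x: 602, y: 97, w: 113, h: 208 }, object: "potted plant",
    parent: { object: "plant", confidence: 0.837 }, confidence: 0.673 },
  { rectangle: { x: 170, y: 184, w: 435, h: 139 }, object: "couch",
    parent: { object: "seating", confidence: 0.626 }, confidence: 0.618 }
];

// Hypothetical helper: builds one summary line per detected object.
function summarizeObject(item) {
  const r = item.rectangle;
  const percent = (item.confidence * 100).toFixed(1);
  // The parent category is optional, so only mention it when present.
  const parent = item.parent ? " (a kind of " + item.parent.object + ")" : "";
  return item.object + parent + " at x=" + r.x + ", y=" + r.y +
    ", size " + r.w + "x" + r.h + ", confidence " + percent + "%";
}

const summaries = objects.map(summarizeObject);
summaries.forEach(function (line) { console.log(line); });
```

The same rectangle values could be used to draw boxes over the image, which is left out here to keep the sketch short.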
Other Items:
[ { "name": "couch", "confidence": 0.999709964 }, { "name": "sofa bed", "confidence": 0.995230138 }, { "name": "studio couch", "confidence": 0.9937122 }, { "name": "living", "confidence": 0.987508357 }, { "name": "indoor", "confidence": 0.984029 }, { "name": "floor", "confidence": 0.9806024 }, { "name": "coffee table", "confidence": 0.967637658 }, { "name": "pillow", "confidence": 0.9657884 }, { "name": "chair", "confidence": 0.965541065 }, { "name": "wall", "confidence": 0.96527797 }, { "name": "room", "confidence": 0.963956654 }, { "name": "loveseat", "confidence": 0.963927269 }, { "name": "green", "confidence": 0.9252639 }, { "name": "club chair", "confidence": 0.885002255 }, { "name": "bed", "confidence": 0.8836875 }, { "name": "armrest", "confidence": 0.85374856 }, { "name": "table", "confidence": 0.824915349 }, { "name": "design", "confidence": 0.8055203 }, { "name": "living room", "confidence": 0.8014669 }, { "name": "vase", "confidence": 0.7928853 }, { "name": "futon pad", "confidence": 0.7891575 }, { "name": "sleeper chair", "confidence": 0.7838532 }, { "name": "sofa", "confidence": 0.726579249 }, { "name": "outdoor sofa", "confidence": 0.7121508 }, { "name": "white", "confidence": 0.703035951 }, { "name": "cushion", "confidence": 0.6869614 }, { "name": "throw pillow", "confidence": 0.622486055 }, { "name": "seat", "confidence": 0.6210583 }, { "name": "furniture", "confidence": 0.594202042 }, { "name": "comfort", "confidence": 0.551928043 } ]
Describe the image:
{ "tags": [ "living", "indoor", "room", "green", "table", "sofa", "white", "furniture", "chair", "sitting", "coffee", "area", "large", "gray", "modern", "wooden", "television", "yellow" ], "captions": [ { "text": "a green sofa in a living room", "confidence": 0.909238338 } ] }
Additional Information
[ { "name": "abstract_", "score": 0.00390625 }, { "name": "indoor_room", "score": 0.87890625 } ]
Analyzing the result
From the analysis, we can tell the computer is not 100% there yet. It has recognized items incorrectly in a number of places. It thinks there might be a "bed" in the picture, with a confidence of around 88%, which is clearly wrong. However, in the grand scheme of things, it weighed all the items it 'recognized' in the picture and came back with the description "a green sofa in a living room" with a confidence of about 91%. Yes, the image shows a sofa, and it has green pillows. All in all, that is fairly decent in terms of figuring out what the image means. With additional training, it will only get better.
So, how can I use such a technology in my project? Am I limited? The answer is a resounding yes and no. Yes, you are limited to certain things: Microsoft, Azure, and what they have trained so far (more on custom images). No, because you can use it in your mobile app, web app, or IoT app. The sky is the limit here. So how can I use it?
You will need to create an Azure account with Cognitive Services, point to the API, send in an image, and wait for the results. Yes, it is that easy!
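One practical way to deal with wrong guesses like "bed" is to pick a confidence cutoff before trusting a tag. Here is a small sketch of that idea. The helper name confidentTags and the cutoff values are my own assumptions, and the tags are a handful copied from the "Other Items" result earlier in this post.

```javascript
// A subset of the "Other Items" tags shown earlier in this post.
const tags = [
  { name: "couch", confidence: 0.999709964 },
  { name: "coffee table", confidence: 0.967637658 },
  { name: "bed", confidence: 0.8836875 },
  { name: "vase", confidence: 0.7928853 },
  { name: "comfort", confidence: 0.551928043 }
];

// Hypothetical helper: keep only tag names at or above a confidence cutoff.
function confidentTags(tagList, minConfidence) {
  return tagList
    .filter(function (tag) { return tag.confidence >= minConfidence; })
    .map(function (tag) { return tag.name; });
}

console.log(confidentTags(tags, 0.8));  // "bed" still passes a 0.8 cutoff
console.log(confidentTags(tags, 0.95)); // a stricter cutoff drops it
```

A cutoff is a blunt tool: raising it removes "bed" but would also remove correct low-confidence tags such as "vase", so the right value depends on the app.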
Can I use it on the Web?
Sure. The following JavaScript code block will enable you to describe the image:
function processImage() {
        // **********************************************
        // *** Update or verify the following values. ***
        // **********************************************
        // Replace <Subscription Key> with your valid subscription key.
        var subscriptionKey = "<Subscription Key>";
        // You must use the same Azure region in your REST API method as you used to
        // get your subscription keys. For example, if you got your subscription keys
        // from the West US region, use the "westus" endpoint rather than
        // "westcentralus".
        // Free trial subscription keys are generated in the "westus" region.
        // Replace <Analyze Endpoint URL> with the analyze endpoint for your region.
        var uriBase = "<Analyze Endpoint URL>";
        // Request parameters.
        var params = {
            "visualFeatures": "Categories,Description,Color",
            "details": "",
            "language": "en",
        };
        // Display the image.
        var sourceImageUrl = document.getElementById("inputImage").value;
        document.querySelector("#sourceImage").src = sourceImageUrl;
        // Make the REST API call.
        $.ajax({
            url: uriBase + "?" + $.param(params),
            // Request headers.
            beforeSend: function(xhrObj){
                xhrObj.setRequestHeader("Content-Type", "application/json");
                xhrObj.setRequestHeader(
                    "Ocp-Apim-Subscription-Key", subscriptionKey);
            },
            type: "POST",
            // Request body. JSON.stringify escapes any quotes in the URL.
            data: JSON.stringify({ url: sourceImageUrl }),
        })
        .done(function(data) {
            // Show formatted JSON on webpage.
            $("#responseTextArea").val(JSON.stringify(data, null, 2));
        })
        .fail(function(jqXHR, textStatus, errorThrown) {
            // Display error message.
            var errorString = (errorThrown === "") ? "Error. " :
                errorThrown + " (" + jqXHR.status + "): ";
            errorString += (jqXHR.responseText === "") ? "" :
                jqXHR.responseText;
            alert(errorString);
        });
}
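If you would rather not pull in jQuery, the same call can be sketched with the built-in fetch and URLSearchParams. Treat this as a rough outline under a few assumptions: buildAnalyzeRequest and analyzeImage are names I made up, the endpoint and key are passed in by the caller instead of being hard-coded, and the request parameters and "Ocp-Apim-Subscription-Key" header are the ones used in the jQuery example.

```javascript
// Hypothetical helper: builds the URL and fetch() options for an analyze call.
// endpoint and subscriptionKey are placeholders supplied by the caller.
function buildAnalyzeRequest(endpoint, subscriptionKey, imageUrl) {
  const query = new URLSearchParams({
    visualFeatures: "Categories,Description,Color",
    details: "",
    language: "en"
  });
  return {
    url: endpoint + "?" + query.toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": subscriptionKey
      },
      // JSON.stringify takes care of escaping the image URL.
      body: JSON.stringify({ url: imageUrl })
    }
  };
}

// Hypothetical helper: sends the request and resolves with the parsed JSON.
async function analyzeImage(endpoint, subscriptionKey, imageUrl) {
  const request = buildAnalyzeRequest(endpoint, subscriptionKey, imageUrl);
  const response = await fetch(request.url, request.options);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.json();
}
```

Splitting the request building from the sending keeps the first part easy to check without a network connection or a real subscription key.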
What if I want to use C# and build it into my IoT app that is written for UWP?
The following code segment will do the trick for you:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
class Program
{
        // subscriptionKey = "0123456789abcdef0123456789ABCDEF"
        private const string subscriptionKey = "<SubscriptionKey>";
        // localImagePath = @"C:\Documents\LocalImage.jpg"
        private const string localImagePath = @"<LocalImage>";
        // Replace <RemoteImageUrl> with the URL of an image to analyze
        private const string remoteImageUrl = "<RemoteImageUrl>";
        // Specify the features to return
        private static readonly List<VisualFeatureTypes> features =
            new List<VisualFeatureTypes>()
        {
            VisualFeatureTypes.Categories, VisualFeatureTypes.Description,
            VisualFeatureTypes.Faces, VisualFeatureTypes.ImageType,
        };
        static void Main(string[] args)
        {
            ComputerVisionClient computerVision = new ComputerVisionClient(
                new ApiKeyServiceClientCredentials(subscriptionKey),
                new System.Net.Http.DelegatingHandler[] { });
            // You must use the same region as you used to get your subscription
            // keys. For example, if you got your subscription keys from westus,
            // use the "westus" endpoint rather than "westcentralus".
            // Free trial subscription keys are generated in the "westus"
            // region.
            // Replace <Endpoint> with the endpoint URL for your region.
            // Specify the Azure region
            computerVision.Endpoint = "<Endpoint>";
            Console.WriteLine("Images being analyzed ...");
            var t1 = AnalyzeRemoteAsync(computerVision, remoteImageUrl);
            var t2 = AnalyzeLocalAsync(computerVision, localImagePath);
            Task.WhenAll(t1, t2).Wait(5000);
            Console.WriteLine("Press ENTER to exit");
            Console.ReadLine();
        }
        // Analyze a remote image
        private static async Task AnalyzeRemoteAsync(
            ComputerVisionClient computerVision, string imageUrl)
        {
            if (!Uri.IsWellFormedUriString(imageUrl, UriKind.Absolute))
            {
                Console.WriteLine(
                    "\nInvalid remoteImageUrl:\n{0} \n", imageUrl);
                return;
            }
            ImageAnalysis analysis =
                await computerVision.AnalyzeImageAsync(imageUrl, features);
            DisplayResults(analysis, imageUrl);
        }
        // Analyze a local image
        private static async Task AnalyzeLocalAsync(
            ComputerVisionClient computerVision, string imagePath)
        {
            if (!File.Exists(imagePath))
            {
                Console.WriteLine(
                    "\nUnable to open or read localImagePath:\n{0} \n", imagePath);
                return;
            }
            using (Stream imageStream = File.OpenRead(imagePath))
            {
                ImageAnalysis analysis = await computerVision.AnalyzeImageInStreamAsync(
                    imageStream, features);
                DisplayResults(analysis, imagePath);
            }
        }
        // Display the most relevant caption for the image
        private static void DisplayResults(ImageAnalysis analysis, string imageUri)
        {
            if (analysis.Description.Captions.Count != 0)
            {
                Console.WriteLine(analysis.Description.Captions[0].Text + "\n");
            }
            else
            {
                Console.WriteLine("No description generated.");
            }
        }
}
So what's next?
You can learn more about computer vision by visiting Microsoft's Cognitive Services page here or by talking to us. We will make your next app for you.