How to convert a captcha into text?

Question

Thursday, March 27, 2014 5:33 AM

Hi all,

I am working on a C# Windows Forms application in which I want to load captchas, read the captcha text, and store it as a string.

I have tried the MODI library. It works well with simple text images but is unable to read captchas and extract text from them. Can the MODI library be used for this purpose?

If not, what else should I do?

Thanks in advance.

farooq.hnf

All replies (10)

Thursday, March 27, 2014 11:24 AM ✅ Answered | 3 votes

I agree with your point, but there is software that converts captchas back into text.

farooq.hnf

Yes. Those are there to crack the Captcha mechanism. So it would fall under this sticky post:

Contributors: How to avoid aiding the development of malicious code

The entire point of a captcha is that your program knows what the picture spells, and you compare what the user enters to that known "good string". Bots will be unable to do that and fail. Users will prevail.

Often this is achieved by mangling a random string during graphical output.
reCAPTCHA is based on a database full of known "non-machine-readable" texts (as scanned in while digitizing books). The user is given two images: one that is already known and one that is not yet known. Once enough people enter the same thing for the unknown image, it is added to the list of known images.
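
The "known good string" check described above can be sketched roughly as follows. This is only an illustration under assumptions: the class name `CaptchaCheck`, the method names, and the alphabet are hypothetical and not taken from any library, and rendering the distorted image is left out entirely.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class CaptchaCheck
{
    // Assumed alphabet that leaves out look-alike characters (0/O, 1/I/L).
    private const string Alphabet = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";

    // Creates the random string that the server keeps (for example in the
    // session) and later draws as a mangled image.
    public static string NewChallenge(int length)
    {
        var bytes = new byte[length];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }

        var sb = new StringBuilder(length);
        foreach (byte b in bytes)
        {
            // The modulo introduces a slight bias; acceptable for a sketch.
            sb.Append(Alphabet[b % Alphabet.Length]);
        }
        return sb.ToString();
    }

    // Compares what the user typed against the stored string,
    // ignoring case and surrounding whitespace.
    public static bool IsMatch(string expected, string userInput)
    {
        if (expected == null || userInput == null)
        {
            return false;
        }
        return string.Equals(expected, userInput.Trim(),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

The program never has to read the image back: it stores the result of `NewChallenge`, shows the picture, and calls `IsMatch` on whatever the user typed.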



Thursday, March 27, 2014 6:30 AM

Strings and chars in the .NET library are UTF-16: each char is a two-byte code unit, and a character outside the Basic Multilingual Plane takes two chars (a surrogate pair). To convert a byte array of characters to a string, you have to use an Encoding that matches how the bytes were written. So normally you use code like the below. The common encodings are ASCII (7-bit; any other character is replaced with '?'), UTF7 (a 7-bit-safe encoding that is rarely the right choice), UTF8 (one to four bytes per character; ASCII text is unchanged), and Unicode (UTF-16 little-endian, two bytes per code unit).

            // Requires: using System.Text;
            string input = "The quick brown fox jumped over the lazy dog";
            byte[] array = Encoding.UTF8.GetBytes(input);    // string -> bytes
            string output = Encoding.UTF8.GetString(array);  // bytes -> string
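
As a rough illustration of how the choice of encoding affects such a round trip, the sketch below wraps the same `GetBytes`/`GetString` calls in two small helpers. The class and method names (`EncodingDemo`, `RoundTrip`, `ByteCount`) are made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class EncodingDemo
{
    // Encodes the text with the given encoding and decodes it again.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    // Number of bytes the text occupies in the given encoding.
    public static int ByteCount(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }
}
```

With the input `"caf\u00E9"`, `RoundTrip(Encoding.UTF8, ...)` returns the text unchanged, while `RoundTrip(Encoding.ASCII, ...)` returns `"caf?"` because the accented letter has no ASCII representation. `ByteCount` gives 5 for `Encoding.UTF8` and 8 for `Encoding.Unicode`.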

jdweng


Thursday, March 27, 2014 8:15 AM

I think that the idea of a CAPTCHA is to determine whether or not the user is human. Therefore a good image can only be read by people, not by OCR libraries.


Thursday, March 27, 2014 8:28 AM

I agree with your point, but there is software that converts captchas back into text.

farooq.hnf


Thursday, March 27, 2014 11:07 AM

That's the whole idea of captcha, that a computer will not be able to read them...

 

Noam B.



Thursday, March 27, 2014 2:24 PM | 1 vote

jdweng, you do know why the OP is looking for this, right? Automatically interpreting captchas would nearly always be linked to criminal behavior (e.g., testing stolen credit cards to find 'good' ones that can be used to make more valuable purchases).



Thursday, March 27, 2014 3:05 PM

It wasn't clear whether the captchas were images that need OCR or just text. I posted a simple solution assuming the input was text.

jdweng


Saturday, April 5, 2014 12:41 PM

Sorry for the delay, fellows.

Yes, some software does know in advance what the picture means. But there is also software that does not know in advance what is written in the captcha: it reads the captcha, extracts the text, and stores the result as a string. I don't know how it does that, and this is what I want to know.

farooq.hnf


Saturday, April 5, 2014 12:44 PM

Well Joel, I want to extract text from captchas by reading them and then store the text as a string.

My program wouldn't know in advance what is written in the captcha.

farooq.hnf


Saturday, April 5, 2014 3:18 PM

Hello,

These kinds of topics should not be discussed here or on these forums, so I'm locking this post.

Regards, Eyal Shilony