Phrase-Based Statistical Translation of Programming Languages (Onward! 2014 - Onward! Papers)

Who

Svetoslav Karaivanov, Veselin Raychev, Martin Vechev

Track

Onward! 2014 Onward! Papers

When

Fri 24 Oct 2014 10:30 - 10:52 at Salon A - Session the Fourth Chair(s): Emery D. Berger

Abstract

Phrase-based statistical machine translation approaches have been highly successful in translating between natural languages and are heavily used by commercial systems (e.g. Google Translate).

The main objective of this work is to investigate the applicability of these approaches for translating between programming languages. To that end, we investigated several variants of the phrase-based translation approach: i) a direct application of the approach to programming languages, ii) a novel modification of the approach that incorporates the grammatical structure of the target programming language (so as to avoid generating target programs that do not parse), and iii) a combination of ii) with custom rules added to improve the quality of the translation.

To experiment with the above systems, we investigated machine translation from C# to Java. For the training, which takes about 60 hours, we used a parallel corpus of 20,499 C#-to-Java method translations. We then evaluated each of the three systems above by translating 1,000 C# methods. Our experimental results indicate that, with the most advanced system, about 60% of the top-ranked translated methods compile, and that out of a random sample of 50 correctly compiled methods, 68% (34 methods) were semantically equivalent to the reference solution.
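
As a rough illustration of what translating with a phrase table means for code, the sketch below maps C# token phrases to Java token phrases using a greedy longest-match lookup. It is not the system described in the abstract: the phrase table entries, their scores, and the function name `translate` are all hypothetical, and a real phrase-based system learns its table from a parallel corpus and searches over many candidate translations with a language model rather than greedily picking one.

```python
# Hypothetical phrase table: C# token phrases -> candidate Java token phrases.
# Entries and scores are invented for illustration, not learned from data.
PHRASE_TABLE = {
    ("Console", ".", "WriteLine"): [(("System", ".", "out", ".", "println"), 0.9)],
    ("string",): [(("String",), 0.95)],
    ("bool",): [(("boolean",), 0.95)],
    (".", "Length"): [((".", "length", "(", ")"), 0.6), ((".", "length"), 0.4)],
}
MAX_PHRASE_LEN = max(len(src) for src in PHRASE_TABLE)


def translate(tokens):
    """Greedy left-to-right translation of a list of C# tokens.

    At each position, try the longest source phrase in the table first and
    emit its highest-scoring target phrase; copy unknown tokens unchanged.
    """
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + n])
            if src in PHRASE_TABLE:
                best, _score = max(PHRASE_TABLE[src], key=lambda cand: cand[1])
                out.extend(best)
                i += n
                break
        else:
            # No phrase matched at this position: pass the token through.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    csharp = ["Console", ".", "WriteLine", "(", "s", ".", "Length", ")", ";"]
    print(" ".join(translate(csharp)))
```

Because the lookup works on flat token sequences, nothing in this sketch guarantees that the output parses as Java, which is the kind of problem variant ii) in the abstract is meant to address.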

Svetoslav Karaivanov

ETH Zurich

Veselin Raychev

ETH Zurich

Martin Vechev

ETH Zurich

Session Program

Fri 24 Oct
Displayed time zone: Tijuana, Baja California

10:30 - 12:00	Session the Fourth (Onward! Papers) at Salon A. Chair(s): Emery D. Berger, University of Massachusetts, Amherst

10:30 22m Talk		Phrase-Based Statistical Translation of Programming Languages. Svetoslav Karaivanov (ETH Zurich), Veselin Raychev (ETH Zurich), Martin Vechev (ETH Zurich)
10:52 22m Talk		Interleaving of Modification and Use in Data-driven Tool Development. Marcel Taeumel (Hasso Plattner Institute), Michael Perscheid (Hasso Plattner Institute), Bastian Steinert (Hasso Plattner Institute), Jens Lincke (Hasso Plattner Institute), Robert Hirschfeld (HPI)
11:15 22m Talk		Unifying Textual and Visual: a Theoretical Account of the Visual Perception of Programming Languages. Stéphane Conversy (University of Toulouse - ENAC)
11:37 22m Talk		Variational Data Structures: Exploring Tradeoffs in Computing with Variability. Eric Walkingshaw (University of Marburg), Christian Kästner (Carnegie Mellon University), Martin Erwig (Oregon State University), Sven Apel (University of Passau), Eric Bodden (Fraunhofer SIT and TU Darmstadt)